Interoperability and Retrieval: 2.2 Interoperability

2.2 Interoperability

Open access resources are important building blocks of the information support systems that underpin a global research and development infrastructure. Open access repositories, the green path of open access, play a significant role in creating a worldwide e-Research framework, but their real value lies in their ability to be integrated with existing resources so that end users can search them through a single-window interface. For example, OpenDOAR lists a total of 156 repositories on Physics, and these repositories differ in coverage, software, nature of contents and, most importantly, retrieval techniques and tools. Searching all of them from a single window requires a mechanism for interoperability.

Interoperability means the ability of multiple systems (with different hardware and software platforms, data structures, and user interfaces) to exchange data with minimal loss of content and functionality. In the bibliographic domain, interoperability is supported by crosswalks. A crosswalk is a mapping of the elements, semantics and syntax of one metadata schema to those of another. It allows metadata created by one community to be used by another group that employs a different metadata standard. Together, interoperability and crosswalks ensure the exchange of bibliographic data and contents among heterogeneous open content systems across the globe. Open content retrieval systems can achieve interoperability by following guidelines for setting up repositories, and by applying relevant protocols and interoperability standards.

Integration of open access resources from repositories distributed across the globe is essential for the success of the open access philosophy, and interoperability is what makes this integration possible. In other words, interoperability helps to achieve the goal of the open access movement: to increase the access, visibility, and impact of publicly funded research. It helps end users to locate required information resources from a unified search interface without knowing the location of objects or repository-specific retrieval techniques. The success of the green path of open access, i.e. dissemination of open contents through institutional and subject-based repositories, depends directly on interoperability. The gold path of open access, i.e. open access journals, may also benefit from interoperability through the integration of usage data, citation data, article-level metrics, etc.
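The crosswalk idea described above can be illustrated with a minimal sketch. It assumes a deliberately simplified record layout (a dictionary from MARC 21 tag to a list of values, ignoring subfields and indicators) and a small illustrative subset of MARC-to-Dublin-Core correspondences. The names `MARC_TO_DC` and `crosswalk` are hypothetical, and real crosswalks are considerably more detailed.

```python
# Illustrative subset of a MARC 21 -> unqualified Dublin Core mapping.
# Subfields and indicators are ignored here for brevity (an assumption).
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "260": "publisher",    # publication information (simplified)
    "520": "description",  # summary note
    "650": "subject",      # subject added entry, topical term
}


def crosswalk(marc_record):
    """Map a simplified MARC record to Dublin Core elements.

    Returns the Dublin Core record and the list of MARC tags that had
    no target element, i.e. the content lost in the conversion.
    """
    dc_record = {}
    unmapped = []
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)
            continue
        dc_record.setdefault(element, []).extend(values)
    return dc_record, unmapped


sample = {
    "100": ["Doe, Jane"],
    "245": ["Interoperability in open access repositories"],
    "650": ["Open access publishing", "Metadata"],
    "999": ["Local shelving note"],  # local field with no Dublin Core target
}

dc, lost = crosswalk(sample)
print(dc)
print("Tags without a mapping:", lost)
```

The list of unmapped tags makes the "minimal loss of content" requirement visible: whatever the target schema cannot express has to be dropped or handled separately.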

2.2.1 Types of Interoperability

The IEEE Glossary defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” (Geraci, 1991). In a broader sense, interoperability is the ability of systems (including information systems) to communicate with each other and pass information back and forth in a usable format. In open access information systems, interoperability supports content aggregation, data mining, on-the-fly integration of related resources from different locations in real time, improvement of existing information services, and introduction of new ones. Interoperability may fundamentally be grouped into two categories: i) syntactic interoperability; and ii) semantic interoperability. In syntactic interoperability, two different systems communicate and exchange data on the basis of standard data formats (e.g. MARC 21 or Dublin Core), standard exchange formats (e.g. ISO 2709 or MARCXML), text-encoding standards (e.g. ASCII, ISCII or Unicode) and communication protocols (e.g. Z39.50 or OAI-PMH). Semantic interoperability, on the other hand, supports automatic interpretation of information elements on the basis of a common information exchange reference model (e.g. integration of two different thesauri or classification schemes; conversion of bibliographic data from CCF into MARC formats on the basis of crosswalks). In the open access domain, however, COAR (Confederation of Open Access Repositories) has identified the following major areas of interoperability:

Metadata level interoperability: This refers to the integration of metadata from different open access resources into a single-window service on the basis of metadata harvesting protocols and standards such as OAI-PMH version 2.0. It supports the development of subject-specific portals and specialized search engines such as OAIster and BASE (Bielefeld Academic Search Engine).
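As a rough illustration of metadata harvesting, the sketch below builds an OAI-PMH `ListRecords` request and extracts Dublin Core fields from a response. The base URL, the set name and the sample response are invented for illustration, and no network call is made. The helper names `list_records_url` and `parse_records` are hypothetical. A production harvester would also need to handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Hand-written sample response (an assumption, not real repository output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai", set_spec="physics")
harvested = parse_records(SAMPLE_RESPONSE)
print(url)
print(harvested)
```

A service provider would run the same request against each repository it covers and merge the parsed records into one index, which is what makes a single-window search possible.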

Content level interoperability: This refers to multiple-deposit facilities, where authors submit a document in one place and its contents are automatically transferred from one system to another. This cross-system content transfer is supported by protocols such as SWORD (Simple Web-service Offering Repository Deposit) for multiple deposit and OA-RJ (Open Access Repository Junction) for managing multi-authored and multi-institutional open knowledge objects. Multiple deposit means simultaneous submission into several repositories: the author’s own institutional repository (IR), co-authors’ IRs, subject-specific repositories, and funder repositories. CRIS-OAR (Current Research Information and Open Access Repositories), on the other hand, aims to support the integration of research administration and open access repositories at the institutional level.

Network level interoperability: This supports the development of national and regional repository networks on the basis of metadata harvesting. However, OAI-PMH version 2.0, the global de facto standard for metadata harvesting, requires only unqualified Dublin Core as its common metadata format. Network level interoperability initiatives aim to layer some essential additional fields (which may vary from network to network) on top of OAI-PMH. The DRIVER (Digital Repository Infrastructure Vision for European Research) project of the European repository community first applied this model of interoperability, which was later followed by the OpenAIRE (Open Access Infrastructure for Research in Europe) project.

Statistics and usage data level interoperability: Interoperability in usage statistics is emerging as an important area in the open access domain. It allows the impact of individual open knowledge objects (e.g. research articles) to be measured, and supports the aggregation and exchange of usage information from different repositories and information systems (such as CiteSeer). Several protocols and standards for cross-repository usage statistics are being developed, such as SURE (Statistics on the Usage of Repositories) and PIRUS (Publisher and Institutional Repository Usage Statistics).

Identifier level interoperability: As a library professional, you are aware of the importance of authority data in supporting the collocation of library documents. The same concept is required for the effective organization of open access resources. As with name authority, title authority and subject authority, we need consistency in the identification and naming of authors, items, locations of items, institutions, funding agencies, grants, etc. when organizing open access resources. Standards and services for unique author identification (e.g. ORCID and AuthorClaim), object identification (e.g. DOI, Handle system, PersID) and dataset identification (e.g. DataCite) are emerging to support this area of open access interoperability.
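To show how a unique author identifier can be checked before records are matched across repositories, here is a minimal sketch that validates the check character of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical. The sketch checks only the form of an identifier, not whether it is actually registered.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the ORCID iD is well formed and its checksum matches."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check character)
```

Validating identifiers at deposit time keeps mistyped values out of the authority data, so that records from different repositories can later be collocated reliably.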

Object level interoperability: Open access resources are increasingly becoming multimedia objects. They combine different media types (text, audio, video, streaming video, etc.) and are called compound digital objects. These resources require interoperability standards for the exchange of aggregations of web resources. OAI-ORE (Open Archives Initiative Object Reuse and Exchange) is considered the de facto global interoperability standard in this area.

Semantic level interoperability: This refers to the meaningful exchange of data at the machine level. A standard such as the Resource Description Framework (RDF) is applied to achieve semantic interoperability in the digital domain. RDF, as a general metadata architecture, helps to express the relationships between digital objects in a machine-understandable way. RDF-enabled open access information systems allow machines to create sophisticated services by integrating knowledge objects distributed across repositories and other systems.
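The RDF model of subject, predicate and object statements can be sketched without any RDF library by representing each statement as a Python tuple. The repository URLs and the data below are invented for illustration, and `match` is a hypothetical helper, not part of any RDF toolkit. The predicates are taken from the Dublin Core Terms vocabulary.

```python
DCT = "http://purl.org/dc/terms/"
AUTHOR = "https://orcid.org/0000-0002-1825-0097"

# Statements published by two independent (hypothetical) repositories.
repo_a = {
    ("https://repo-a.example.org/item/1", DCT + "creator", AUTHOR),
    ("https://repo-a.example.org/item/1", DCT + "title", "Preprint on lattice models"),
}
repo_b = {
    ("https://repo-b.example.org/data/9", DCT + "creator", AUTHOR),
    ("https://repo-b.example.org/data/9", DCT + "isReferencedBy",
     "https://repo-a.example.org/item/1"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Because both repositories use the same URIs for the author and the
# properties, merging their statements is a plain set union.
merged = repo_a | repo_b
works = sorted(t[0] for t in match(merged, p=DCT + "creator", o=AUTHOR))
print(works)
```

The point of the sketch is that shared URIs, rather than any agreement on record layout, are what allow a machine to connect a preprint in one repository to a dataset in another.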

2.2.2 Technical Issues

Open knowledge objects are distributed globally across open access journals, open access repositories and open datasets. Several digital asset management software packages, including repository management and journal management software from both the open source and commercial domains, have appeared in the last ten years or so. Unfortunately, these packages were developed independently of each other, with little emphasis on technologies that support system-level sharing and exchange of digital assets. Given the wide-ranging and distributed nature of open access resources, heterogeneity is expected to be the norm. Interoperability is crucial to accommodate intra-system and inter-system data exchange, and it therefore requires a model that identifies essential concepts, axioms and relationships independently of specific standards, technologies or implementations. The DL.org project [67] has identified six major areas of interoperability (independent of software, systems and standards) on the basis of the DELOS Digital Library Reference Model.

Architecture

The Reference Model identifies two major technical components for achieving architecture level interoperability: i) the component profile and ii) the application framework. The first prescribes that each architectural component must be associated with a profile describing the functionality of that software component. A comprehensive component profile increases the possibility of the component being re-used by different software systems in different contexts. It also allows other systems to select the component and integrate it into their workflows. The application framework prescribes that seamless exchange of information requires standardization of component roles, component-to-component interaction patterns, and component interaction interfaces. There are two major issues in architecture level interoperability [68]: content storage (components related to the storage of digital knowledge objects) and content access (components that deal with access to digital knowledge objects, including their parts and relations).

Contents

Contents are the key resources of any digital knowledge management system, including open access information systems. The content management workflow (i.e. selecting, digitizing, describing, and digitally curating content resources) is labor-intensive, time-consuming and expensive. Content level interoperability is therefore an important issue in the open access domain. The Reference Model [69] prescribes standardization in the following five sectors: i) information object format (the data types that describe the structural properties of a digital object); ii) information object attributes (the metadata that describes resources must be comprehensive, structured and granular); iii) information object context (metadata elements that record relations with other entities such as people, places, moments, time and semantics); iv) information object provenance (metadata elements that record the process that brought the object to its current state); v) information object identifier (standards that uniquely identify and universally refer to the same information object).
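The five sectors listed above can be pictured as the fields of a single record. The sketch below is only an illustration of that grouping: the class name, field names and sample values are assumptions made for this example and are not taken from the Reference Model itself.

```python
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """One knowledge object described along the five content sectors."""
    identifier: str                                  # v) unique identifier
    media_format: str                                # i) object format
    attributes: dict = field(default_factory=dict)   # ii) descriptive metadata
    context: dict = field(default_factory=dict)      # iii) relations to other entities
    provenance: list = field(default_factory=list)   # iv) history of the object

    def record_event(self, event):
        """Append a provenance event describing a change to the object."""
        self.provenance.append(event)


# Hypothetical sample object; the identifier is a made-up example value.
article = InformationObject(
    identifier="hdl:12345/678",
    media_format="application/pdf",
    attributes={"title": "Sample article", "language": "en"},
    context={"creator": "Doe, Jane", "place": "Kolkata"},
)
article.record_event("deposited in institutional repository")
article.record_event("migrated to subject repository")
print(article)
```

Two systems that agree on what goes into each of these five slots can exchange objects without re-describing them, which is where the saving in curation effort comes from.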

Functionality

The technical issues related to functionality [70] cover all the processing that can be applied to resources and all the activities that can be observed by stakeholders of open digital content management. The Reference Model prescribes: i) a precise description of the functions of each software module; ii) recording of complementary and mutually dependent functions; iii) re-use of software modules that implement the desired functionality; iv) a detailed functionality profile for a digital asset management system, a digital asset management software package and a digital asset management software module, along with the associated interfaces.

Policy

The model covers policy interoperability and policy classification. Policy level [71] interoperability helps to achieve integration with third-party service providers, such as data archives and cloud providers. It prescribes standards for: i) encoding of policies for machine discovery (languages of representation); ii) policy management (policy appraisal and enforcement); iii) evolution of policies over time; and iv) the relation between policy and quality.

Quality

This refers to the three most important elements of a digital asset management system: quality of contents, quality of services and quality of policies. It investigates the quality-related interoperability issues that prevent software in this domain from working together. Ultimately, it aims to develop a quality framework [72] that supports the exchange of knowledge objects and so achieves the goal of unified resource discovery.

Users

This refers to the actors of a digital asset management system and deals with issues such as user modeling, user profiling, user context, and user management. To date there is no generally accepted user model that can be used in every software package supporting the green and gold paths of open access. The Reference Model identifies two areas [73] of user level interoperability: i) interoperability of user profiles from system to system; and ii) interoperability of usage patterns across systems.

Last modified: Monday, 5 April 2021, 3:02 PM
