1.2 Resource Description
Metadata is commonly described as data about data. It provides basic information such as the author of a work, the date of creation and links to related works. Metadata exists for almost every conceivable object or group of objects, whether stored in electronic form or not. In the library world, one easily identifiable form of metadata is the card catalogue; the information on the card is metadata about a book. In a traditional library, where cataloguing is the work of trained professionals, complex metadata schemes such as MARC and CCF are used to describe library resources. As a library professional, you already know the application of metadata in the form of cataloguing.
There are strong similarities between traditional library cataloguing and the description of web resources using a set of metadata. Modern cataloguing theory and practice developed over the last 150 years or so as a tool for organizing information for retrieval in libraries. A library catalogue typically consists of a collection of bibliographic records that describe the different types of library resources, such as printed books, cartographic materials, music scores and manuscripts. Gradually, the scope of cataloguing codes and resource description standards has expanded to include a range of newer publishing media such as sound recordings, microfilms, video recordings, films, computer files and Web resources. For such descriptions, different standards and standard procedures have been developed from time to time to facilitate the recording of, and access to, these resources. Open access materials are no exception. For example, when users retrieve journal metadata from DOAJ (Directory of Open Access Journals), one of the important elements of description is the APC (Article Processing Charge). This metadata element helps contributors select appropriate journals for publishing their research results. Another related metadata element is the date from which content is available as open access. This element helps users select appropriate resources from journals that started in closed mode and subsequently became available in open mode.
With the rise of the Internet and the Web as global publishing media, the term metadata began to appear in the context of describing information objects on the network. Library professionals were quick to realize that they had been creating data about data, in the form of cataloguing, over the last one hundred and fifty years, since the time of Panizzi. However, the term ‘metadata’ is used inconsistently even within the library community. Some use it to refer to the description of both digital and non-digital resources, while others restrict the term to the description of electronic resources. For example, the definitions given by IFLA (International Federation of Library Associations and Institutions) and W3C (World Wide Web Consortium) are restrictive in nature. IFLA defines metadata as “The term refers to any data used to aid the identification, description and location of networked electronic resources” (IFLA, 2002). According to W3C, “Metadata is the machine understandable information for the Web” (W3C, 2003). In contrast, the definitions given by the Getty Research Institute (GRI) and UKOLN (U.K. Office for Library and Information Networking) are fairly liberal. GRI says metadata is “Data associated with either an information system or an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation” (Murtha, 2002). Similarly, UKOLN says, “Metadata is normally understood to mean structured data about digital (and non-digital) resources that can be used to help and support a wide range of operations. These might include, for example, resource description and discovery, the management of information resources (including rights management) and their long-term preservation” (UKOLN, 2002).
For the purpose of this unit, a liberal stand is taken on the definition and scope of the term metadata. Metadata is used here to mean structured information about an information resource of any media type or format. Metadata is by definition descriptive of something, but the many different uses of metadata have led to the construction of a very broad typology of metadata as descriptive, administrative and structural (Hadge, 2001):
- Descriptive metadata is meant to serve the purposes of discovery (i.e. how one can find a resource), identification (i.e. how a resource can be distinguished from other similar resources), selection (i.e. how to determine that a resource fills a particular need), collocation (i.e. bringing together related works), obtaining (i.e. acquiring a copy of a resource, or access to one) and other related functions (evaluation, linkage and usability).
- Administrative metadata is information intended to facilitate the management of resources, such as the date of creation, rights and restrictions on access, and details of archiving, control or processing activities.
- Structural metadata is concerned with recording the relationships that hold compound digital objects together.
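The three types above can be illustrated with a minimal sketch in Python. The record, its element names and all of its values are hypothetical and are not drawn from any particular metadata standard; the sketch only shows how elements of one resource might be grouped by type.

```python
# Hypothetical record for a digitised book; every value is invented for illustration.
record = {
    "descriptive": {
        "title": "An Example Open Access Monograph",
        "creator": ["Doe, Jane"],
        "subject": ["Metadata", "Open access"],
    },
    "administrative": {
        "date_created": "2015-03-01",
        "rights": "CC BY 4.0",
        "file_format": "application/pdf",
    },
    "structural": {
        # Ordered parts that together make up the compound object.
        "parts": ["cover.jpg", "chapter-01.pdf", "chapter-02.pdf"],
    },
}


def metadata_type_of(record, element):
    """Return the metadata type under which an element is recorded, or None."""
    for metadata_type, elements in record.items():
        if element in elements:
            return metadata_type
    return None


print(metadata_type_of(record, "rights"))  # administrative
print(metadata_type_of(record, "parts"))   # structural
```

In practice the boundaries are not always this sharp; an element such as rights information can serve both descriptive and administrative purposes.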
Metadata schemas are sets of metadata elements and rules for their use that have been defined for a particular purpose. A metadata schema specifies three independent but related aspects of metadata: semantics, content rules and syntax.
- Semantics refers to the metadata elements included in the schema, each of which is given a name and a definition. A metadata schema also specifies whether each element is mandatory, optional or conditionally required, and whether the element may or may not be repeated.
- Content rules indicate how values for metadata elements are selected and represented. For example, the semantics of a metadata schema may define the element “author”, but the content rules would specify which agents qualify as authors (selection) and how an author’s name should be recorded (representation).
- Syntax of a metadata schema is concerned with the encoding of metadata elements in machine-readable form. Syntax also specifies the way of transmission, transport and communication of metadata between different systems.
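The following is a minimal sketch of how these three aspects fit together, written in Python. The tiny three-element schema, the `format_author` rule and the choice of JSON as the encoding are all assumptions made for illustration; they do not reproduce any published metadata schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Semantics: a named, defined element with its obligation and repeatability."""
    name: str
    definition: str
    mandatory: bool
    repeatable: bool


# Hypothetical schema with three elements.
SCHEMA = {
    e.name: e
    for e in [
        Element("title", "Name given to the resource", mandatory=True, repeatable=False),
        Element("author", "Agent responsible for the content", mandatory=True, repeatable=True),
        Element("subject", "Topic of the resource", mandatory=False, repeatable=True),
    ]
}


def format_author(name):
    """Content rule (representation): record names as 'Surname, Forename'."""
    if "," in name:
        return name.strip()
    *forenames, surname = name.split()
    return f"{surname}, {' '.join(forenames)}" if forenames else surname


def validate(record):
    """Check a record (element name -> list of values) against the schema."""
    errors = []
    for name, element in SCHEMA.items():
        values = record.get(name, [])
        if element.mandatory and not values:
            errors.append(f"missing mandatory element: {name}")
        if not element.repeatable and len(values) > 1:
            errors.append(f"element not repeatable: {name}")
    for name in record:
        if name not in SCHEMA:
            errors.append(f"unknown element: {name}")
    return errors


record = {"title": ["Metadata Basics"], "author": [format_author("Jane Doe")]}
print(validate(record))    # []
print(json.dumps(record))  # syntax: one possible machine-readable encoding
```

Real schemas usually express the same ideas declaratively, for example through an XML schema or an application profile, rather than in program code.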
Based on their applications, metadata schemas can be grouped into two types: generic and domain-specific. Generic metadata schemas are intended to be applicable to all types of resources (e.g. the Dublin Core Metadata Element Set), whereas domain-specific metadata schemas are primarily designed to describe items belonging to a particular category (e.g. the VRA [Visual Resources Association] Core for visual resource collections, the FGDC [Federal Geographic Data Committee] metadata schema for geospatial data etc.). All of these metadata schemas contain descriptive, administrative and structural metadata elements (semantics), content rules for metadata representation and a syntax for machine-readable metadata encoding. The nature of the contents of the different categories of metadata elements in schemas is briefly discussed below:
Descriptive metadata elements
- Bibliographic description (such as Dublin Core, MODS, MARC21, MARCXML, ONIX schemas for metadata representation);
- Content description (such as DDI, SDMX, FGDC, EAD, TEI etc.);
- Description of structure, context and source of the data; information about the methods, instruments, and techniques used in the creation or collection of the data;
- References and links to publications pertaining to the data; and
- Information on how the data have been processed prior to submission to the repository.
Administrative metadata elements
- Preservation metadata to represent the lifecycle of the data, recording of events related to submission, curation and dissemination (such as PREMIS) and event history data (for linking with digital objects);
- Rights management metadata;
- Technical metadata (storage format etc.); and
- Representation information (internal coding, rendering data etc.).
Structural metadata elements
Structural metadata indicates the relationships amongst the different components of a set of associated data, which are particularly important for Web aggregations. These aggregations are also called compound digital objects. They combine distributed resources of multiple media types, including text, images, data and video. There are several standards for the description and exchange of aggregations of Web resources, such as:
- FOXML (Standard in use for Fedora repository software, where compound objects are treated as a single file);
- OAI-ORE (An OAI initiative that defines compound objects distributed on the Internet through the creation of resource maps, which use unique URIs for each component. It has four basic components: i) Resource (an item of interest); ii) URI (a global resource identifier); iii) Representation (a datastream accessible through a URI by using a protocol like HTTP); and iv) Link (a connection between two resources));
- METS (An LoC standard that is used as a ‘wrapper’ for compound digital objects and very useful for import/export in repositories); and
- RDF (A W3C standard that provides a simple way to represent Web resources, in the form of subject-predicate-object expressions that relate objects to one another).
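As a rough illustration of the subject-predicate-object idea behind RDF, the sketch below represents a few statements about a compound object as plain Python tuples and prints them in a simplified N-Triples-like form. The `example.org` URIs and the titles are hypothetical; the predicates are taken from the Dublin Core terms namespace. The serializer is a simplification that does not handle literal escaping, language tags or datatypes, so it is not a substitute for an RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"      # Dublin Core terms namespace
ARTICLE = "http://example.org/article/1"   # hypothetical resource URI

# Each statement is a (subject, predicate, object) triple.
triples = [
    (ARTICLE, DCTERMS + "title", "An Example Article"),
    (ARTICLE, DCTERMS + "creator", "Doe, Jane"),
    (ARTICLE, DCTERMS + "hasPart", "http://example.org/article/1/figure-1"),
    (ARTICLE, DCTERMS + "hasPart", "http://example.org/article/1/dataset-1"),
]


def to_ntriples_line(subject, predicate, obj):
    """Render one triple; objects starting with 'http' are treated as URIs (a simplification)."""
    obj_text = f"<{obj}>" if obj.startswith("http") else '"' + obj + '"'
    return f"<{subject}> <{predicate}> {obj_text} ."


def objects_of(triples, subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


for triple in triples:
    print(to_ntriples_line(*triple))

# The parts that make up the compound object:
print(objects_of(triples, ARTICLE, DCTERMS + "hasPart"))
```

The two `hasPart` statements play the role of structural metadata here: they record which components belong to the article.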
Why is Metadata Important in Open Access?
The core function of a library is to deliver the right content to users at the right time. In the context of Open Access (OA), metadata plays a crucial role in fulfilling this core function. You may well ask why metadata is so important for disseminating OA resources. The answer is simple. Apart from supporting all the elements necessary for discovering resources effectively, metadata in OA has the additional role of indicating the status of a piece of content as open access. If the open access status of a scholarly object is not obvious, end users may be confused when assessing the access rights and the extent of permissions related to that knowledge object. Metadata in the context of OA is important for both library professionals and end users. It helps librarians with data mining, pattern identification (organization and usage), clarity over licensing agreements, discovery of OA content, and access to open access content within hybrid journals. It helps end users find and access OA content, prioritize OA content over paid content (by filtering results by OA status), understand access and re-use permissions, and cite OA resources.
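Filtering and prioritizing results by OA status can be sketched as follows. The records and the field names `is_open_access` and `license` are hypothetical; actual discovery systems and repositories record OA status in their own ways.

```python
# Hypothetical search results; field names and values are assumptions for illustration.
results = [
    {"title": "Article A", "is_open_access": True, "license": "CC BY 4.0"},
    {"title": "Article B", "is_open_access": False, "license": None},
    {"title": "Article C", "is_open_access": True, "license": "CC BY-NC 4.0"},
]


def open_access_only(records):
    """Keep only the records whose metadata marks them as open access."""
    return [r for r in records if r.get("is_open_access")]


def open_access_first(records):
    """Rank OA records ahead of paid ones, keeping the original order within each group."""
    # sorted() is stable, and False sorts before True, so OA records come first.
    return sorted(records, key=lambda r: not r.get("is_open_access", False))


print([r["title"] for r in open_access_only(results)])   # ['Article A', 'Article C']
print([r["title"] for r in open_access_first(results)])  # ['Article A', 'Article C', 'Article B']
```

Neither function can work if the OA status is missing from the metadata, which is why recording it explicitly matters.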