1.5 Domain-Specific Metadata Schemas
DCMES consists of only 15 basic elements. Because these elements are generic enough to apply to any type of digital resource, they do not capture information specific to specialized content such as maps, images, video objects, learning materials and ETDs. DC elements such as creator (author), title and subject are certainly useful for specialized OA content other than journal articles, such as learning objects and ETDs, but DCMES has no elements for the essential attributes of such content (for example, the name of the degree awarded for a dissertation, or the learning outcome of a learning object). In other words, no single metadata element set can accommodate the functional requirements of all organizations or communities of practice, and a generic metadata schema cannot describe every type of resource with all the relevant elements. In the OA landscape, journal articles are possibly the most visible objects, followed by learning objects and ETDs. Open learning resources are increasingly available in different forms and formats (such as MOOCs). An analysis of OpenDOAR shows that many repositories include ETDs among their contents and that some repositories deal exclusively with ETDs. This section therefore primarily covers learning objects and ETDs.
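The gap described above can be illustrated with a minimal sketch. It lists the 15 DCMES element names and reports which fields of a resource description have no corresponding element. The function name `uncovered_fields`, the field name `degree_name` and the sample values are assumptions made for illustration only; they are not part of any standard.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES).
DCMES_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def uncovered_fields(record):
    """Return the field names in `record` that DCMES has no element for."""
    return sorted(name for name in record if name not in DCMES_ELEMENTS)


# Hypothetical thesis description; "degree_name" and all values are
# illustrative only.
thesis = {
    "title": "An Example Dissertation",
    "creator": "A. Researcher",
    "subject": "Metadata",
    "degree_name": "PhD",
}

print(uncovered_fields(thesis))
```

Any field reported by such a check is a candidate for a domain-specific schema or a qualified extension, which is the motivation for the schemas discussed in the rest of this section.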
1.5.1 Learning Objects Domain
Learning objects are digital educational materials with a pedagogical perspective. Reuse of learning materials is highly desirable, and it is supported by semantically tagging them with standard metadata. Efficient retrieval of learning materials requires a domain-specific schema that describes educational attributes such as the topic and type of the document. To address these educational concerns, various metadata standards have been developed, including IMS Metadata, SCORM, CanCore, GEM, IEEE Learning Object Metadata, AICC and ARIADNE. This section briefly discusses three major learning object metadata schemas. These three schemas are then compared with three other schemas to give an insight into how comprehensive each schema is.
IEEE Learning Object Metadata (IEEE - LOM)
The Learning Object Metadata (LOM) schema was published by the Institute of Electrical and Electronics Engineers (IEEE) in 2002. The IEEE Learning Object Metadata initiative aims to develop technical standards, recommended practices, and guides for learning technology. The LOM standard is built mainly on Dublin Core and is based on the recommendations of IMS and the ARIADNE project. It is a multi-part standard and contains a description of semantics, vocabulary, and extensions. LOM has a wide set of globally agreed metadata elements, grouped into nine descriptive categories: General, Life Cycle, Meta-Metadata, Technical, Educational, Rights, Relation, Annotation, and Classification. The LOM data model is a hierarchy of data elements, including aggregate data elements and simple data elements. It specifies a conceptual data schema that defines the structure of a metadata instance for a learning object. It is intended to be referenced by other standards that define implementation descriptions of the data schema, thereby ensuring that learning objects can be reused and exchanged. The purpose of IEEE LOM is to facilitate the acquisition, search, evaluation and use of learning objects. It is also intended to facilitate the sharing and exchange of learning objects by enabling the development of catalogs and inventories, while taking into account the diversity of cultural and linguistic contexts in which the learning objects and their metadata are reused (IEEE, 2013).
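The hierarchy of aggregate and simple data elements can be sketched as a nested structure. The example below is an illustration under stated assumptions, not an official LOM binding: the key spellings, the sample record and the helper `simple_elements` are hypothetical, and only a few elements are shown.

```python
# The nine LOM categories, written here as illustrative dictionary keys.
LOM_CATEGORIES = (
    "general", "lifeCycle", "metaMetadata", "technical", "educational",
    "rights", "relation", "annotation", "classification",
)


def simple_elements(node, path=()):
    """Yield (dotted_path, value) for every simple (leaf) data element."""
    if isinstance(node, dict):
        # An aggregate data element: descend into its children.
        for name, child in node.items():
            yield from simple_elements(child, path + (name,))
    else:
        # A simple data element: it carries a value directly.
        yield ".".join(path), node


# Hypothetical, heavily abbreviated metadata instance for one learning object.
record = {
    "general": {"title": "Introduction to Metadata", "language": "en"},
    "educational": {"learningResourceType": "lecture"},
    "rights": {"cost": "no"},
}

# Top-level keys that are not one of the nine categories.
unknown = [name for name in record if name not in LOM_CATEGORIES]

for path, value in simple_elements(record):
    print(path, "=", value)
```

Flattening the hierarchy into dotted paths in this way is one common approach when a nested record has to be indexed for search or mapped to a flatter schema such as Dublin Core.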
IMS Metadata
The IMS Global Learning Consortium has developed, and promotes the adoption of, open technical specifications for interoperable learning technology. The IMS metadata specification is based on LOM and Dublin Core metadata. The IMS Global Learning Consortium, Inc. (IMS) project was launched by EDUCAUSE (formerly EDUCOM), a consortium of North American educational institutions and their commercial and government partners, to define open technical standards for the interoperation of distributed learning applications and services (Anido et al., 2002). IMS develops and promotes open specifications for facilitating online distributed learning activities such as locating and using educational content, tracking learner progress, reporting learner performance, and exchanging student records between administrative systems (IMS, 2003). IMS is very attentive to the needs of the educational community in general and, among the standards development organizations, has the highest recognition within this community (Friesen, 2002). The IMS Content Packaging Information Model defines a standardized set of structures that can be used to exchange learning content. These structures provide the basis for standardized data bindings that allow software developers and implementers to create instructional materials that are interoperable across authoring tools, learning management systems, and run-time environments. IMS has two fundamental goals: to define specific guidelines that guarantee interoperability between applications and services in e-learning, and to support the application of these guidelines in international products and services.
SCORM (Sharable Content Object Reference Model)
SCORM was developed in 2003 by an organization called Advanced Distributed Learning (ADL). The SCORM Metadata Application Profile directly references the IEEE Learning Object Metadata (LOM) standard and provides specific guidance for applying metadata to learning resources. SCORM is widely accepted as a standard for the management of educational content. It is a model that references and integrates a set of interrelated technical standards, specifications and guidelines, adapted from multiple sources, to provide a comprehensive suite of e-learning capabilities for Web-based learning content. These are designed to meet ADL’s functional requirements for learning content and systems: accessibility, interoperability, durability and reusability. SCORM-compliant courses are accordingly reusable, accessible, interoperable and durable.
1.5.2 Theses and Dissertations
This section covers three comprehensive metadata schemas in the domain of electronic theses and dissertations (ETDs), namely ETD-MS, UK-ETD, and Shodhganga (mainly used in Indian universities).
The Networked Digital Library of Theses and Dissertations (NDLTD) is the developer of ETD-MS. The initial goal of NDLTD was to develop a standard XML DTD for encoding metadata elements for ETDs. ETD-MS is based on the Dublin Core Element Set but includes an additional element specific to theses and dissertations. Despite its name, ETD-MS is designed to handle metadata for both paper and electronic theses and dissertations. It is also designed to handle metadata in many languages, including metadata about a single work that has been recorded in different languages.
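The idea of Dublin Core elements plus one thesis-specific element can be sketched as follows. This is a simplified illustration and not a conformant ETD-MS record: it omits XML namespaces, the function `build_thesis_record` and all sample values are hypothetical, and the sub-elements placed under `degree` (name, level, grantor) are assumed here for the purpose of the example.

```python
import xml.etree.ElementTree as ET


def build_thesis_record(dc_fields, degree_fields):
    """Build a simplified thesis record: DC-style elements plus a degree block."""
    root = ET.Element("thesis")
    # Generic descriptive elements, named after Dublin Core elements.
    for name, value in dc_fields.items():
        ET.SubElement(root, name).text = value
    # The thesis-specific addition: a container for degree information.
    degree = ET.SubElement(root, "degree")
    for name, value in degree_fields.items():
        ET.SubElement(degree, name).text = value
    return root


# Hypothetical sample values.
record = build_thesis_record(
    {"title": "An Example Dissertation", "creator": "A. Researcher", "language": "en"},
    {"name": "PhD", "level": "doctoral", "grantor": "Example University"},
)

print(ET.tostring(record, encoding="unicode"))
```

Keeping the degree information in its own container leaves the generic elements readable by any Dublin Core aware harvester, while thesis-aware systems can use the extra detail.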
The UK-ETD metadata standard is recommended by the Electronic Theses Online Service (EThOS), UK. EThOS allows individuals to find, access and archive doctoral e-theses produced in UK higher education institutions. Funding from the Joint Information Systems Committee (JISC) enabled three project teams in the UK to study the issues and challenges associated with the deposit and management of theses in electronic format. It was considered important to recommend a standard set of metadata elements to describe the contents of e-theses repositories. The schema conforms to the guidelines for implementing Dublin Core in XML.
The Indian ETD repository Shodhganga (maintained by INFLIBNET, an Inter-University consortium under the University Grants Commission, India) originated to facilitate open access to theses among the academic community. The word ‘shodh’ originates from Sanskrit and means research and discovery. Ganga is the name of the largest and holiest river in India. The project was intended to archive Indian theses and provide free online access to them. The Shodhganga metadata schema has been developed as a domain-specific schema for the ETDs of Indian universities. Shodhganga uses the qualified Dublin Core element set to furnish metadata, in order to provide global access to Indian research outputs. The basic DC set consists of 15 elements, and the qualified set used in Shodhganga has about 31 elements. A comparison of these three schemas against a set of carefully crafted parameters may help to assess their quality and comprehensiveness.
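The relationship between a qualified element set and the 15 basic elements can be sketched by collapsing each qualified name to its base element. The field names below follow a generic `element.qualifier` pattern and are assumptions for illustration; they are not taken from the actual Shodhganga schema, and `dumb_down` is a hypothetical helper.

```python
def dumb_down(qualified_record):
    """Collapse qualified names such as 'date.issued' to their base element."""
    simple = {}
    for name, value in qualified_record.items():
        # Everything before the first dot is the base element name.
        element = name.split(".", 1)[0]
        simple.setdefault(element, []).append(value)
    return simple


# Hypothetical qualified record; names and values are illustrative only.
qualified = {
    "title": "An Example Thesis",
    "title.alternative": "A Sample",
    "date.issued": "2012",
    "contributor.advisor": "B. Supervisor",
}

print(dumb_down(qualified))
```

A system that understands only the basic elements can still use such a collapsed record, at the cost of losing the finer distinctions that the qualifiers carried.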
1.5.3 Other Domains
An illustrative list of popular domain-specific metadata schemas is given here in alphabetical order:
- ABCD - Access to Biological Collection Data: An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data) sponsored by Biodiversity Information Standards TDWG - the Taxonomic Databases Working Group (last modified in 2007).
- AGLS (Australian Government Locator Service): AGLS is an Australian government metadata standard intended for the description of government resources on the Web. It uses DCMI Terms properties with a few additional metadata elements such as function and mandate.
- AgMES - Agricultural Metadata Element Set: AgMES, developed by the Food and Agriculture Organization (FAO) of the United Nations, enables description, resource discovery, interoperability and data exchange of different types of information resources in all areas relevant to food production, nutrition and rural development (last modified in 2010).
- CanCore: CanCore is a set of guidelines for the implementation of the IEEE LOM metadata standard for describing learning resources. It originated in Canada for managing learning objects in Canadian universities.
- CSMD-CCLRC (Core Scientific Metadata Model): Designed by the Science and Technology Facilities Council to support data collected within a large-scale facility’s scientific workflow; the model is also designed to be generic across scientific disciplines (last modified in 2011).
- Cataloguing Cultural Objects (CCO): A schema for cultural objects, developed by the US-based Visual Resources Association with significant input from the Getty Research Institute (last modified in 2010).
- Categories for the Description of Works of Art (CDWA): An extensive metadata schema for cataloguing objects held by art museums developed in the US in the 1990s by the Getty Research Institute (last modified in 2010).
- Darwin Core: A metadata schema developed by Biodiversity Information Standards (TDWG). It provides a set of terms (elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity (last modified in 2009).
- DataCite Metadata Schema: A set of mandatory metadata elements prescribed by the DataCite consortium to support a persistent approach to the access, identification, sharing, and re-use of digital research datasets (last modified in 2013).
- DDI - Data Documentation Initiative: A globally recognized standard for describing data from the social, behavioral, and economic sciences and from statistics. The XML-based DDI metadata specification supports the entire research data life cycle (last modified in 2009).
- DIF - Directory Interchange Format: A domain-specific schema for the Earth sciences community, intended for the description of scientific data sets. It includes elements focusing on the instruments that capture data and on the temporal and spatial characteristics of the data (last modified in 2010).
- e-GMS: A schema dedicated to e-governance, developed in the UK for describing information resources to ensure maximum consistency of metadata across public sector organizations.
- Encoded Archival Description (EAD): A well-known schema that provides an encoding for archival descriptions. It adopts a multi-level approach to description, providing information about a collection as a whole and then breaking it down into groups, series and (if significant) individual items. EAD grew out of work done at UC Berkeley in the mid-1990s and was influenced by TEI and ISAD(G) (last modified in 2002).
- EXIF (Exchangeable Image File Format): A technical metadata standard that can be written to and read from a still image file itself (and formats). It was developed by JEITA (Japan Electronics and Information Technology Industries Association).
- FGDC/CSDGM - Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata: A widely used schema for digital geospatial data, required by the US Federal Government. It is sponsored by the US Federal Geographic Data Committee (last modified in 2010).
- FOAF (Friend of a Friend): FOAF is an RDF-enabled schema for describing people, intended for use on the Semantic Web. It includes features for encoding names, email addresses, personal interests, home pages, and various online identities. In future, traditional library authority files may be translated into FOAF, but two important issues need to be settled first: i) each individual has only one FOAF identity; and ii) FOAF focuses on the online presence of currently living persons.
- Genome Metadata: A schema dedicated to the field of Genomics. It consists of 61 different metadata fields covering broad categories: Organism Info, Isolate Info, Host Info, Sequence Info, Phenotype Info, Project Info, and Others (last modified in 2009).
- GEM (Gateway to Educational Materials): GEM is an RDF-enabled metadata vocabulary designed for the description of educational resources. The GEM model includes all the properties available in DCMI Terms, with a few additional education-specific elements such as educational standards and pedagogical methods.
- GILS: Global Information Locator Service (GILS) is a schema for governments, companies, or other organizations to support citizen-facing or customer-facing information services. GILS was an early metadata standard for the encoding of descriptive information for government records.
- International Virtual Observatory Alliance Technical Specifications: A schema for astronomical objects developed by the IVOA (International Virtual Observatory Alliance) to enable interoperability between and the integration of astronomical archives across the world into an international virtual observatory (last modified in 2009).
- ISO 19115: An internationally adopted schema for describing geographic information and services. It provides information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data (last modified in 2009).
- MathML (Mathematical Markup Language): MathML is a W3C Recommendation for the low-level encoding of mathematical information, covering both its notation and its content, with the intention of representing this information on the Web.
- MIDAS: It is a UK standard for describing cultural heritage assets that form the historic environment (buildings, archaeological sites, shipwrecks, areas of interest, artifacts and ecofacts).
- MIX: It is an XML-based schema for encoding the Technical Metadata for Digital Still Images standard, developed by the NISO group on Metadata for Images in XML (last modified in 2009).
- NewsML (News Markup Language): NewsML provides a complex schema for describing textual news, articles, photos, graphics, audio, and video, that is, the components that make up or express news items.
- OAI-ORE (Open Archives Initiative Object Reuse and Exchange): A standard from the Open Archives Initiative for managing rich content in aggregations of Web resources and supporting activities like authoring, deposit, exchange, visualization, reuse, and preservation (last modified in 2011).
- OAIS (Open Archival Information System): OAIS is a “reference model” to support the preservation of digital information. OAIS defines three types of information package: i) the Submission Information Package (SIP), the content and metadata received by a preservation repository from an information producer; ii) the Archival Information Package (AIP), the content and metadata managed by a preservation repository; and iii) the Dissemination Information Package (DIP), the content and metadata delivered to an end user in response to a request, which may contain content spanning multiple AIPs. OAIS-compliant repository software supports a certain level of functionality and standardization of features.
- ONIX: A schema developed by the book industry for Online Information Exchange. It is an international standard for representing and communicating book industry product information in electronic form.
- PBCore: The Public Broadcasting Metadata Dictionary, or PBCore, is intended for use by television, radio and web broadcasters to describe and retrieve broadcast content efficiently (last modified in 2011).
- PREMIS: A technical metadata schema that provides a "dictionary" of core metadata elements that can be used to support the digital preservation of a resource. A key feature of the PREMIS model is the definition of Objects as made up of Representations, Files, and Bitstreams. It was particularly influenced by a conceptual model called the Open Archival Information System (OAIS). The Library of Congress is the official PREMIS maintenance agency (last modified in 2006).
- SPECTRUM: A key UK standard for museum documentation (last modified in 2005).
- SDMX - Statistical Data and Metadata Exchange: A set of common technical and statistical standards and guidelines to be used for the efficient exchange and sharing of statistical data and metadata (last modified in 2012).
- SKOS (Simple Knowledge Organization System): SKOS is a W3C standard for encoding structured vocabularies in RDF. The RDF SKOS vocabulary focuses on describing concepts, which are represented by terms, and documenting relationships between concepts.
- SWAP (Scholarly Works Application Profile): SWAP is a DCMI-compliant application profile for the description of scholarly works, developed by UKOLN. It aims to support quality metadata encoding of knowledge objects in Green OA. SWAP is based on the FRBR conceptual model, and therefore differentiates between Works and their Manifestations.
- Text Encoding Initiative (TEI) Header: It is a scheme for marking up electronic text. It also specifies a header portion to accommodate metadata about the object to be described. TEI headers can be used to record bibliographic information of both electronic and non-electronic sources. The TEI header can be mapped to and from MARC.
- VRA Core (Visual Resources Association Core Categories): A widely used metadata schema for describing art or cultural images, providing 17 core categories (last modified in 2007).
- XrML (eXtensible Rights Markup Language): XrML is an XML language for the encoding of rights information. It is focused on the action of “granting” authorizations between Principals, Rights, Resources, and Conditions.