2.3 OA Content Management: Best Practices
The DISC-UK DataShare project, funded by JISC, developed a guide for content management in digital repositories on the basis of best-practice guidelines developed by premier OA service providers. It divides the activities related to content management into seven broad areas and identifies the factors responsible for efficient dissemination of OA contents. The areas and factors are given below in brief for your ready reference (please consult the guidebook26 for details).
2.3.1 Content Coverage
A repository must be populated with contents and must be managed in an effective and sustainable way. A content coverage road-map will enable this to happen. The types of contents can range from dissertations and articles to raw research data and data-sets, post-prints (peer-reviewed research articles), book chapters, working papers, theses etc. One of the main characteristics of Green OA is the great diversity of contents in repositories. There is no consensus on content types, and different OA repositories have different content policies (OpenDOAR, 2013; ROAR, 2013). According to the OpenDOAR database, about 82% of repositories have not defined a content policy, and OA repositories mainly contain textual materials (Figure 3.4).
Content management in OA therefore needs to address issues related to:
a) Scope of OA in terms of subjects and languages. The following questions need to be addressed:
- What subject areas will be included or excluded?
- Are there language considerations?
- Will translations be included or required?
- Will text within data files, metadata or other documentation in other languages/English be translated into English/other languages?
b) Kinds of OA research data. Digital research data vary widely, from texts and numbers to audio and video streams. These data may come from:
- Scientific experiments
- Models and simulations (metadata of model and computational data related to model)
- Observations (surveys, censuses, voting records, field recordings, etc.)
- Derived data (processing or combining ‘raw’ or other data)
- Canonical or reference data (gene sequences, chemical structures etc.)
- Accompanying material
c) Status of the OA research data. This concerns decisions about the stage of the research process (life-cycle) at which data may be included in a repository, such as:
- ‘Raw’ or preliminary data;
- Data that is ready for use by designated users;
- Data that is ready for full release;
- Summary/tabular data; and
- ‘Derived’ data
d) Versions of OA resources. Cross-referencing and version control are important aspects of OA content management. The following tasks need to be handled:
- Controlling explicit version numbers as reflected in dataset names;
- Recording version and status of OA resources (draft, interim, final, internal);
- Storing multiple copies of a dataset in different formats;
- Keeping the original copies of data and documentation as deposited;
- Storing supplemental digital objects with the data;
- Recording relationships between items (such as ‘supersedes’ or ‘is superseded by’);
- Linking earlier versions with later versions (identification of the most recent version);
- Ensuring version control for different copies of files or materials in different formats; and
- Associating a persistent identifier with the latest version
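The versioning tasks listed above can be illustrated with a small sketch. It assumes each version records which earlier version it supersedes; the class name, field names and identifiers are hypothetical and are not taken from any repository software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetVersion:
    identifier: str                    # local identifier (hypothetical)
    version: str                       # explicit version number, e.g. "1.0"
    status: str                        # draft, interim, final or internal
    supersedes: Optional[str] = None   # identifier of the version replaced


def latest_version(versions: list[DatasetVersion]) -> DatasetVersion:
    """Return the one version that no other version supersedes."""
    superseded = {v.supersedes for v in versions if v.supersedes is not None}
    heads = [v for v in versions if v.identifier not in superseded]
    if len(heads) != 1:
        raise ValueError("version chain is broken or has several heads")
    return heads[0]


def is_superseded_by(versions: list[DatasetVersion]) -> dict[str, str]:
    """Derive the inverse 'is superseded by' relation from 'supersedes'."""
    return {v.supersedes: v.identifier for v in versions if v.supersedes is not None}


versions = [
    DatasetVersion("survey-v1", "1.0", "final"),
    DatasetVersion("survey-v2", "1.1", "final", supersedes="survey-v1"),
    DatasetVersion("survey-v3", "2.0", "draft", supersedes="survey-v2"),
]
print(latest_version(versions).identifier)  # survey-v3
```

In such a scheme, the persistent identifier would be pointed at whatever `latest_version` returns, while earlier versions stay reachable through the recorded relations.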
e) Data file formats for OA resources. Selection of the most appropriate file formats for different OA objects and approval of acceptable data file formats from submitters of OA resources is a major area of content management for OA service providers. The conversion of one format into another format is also a mandatory function of OA content management. The following issues need to be addressed:
- Should data submitted to system be accompanied by ASCII files?
- Should spreadsheet files be converted to tab or comma-delimited text?
- Should system accept only file formats that are de facto standards?
- Should system allow specific file formats to be converted into data formats that remain readable and usable?
- Should system accept only ‘open’ non-proprietary, well-documented file formats wherever possible?
- Should system accept compression formats (e.g. tar, gzip, zip)?
- Should system convert proprietary formats to non-proprietary formats?
- Should system create plain text versions of datasets (encoded in either ASCII or Unicode character sets)?
- Should system retain the original bit stream (file) with the item, in addition to its converted formats? and
- Should system accept formats for the purposes of transfer, storage and distribution to users, which do not meet the conditions of long term access?
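Two of the questions above (conversion to delimited text and retention of the original bit stream) can be sketched together. The example assumes the deposit is tab-delimited text encoded in UTF-8; the function name and sample data are illustrative only.

```python
import csv
import io


def tab_to_comma_delimited(original: bytes, encoding: str = "utf-8") -> dict[str, bytes]:
    """Convert tab-delimited text to comma-delimited text, keeping the original."""
    text = original.decode(encoding)
    rows = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return {
        "original": original,                          # bit stream as deposited
        "converted": out.getvalue().encode("utf-8"),   # plain-text derivative
    }


deposited = b"site\treading\nA, north\t4.2\nB\t3.9\n"
stored = tab_to_comma_delimited(deposited)
print(stored["converted"].decode("utf-8"))
```

Fields that contain a comma are quoted by the `csv` writer, so the converted copy can be read back without ambiguity, and the untouched original remains available alongside it.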
f) Volume and size limitations for OA resources. Efficient storage space maintenance is another important task of OA content management. It deals with restrictions on the number of files per submission or overall size of the deposited files by contributors. The following factors need to be taken care of:
- Should system restrict OA submission by the number of bytes, or number of separate files, or other conditions?
- Should system use compression software to bundle multiple files (e.g. zip files)?
- Should system apply Storage Area Network (SAN) that supports disk mirroring, backup and restore, archival and retrieval of archived data, data migration from one storage device to another, and the sharing of data among different servers in a network? and
- Should system use Storage Resource Broker (SRB) as a data grid application?
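The first two questions above (submission limits and bundling) lend themselves to a short sketch. The limits chosen here are assumptions for illustration, not recommended values.

```python
import io
import zipfile

MAX_FILES = 10                    # assumed limit on files per submission
MAX_TOTAL_BYTES = 50 * 1024 ** 2  # assumed limit on total size (50 MiB)


def check_submission(files: dict[str, bytes]) -> list[str]:
    """Return a list of policy violations; an empty list means acceptable."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total = sum(len(data) for data in files.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"submission too large: {total} bytes")
    return problems


def bundle_as_zip(files: dict[str, bytes]) -> bytes:
    """Bundle all files of a submission into one compressed zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


submission = {"data.csv": b"site,reading\nA,4.2\n", "readme.txt": b"Field readings."}
problems = check_submission(submission)
bundle = bundle_as_zip(submission) if not problems else None
```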
You may also consult the following resources in managing data formats and data volume:
- PRONOM, an on-line information system about data file formats
- Global Digital Format Registry (GDFR)
- JHOVE (JSTOR/Harvard Object Validation Environment)
- DROID (Digital Record Object Identification)
- Edinburgh Compute Data Facility (ECDF)
- SRB applications in Fedora and DSpace
2.3.2 Content Metadata
Metadata is a tricky area in managing a digital archive of any type or size, and OA retrieval systems are no exception. The Digital Library Federation (DLF), a coalition of 15 major research libraries, defines three types of metadata which can apply to objects in a digital archive: descriptive metadata, administrative metadata and structural metadata. An OA content management system should apply appropriate standards in each of these three areas to ensure adequate description and long-term preservation. Descriptive metadata is important for end users to perform retrieval tasks such as searching, browsing, navigating and collocating OA resources. Administrative metadata is used by OA content managers for maintaining the OA collection, and structural metadata is generally used by software (at the interface) to compile individual digital objects into more meaningful units. You may refer to Unit 1 of Module 4 for a detailed discussion on resource description through metadata applications. However, from the content management point of view, the following factors need to be considered:
1. Access to metadata
- Should system allow anyone to access the metadata free of charge?
- Should system restrict access to some or all of the metadata?
2. Reuse of metadata
- Should system allow metadata to be reused in another medium without prior permission, provided there is a link to the original metadata and/or the repository is mentioned?
- Should system allow reusing the metadata for commercial purposes?
- Should system ask for formal permission for metadata reuse?
- Should system allow metadata harvesting of dataset descriptions by other institutions on the basis of OAI-PMH or OAI-ORE?
- Should system determine level of metadata reuse (dataset descriptions or full descriptive metadata)?
3. Metadata types and sources
- What descriptive metadata elements should be in use for describing the intellectual content of the object?
- What administrative metadata elements should be included to allow a repository to manage the object (scan format, storage format, technical metadata, copyright and licensing information, preservation metadata)?
- What structural metadata should be adopted to tie digital objects together into logical units?
4. Metadata schemas
No single metadata element set can satisfy the functional requirements of different types of resources, organizational requirements or communities of practice. A generic metadata schema is not sufficient to describe different types of resources with all relevant elements. In the OA landscape, journal articles are possibly the most visible objects, but other resource types such as learning objects, ETDs (Electronic Theses and Dissertations) and research datasets are growing rapidly. Therefore, content managers may need to put in place additional metadata schemas to support the ingest, management, and use of data in OA collections. For an illustrative list of popular domain-specific metadata schemas, refer to section 4.1.5 of Unit 1, Module 4.
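The harvesting question raised under reuse of metadata can be made concrete with a sketch of how a harvester might form an OAI-PMH `ListRecords` request. The base URL and set name are hypothetical; `verb`, `metadataPrefix`, `set` and `from` are request arguments defined by the protocol, and `oai_dc` is the unqualified Dublin Core format that OAI-PMH repositories are required to support. No network request is made here.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec       # selective harvesting by set
    if from_date:
        params["from"] = from_date     # selective harvesting by date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, set_spec="datasets", from_date="2013-01-01")
print(url)
```

A repository that exposes a domain-specific schema in addition to Dublin Core would advertise it under another metadata prefix, which the harvester could pass as `metadata_prefix`.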
You may consult the following guidelines in managing OA metadata:
- UK Metadata Guidelines for Open Access Repositories (2013), in the document entitled “Phase 1: Core Metadata (Version 0.9)”
- OpenAIRE Guidelines (OpenAIRE project34)
- Vocabularies for OA (V4OA): An initiative of JISC/UKOLN to develop vocabulary control devices, category lists and authority files for OA resources
- RIOXX: Developing Repository Metadata Guidelines36, an initiative to define a standard set of bibliographic metadata for UK Institutional Repositories
- Linked Content Coalition37, an initiative to develop rights management metadata for OA resources
- NISO Specification for Open Access Metadata and Indicators38, a NISO initiative to develop a standard metadata set specifically meant for OA resources
2.3.3 Content Ingest
Submission of metadata and objects into an OA system is technically called Ingest. Most repository management software includes the ingest process as a module of the system. The OAIS reference model includes Ingest as a functional entity. As prescribed by this model, OA content management supports ingest through services and functions that accept Submission Information Packages from contributors, prepare Archival Information Packages for storage, and ensure that Archival Information Packages and their supporting Descriptive Information become established within the OA system. The major issues related to OA content ingest are:
- Eligible depositors
- Should system restrict eligibility by status? If yes, who is eligible to deposit: e-people (registered members), academic staff, registered students, employees of the institution, department, subject community or delegated agents, data producers or their representatives (‘self deposit’), or only repository staff?
- Should system restrict eligibility by content (for example, depositors may only deposit their own work; must enter descriptive metadata for deposited items; are limited to depositing datasets as defined by the repository; or may only deposit data of a certain type or subject)?
- Should system provide a confirmation of receipt to the depositor for submitted item?
- Moderation by repository
- Should content manager review items (for eligibility of authors/depositors; relevance to the scope of the repository; valid formats; exclusion of spam)?
- Should system check to ensure that data integrity has been fully maintained during the transfer process?
- Should system check metadata records for accuracy?
- Should system implement Digital Object Identifiers (DOIs) or another persistent identifier, such as the Handle system?
- Data quality requirements
Responsibility: Generally contributors are responsible for the quality of the digital research data. OA content management system is responsible for the storage quality and data availability. OA system accepts no responsibility for mistakes, omissions, or legal infringements for the deposited objects. OA system may provide licenses to depositors to cover the range of requirements for reuse of the data.
Quality assessment: Sometimes an OA system may evaluate data quality for content inclusion on the basis of the following parameters:
- Are the research data based on work performed by the data producer?
- Does the data producer have a record of academic merit?
- Was data collection or digitization carried out in accordance with prevailing criteria in the research discipline?
- Are the research data useful for certain types of research and suitable for reuse?
- Confidentiality and disclosure - This area of OA content management is guided by DANS (Data Archiving and Networked Services, The Netherlands). DANS provides Data Seal of Approval that contains guidelines for applying and checking quality aspects of the creation, storage and (re)use of digital research data in the social sciences and humanities. These guidelines serve as a basis for granting a “data seal of approval” (DANS, 2008).
- Embargo status - OA content management system should provide agreements about the embargo that include the length of the embargo and the condition that ends the embargo on an OA object. The following issues need to be addressed:
- Should embargo status and length of embargo be determined by OA content manager or by contributors?
- Should system allow a mechanism where the metadata is publicly accessible but the data are embargoed or restricted in some way?
- Should system automatically release the data on the end date of the embargo, or should the embargo be managed manually?
- Rights and ownership - OA content management must enter into a license agreement with the depositor upon submission of an OA resource through an in-built or click-through Depositor Agreement. The agreement should have at least three parts: rights of the OA system (Repository), rights of contributors (Depositor) and copyrights.
Repository rights: The issues to be considered for repository rights are:
- Can the repository change the file format to one suitable for long-term preservation or for other purposes?
- Is the repository free to change the original submitted material for preservation?
- Can the repository translate, copy or re-arrange datasets to ensure their future preservation and accessibility, and keep copies of datasets for security and back-up?
- Can the repository migrate datasets to another repository?
- Can the repository incorporate metadata or documentation into public access catalogues for the datasets it holds?
- Will the repository be under any obligation to reproduce, transmit, broadcast or display a dataset in the same format or software as that in which it was originally created?
- While every care will be taken to preserve the dataset, will the repository be liable for loss or damage to the dataset?
Depositors' rights: The OA content management system should take into consideration issues such as:
- Do depositors retain the right to deposit the item elsewhere in its present or future version(s)?
- Can depositor place embargo on items submitted to OA system?
- Can depositor withdraw items from OA system?
- Can depositor edit metadata of submitted objects?
Copyrights: An OA content management system should address the following issues (illustrative, not comprehensive) related to Intellectual Property Rights (IPR):
- Content of deposited dataset does not breach any law and does not infringe the copyright of any other person;
- Any copyright violations are entirely the responsibility of the authors/depositors. In case of copyright violation, the relevant item will be removed immediately from OA system;
- System shall not take legal action on a depositor’s behalf in the event of breach of intellectual property rights or any other right in the material deposited;
- Depositors retain all moral rights to the work including the right to be acknowledged
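Returning to the embargo questions raised earlier in this section, the option of public metadata with embargoed data and automatic release can be sketched as follows. The class and function names are hypothetical, and the rule that data becomes visible on the end date itself is an assumption that a depositor agreement would need to state explicitly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    title: str
    embargo_until: Optional[date] = None  # None means no embargo


def visible_parts(item: Item, today: date) -> dict[str, bool]:
    """Metadata stays public; data is released automatically on the end date."""
    under_embargo = item.embargo_until is not None and today < item.embargo_until
    return {"metadata": True, "data": not under_embargo}


item = Item("Household survey 2012", embargo_until=date(2014, 1, 1))
print(visible_parts(item, date(2013, 12, 31)))  # data still embargoed
print(visible_parts(item, date(2014, 1, 1)))    # data released
```

Because visibility is computed from the date at access time, no manual step is needed to lift the embargo; a manually managed embargo would instead require a content manager to clear `embargo_until`.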
You may consult the following sources for ready reference on the above topic:
- Edinburgh DataShare repository
- Open Data Foundation
- Open Knowledge Foundation
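As a closing illustration for this section, the ingest flow described at its start (accept a Submission Information Package, check it, and prepare an Archival Information Package) might be sketched as below. The dictionary layout is a simplification invented for this example and is not the packaging format of OAIS or of any repository software; the integrity check assumes the depositor declares a SHA-256 digest with the submission.

```python
import hashlib
from datetime import datetime, timezone


def ingest(sip: dict, declared_sha256: str) -> dict:
    """Turn a submission package into an archival package, or raise on failure."""
    payload = sip["payload"]
    actual = hashlib.sha256(payload).hexdigest()
    if actual != declared_sha256:
        raise ValueError("payload was altered during transfer")
    if not sip["metadata"].get("title"):
        raise ValueError("descriptive metadata must include a title")
    return {
        "payload": payload,
        "descriptive": dict(sip["metadata"]),
        "fixity": {"algorithm": "SHA-256", "value": actual},
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


payload = b"site,reading\nA,4.2\n"
sip = {"payload": payload, "metadata": {"title": "Field readings", "creator": "A. Depositor"}}
aip = ingest(sip, hashlib.sha256(payload).hexdigest())
print(aip["fixity"]["value"][:12])
```

The returned package could also serve as the basis for the confirmation of receipt sent to the depositor.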
2.3.4 Content Access and Reuse
An OA content management system sometimes requires restrictions on the use and reuse of OA resources, for example registration in the system to access OA resources, signing a license before downloading OA resources, or acknowledgement when adopting and adapting OA resources. In most cases, the following three types of licenses are in use: Creative Commons, Science Commons and Open Data Commons. The following aspects relating to access to data, reuse of data and tracking of users are to be kept in view:
1. Access to data objects: The following managerial aspects of access to OA contents need to be considered:
- What should be the level of access to OA - at the institutional/departmental level, user registration level, or at the data set level?
- Should there be a fit-to-all access tag or should datasets be individually tagged with different rights, permissions, and/or conditions?
- Should system confirm users' acceptance of the terms and conditions of access?
- What should be the data access method(s) in the system - link to download entire data files? Batch mode access to data? Query-based access to contents?
- Should users be allowed to comment on or rate OA objects, or to submit reviews?
- Should system be integrated with visualization and mapping applications or tools?
- Should system adopt collaborative, participative and interactive architecture?
2. Reuse of data objects: The following aspects of the reuse of OA contents need to be considered:
- Should the reuse of OA contents (including datasets) be limited?
- What are the possible limitations (if any): limitation to non-commercial usage, prohibition on modifying data, or other constraints on redistribution or modification?
- Can OA system lift restrictions (if any) on a case-by-case basis?
- What attribute(s) of CC license(s) should be adopted by OA system?
- What is/are the condition(s) that allow redistribution of OA contents at the user end?
- Will users of the data be required or requested to cite the data set/s? If yes, what should be the minimum bibliographic data elements?
- Will there be any restriction on making copies of the data and accompanying materials?
- Will OA system allow harvesting of full-text or metadata for citation analysis?
3. Tracking users and use statistics: Recording or tracking user behavior in an OA system is useful for planning and improving the system as a whole, but at the same time the issue is controversial in nature. Therefore OA content management must be judicious in its decisions. The considerations may be concentrated on:
- Should OA system track use patterns of individual users through log analysis?
- What granularity level is required to allow the identification of individual users and their usage patterns?
- Should OA system adopt a policy to determine to whom and to what extent OA statistics are exposed?
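The granularity question above can be illustrated with a sketch that reports usage at the level of items and months while discarding the user identifier. The log layout and identifiers are hypothetical; a real system would parse its own server logs.

```python
from collections import Counter

# Hypothetical log records: (date, user identifier, item identifier)
log = [
    ("2013-06-01", "user-17", "item-1"),
    ("2013-06-01", "user-17", "item-2"),
    ("2013-06-02", "user-42", "item-1"),
    ("2013-07-05", "user-17", "item-1"),
]


def monthly_downloads(records):
    """Count downloads per (month, item) and drop the user identifier."""
    counts = Counter()
    for day, _user, item in records:
        counts[(day[:7], item)] += 1   # "YYYY-MM" prefix of the ISO date
    return dict(counts)


stats = monthly_downloads(log)
print(stats[("2013-06", "item-1")])  # 2
```

Statistics aggregated this way can be exposed to depositors or the public without revealing the usage pattern of any individual user; tracking at the level of individuals would require keeping the identifier and a policy on who may see it.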
2.3.5 Content Preservation
Content preservation is extremely important to support continuous OA services. As per the guidelines, the following four factors are important:
1. Retention period: OA system should have managerial policies for the following issues in relation to retention period:
- Should OA contents be retained indefinitely?
- What should be the minimum period of retention?
- Should all items be retained for the lifetime of the repository, or should retention periods be set for individual items?
2. Functional preservation: Functional preservation depends on the selection of file format standards to guard against rapid technical obsolescence of content bit streams. OA system should have mechanisms to ensure usability of OA contents through specific file format support.
3. File preservation: As you already know from previous sections, selection of file formats for OA contents and a mechanism to convert one file format into another are the two ways to ensure functional preservation. An OA content management system should consider the following factors in this direction:
- Should OA system support various file formats?
- What file formats should be adopted for different types of bit streams?
- What is the plan and processes for migrations or transformation at the time of need?
- Whether or not OA system should support encryption or compression for archival files?
- What are the plans and procedures for back-up and restoration of OA contents?
- What should be the policy, plan and process for file format migration?
4. Fixity and authenticity: Fixity refers to checking the integrity and authenticity of digital objects. An OA content management system must have fixity mechanisms to validate the authenticity of information extracted from a digital object. Fixity mechanisms (such as checksums, message digests, and digital signatures) are used to verify content-level integrity during submission, downloading and file transfer. Fixity may be determined at various points, such as creation, accession, ingest, transformation and dissemination.
- Activity I: Check fixity issues at PARADIGM Project (2007). Metadata for Authenticity: Hash Functions and Digital Signatures. Universities of Oxford and Manchester. Available from: http://www.paradigm.ac.uk/workbook/metadata/authenticity.html
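A fixity check based on a message digest can be sketched in a few lines. The example assumes SHA-256 as the digest algorithm and uses an in-memory stream in place of a stored file; the function names are illustrative.

```python
import hashlib
import hmac
import io


def sha256_of(stream, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest."""
    return hmac.compare_digest(sha256_of(stream), recorded_digest)


content = b"gene sequence data ..."
recorded = sha256_of(io.BytesIO(content))               # stored at ingest
print(verify_fixity(io.BytesIO(content), recorded))     # unchanged copy
print(verify_fixity(io.BytesIO(content + b"x"), recorded))  # altered copy
```

Reading in chunks keeps memory use constant for large datasets. A digest alone shows that the bit stream has not changed; establishing who produced it would additionally need a digital signature.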
2.3.6 Content Withdrawal
Sometimes an OA system needs to withdraw contents from a production system. This requires managerial consideration of the following factors:
a) Should items be removed from the repository at all?
b) Under what conditions should the repository choose to remove items?
c) What are the reasons for withdrawal by the repository (copyright violation, legal requirements and proven violations, national security, falsified research, confidentiality concerns etc.)?
d) Should items be removed at the request of the depositor?
e) What should be the terms for withdrawn items: withdrawn items are deleted entirely from the database; withdrawn items are not deleted, but are removed from display; or ‘tombstone’ citations are made available to avoid broken links?
f) What should be done with the metadata for withdrawn items: will the metadata of withdrawn items be searchable or not?
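The tombstone option mentioned above can be sketched as a record that keeps the identifier and citation resolvable while hiding the files. Field names and the sample identifier are hypothetical, and whether the metadata stays searchable is a policy choice that is simply assumed here.

```python
from datetime import date


def withdraw(item: dict, reason: str, on: date) -> dict:
    """Return a tombstone record: files hidden, identifier and citation kept."""
    return {
        "identifier": item["identifier"],   # link to the item does not break
        "citation": item["citation"],
        "status": "withdrawn",
        "reason": reason,
        "withdrawn_on": on.isoformat(),
        "files_visible": False,
        "metadata_searchable": False,       # assumed policy, not a recommendation
    }


item = {
    "identifier": "repo/1234",
    "citation": "Depositor, A. (2012). Field readings [dataset].",
    "files": ["data.csv"],
}
tombstone = withdraw(item, "copyright violation", date(2013, 9, 1))
print(tombstone["status"], tombstone["withdrawn_on"])
```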
2.3.7 Sustainable Development
The Confederation of Open Access Repositories (COAR), an active OA promotion agency (with a membership of over 100 institutions from 35 countries and 4 continents), has a mission to enhance the visibility of research outputs through global networks of Open Access repositories. COAR published a guide in June 2013 entitled Incentives, Integration, and Mediation: Sustainable Practices for Populating Repositories45. This guide advocates eight measures for sustainable OA content management to achieve the goals of an OA system. The measures are as follows:
- Advocacy: It means promotion of open access at institutional level even for those institutions which have OA policies and mandate;
- Institutional Mandates: It means that an institute may make it mandatory for faculty and affiliated researchers to deposit their peer-reviewed, scholarly articles into their institution’s open access repository;
- Metrics: It is generally observed that usage statistics supplied by repository services can act as a strong incentive for researchers to contribute into OA repositories;
- Recruitment and Deposit Services: Content recruitment services like rights checking, and depositing on behalf of authors can be an effective way of populating repositories;
- Researcher Biographies: Integration of faculty members / researcher biographies with OA repositories (in order to link the citations with full text content in the repository) can be a successful strategy for populating the repository;
- Research Information Systems: Integration of a research monitoring system with the institutional repository (such as CRIS and DSpace integration; see Unit 2 of Module 4 for details) can be useful for an OA content management system;
- Publisher Agreements: Orientation services on publisher policies (for example, use of tools like SHERPA/RoMEO to find out whether, when, and what version authors are allowed to deposit as OA) can reduce confusion at the contributor's level; and
- Direct Deposit: Integration of direct deposit service, which transfers articles directly from the publisher into the institutional repository, may be very useful for OA content management (such as integrating DSpace with OJS via SWORD protocol).