Requirement on OAI-PMH format for DataCite is poorly specified #43

@paulmillar

The Use of OAI-PMH page describes how OpenAIRE uses OAI-PMH to harvest information about datasets, and what requirements exist on repositories to support this process.

The OAI-PMH protocol allows a repository to provide different descriptions of repository items. A repository may support different records for the same item, where a record is the known information about some item encoded according to some specific metadata schema. When describing the different formats, there are two key pieces of information: the metadataPrefix and the metadataNamespace. The metadataPrefix provides a simple name for the format: an ID that the OAI-PMH client uses, when making OAI-PMH requests, to specify in which metadata format it expects records. The metadataNamespace describes the nature of the metadata: what the different fields mean and how they are organised (semantics and syntax).
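A repository advertises these two values for each format in its ListMetadataFormats response. The following is a minimal sketch of reading them, assuming a hand-written sample response (not captured from a real repository) and a hypothetical helper name `list_formats`:

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 protocol elements, in ElementTree's {uri} form.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Illustrative, hand-written ListMetadataFormats response (an assumption,
# not the output of any particular repository).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_datacite</metadataPrefix>
      <schema>http://schema.datacite.org/oai/oai-1.1/oai.xsd</schema>
      <metadataNamespace>http://schema.datacite.org/oai/oai-1.1/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def list_formats(xml_text):
    """Map each advertised metadataPrefix to its metadataNamespace."""
    root = ET.fromstring(xml_text.strip())
    formats = {}
    for fmt in root.iter(OAI_NS + "metadataFormat"):
        prefix = fmt.findtext(OAI_NS + "metadataPrefix")
        namespace = fmt.findtext(OAI_NS + "metadataNamespace")
        formats[prefix] = namespace
    return formats


if __name__ == "__main__":
    for prefix, namespace in list_formats(SAMPLE_RESPONSE).items():
        print(prefix, "->", namespace)
```

The prefix is only a label chosen by the repository; it is the namespace that tells a harvester what it will actually receive.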

Currently, the Metadata Format section says only that OpenAIRE expects DataCite records to be available with the metadataPrefix oai_datacite.

Unfortunately, this leaves ambiguous what exactly OpenAIRE is expecting to receive. Which format is oai_datacite?

There are two dimensions of this ambiguity.

The first dimension of ambiguity is on framing. Broadly speaking, there are two ways in which DataCite may be encoded: "wrapped" or "non-wrapped".

The "wrapped" option uses a metadataNamespace of http://schema.datacite.org/oai/oai-1.1/. With this format, some ancillary information (not part of the DataCite metadata) is included, and the DataCite metadata record is placed under a payload element.

With the "non-wrapped" option, the metadataNamespace is that of the corresponding DataCite schema (e.g., http://datacite.org/schema/kernel-4). There is no ancillary information and the DataCite metadata record is available directly under the OAI-PMH metadata element.
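To make the difference concrete, here is a rough sketch of how a harvester might tell the two framings apart by looking at the namespace of the first element under the OAI-PMH metadata element. The two sample records are hand-written and stripped down (ancillary wrapper elements and the DataCite fields themselves are omitted), and the function name `classify_framing` is made up for illustration:

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
WRAPPER_NS = "http://schema.datacite.org/oai/oai-1.1/"

# Hand-written, minimal samples (assumptions for illustration only).
WRAPPED = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_datacite xmlns="http://schema.datacite.org/oai/oai-1.1/">
      <payload>
        <resource xmlns="http://datacite.org/schema/kernel-4"/>
      </payload>
    </oai_datacite>
  </metadata></record></GetRecord>
</OAI-PMH>
"""

NON_WRAPPED = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <resource xmlns="http://datacite.org/schema/kernel-4"/>
  </metadata></record></GetRecord>
</OAI-PMH>
"""


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def classify_framing(record_xml):
    """Return (framing, DataCite namespace) for a GetRecord response."""
    root = ET.fromstring(record_xml.strip())
    metadata = root.find(f".//{{{OAI_NS}}}metadata")
    if metadata is None or len(metadata) == 0:
        raise ValueError("record has no metadata content")
    top = metadata[0]
    if namespace_of(top) == WRAPPER_NS:
        # Wrapped: the DataCite record sits under the payload element.
        inner = top.find(f"{{{WRAPPER_NS}}}payload/*")
        return "wrapped", namespace_of(inner) if inner is not None else None
    # Non-wrapped: the DataCite record is the direct child of metadata.
    return "non-wrapped", namespace_of(top)


if __name__ == "__main__":
    print(classify_framing(WRAPPED))
    print(classify_framing(NON_WRAPPED))
```

A harvester that only knows the metadataPrefix cannot tell in advance which of these two shapes it will get, which is the ambiguity described here.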

The second dimension of ambiguity concerns which version (or versions) of the DataCite metadata schema OpenAIRE supports. At the current time, the most recent DataCite metadata schema version is v4.6, but other schema versions (v4.0 ... v4.5, v3.0 and v3.1) are still in use.

I'd recommend that the Metadata Format section be updated with two concrete additional pieces of information:

  1. The value (or values) of metadataNamespace that OpenAIRE supports when harvesting records.
  2. Which versions of DataCite metadata schema are supported.
