RFC: JSON-LD / Schema.org mapping

As part of my work on STAC Browser, I've just [merged](https://github.com/radiantearth/stac-browser/pull/25) preliminary JSON-LD support intended to facilitate indexing, searching, and display by [Google Dataset Search](https://toolbox.google.com/datasetsearch).

I've tried to follow their [guidelines](https://developers.google.com/search/docs/data-types/dataset), mapping `Catalog`s and `Collection`s to schema.org `DataCatalog`s and `Item`s to `Dataset`s.

## Catalog / Collection → DataCatalog

```javascript
{
  "@context": "https://schema.org/",
  "@type": "DataCatalog",

  // required
  name: catalog.title,
  description: catalog.description, // as HTML

  // recommended
  identifier: catalog.properties["sci:doi"] || catalog.id,
  citation: catalog.properties["sci:citation"], // if available
  keywords: catalog.keywords,
  isBasedOn: catalog.url, // canonical STAC catalog URL (JSON)
  version: catalog.version,
  url: <STAC Browser URL>,
  // if available
  workExample: catalog.properties["sci:publications"].map(p => ({
    identifier: p.doi,
    citation: p.citation
  })),

  // if license is "proprietary"
  license: catalog.links.find(x => x.rel === "license").href,
  // if license is SPDX-compatible
  license: `https://spdx.org/licenses/${catalog.license}.html`,

  // if a spatial extent is available
  spatialCoverage: {
    "@type": "Place",
    geo: {
      "@type": "GeoShape",
      box: catalog.extent.spatial.join(" ")
    }
  },

  // if a temporal extent is available
  temporalCoverage: catalog.extent.temporal.map(x => x || "..").join("/"),

  // if a parent catalog is defined:
  isPartOf: {
    "@type": "DataCatalog",
    name: parent.title || parent.id, // if available
    isBasedOn: parent.url,
    url: <STAC Browser URL>
  },

  // for each child catalog:
  hasPart: {
    "@type": "DataCatalog",
    name: child.title,
    isBasedOn: child.url,
    url: <STAC Browser URL>
  },

  // for each referenced item:
  dataset: {
    identifier: item.id, // if available; requires loading the Item
    name: item.properties.title || item.id, // if available; requires loading the Item
    isBasedOn: item.url,
    url: <STAC Browser URL>
  }
}
```
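The conditional fields above (license, temporal extent, spatial extent) can be sketched as small helpers. This is a minimal sketch, not the STAC Browser implementation: the function names `licenseUrl`, `temporalCoverage`, and `spatialCoverage` are hypothetical, and it assumes that any `license` value other than `"proprietary"` is an SPDX identifier.

```javascript
// Hypothetical helpers; names are illustrative, not taken from STAC Browser.

// Resolve a schema.org `license` URL from a STAC catalog or collection.
function licenseUrl(catalog) {
  if (!catalog.license) return undefined;
  if (catalog.license === "proprietary") {
    // "proprietary" has no SPDX page, so fall back to the license link.
    const link = (catalog.links || []).find(x => x.rel === "license");
    return link ? link.href : undefined;
  }
  // Assumption: every other value is an SPDX identifier.
  return `https://spdx.org/licenses/${catalog.license}.html`;
}

// Join a [start, end] pair into an interval string; open ends become "..".
function temporalCoverage(temporal) {
  if (!Array.isArray(temporal)) return undefined;
  return temporal.map(x => x || "..").join("/");
}

// Wrap a flat bbox array in a schema.org Place / GeoShape.
function spatialCoverage(bbox) {
  if (!Array.isArray(bbox)) return undefined;
  return {
    "@type": "Place",
    geo: { "@type": "GeoShape", box: bbox.join(" ") }
  };
}
```

Returning `undefined` for missing inputs means `JSON.stringify` drops the corresponding key, which matches the "if available" comments in the mapping.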

`providers` are mapped according to `roles` (when multiple roles are specified, the provider is duplicated):

* `licensor` → `copyrightHolder`
* `producer` → `producer`
* `processor` → `contributor`
* `host` → `provider`

and rendered as:

```javascript
{
  // ...
  [mapped role]: {
    description: provider.description, // if available
    name: provider.name,
    url: provider.url // if available
  }
}
```
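The role mapping and duplication rule can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: `ROLE_TO_PROPERTY` and `mapProviders` are hypothetical names, unknown roles are skipped, and providers that share a mapped role are collected into an array (the mapping above does not say how that case is handled).

```javascript
// Hypothetical lookup table mirroring the role list above.
const ROLE_TO_PROPERTY = {
  licensor: "copyrightHolder",
  producer: "producer",
  processor: "contributor",
  host: "provider"
};

// Turn STAC `providers` into schema.org properties keyed by mapped role.
function mapProviders(providers = []) {
  const result = {};
  for (const provider of providers) {
    for (const role of provider.roles || []) {
      const property = ROLE_TO_PROPERTY[role];
      if (!property) continue; // assumption: ignore roles with no mapping

      // Only include optional fields when they are present.
      const entry = { name: provider.name };
      if (provider.description) entry.description = provider.description;
      if (provider.url) entry.url = provider.url;

      // A provider with several roles is duplicated under each property.
      (result[property] = result[property] || []).push(entry);
    }
  }
  return result;
}
```

The result can be spread into the `DataCatalog` object, for example `{ ...base, ...mapProviders(catalog.providers) }`.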

## Item → Dataset

```javascript
{
  "@context": "https://schema.org/",
  "@type": "Dataset",

  // required
  name: item.properties.title || item.id,
  description: item.properties.description, // if available

  // recommended
  identifier: item.properties["sci:doi"] || item.id,
  citation: item.properties["sci:citation"], // if available
  keywords: collection.keywords || rootCatalog.keywords, // inherit collection / root catalog keywords, if available
  // if license is "proprietary"
  license: [item.links, collection.links, rootCatalog.links].flat().find(x => x.rel === "license").href,
  // if license is SPDX-compatible
  license: `https://spdx.org/licenses/${item.properties["item:license"] || collection.license || rootCatalog.license}.html`,
  isBasedOn: item.url, // canonical STAC item URL (JSON)
  url: <STAC Browser URL>,
  // if available
  workExample: item.properties["sci:publications"].map(p => ({
    identifier: p.doi,
    citation: p.citation
  })),
  image: item.assets.thumbnail,

  // for associated collections + parent catalogs
  includedInDataCatalog: {
    isBasedOn: c.href,
    url: <STAC Browser URL>
  },

  spatialCoverage: {
    "@type": "Place",
    geo: {
      "@type": "GeoShape",
      box: item.bbox.join(" ")
    }
  },

  temporalCoverage: item.properties["dtr:start_datetime"]
    ? [
        item.properties["dtr:start_datetime"],
        item.properties["dtr:end_datetime"]
      ]
        .map(x => x || "..")
        .join("/")
    : item.properties.datetime,

  // for each asset in item.assets
  distribution: {
    contentUrl: asset.href,
    fileFormat: asset.type,
    name: asset.title
  }
}
```
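The two per-Item computations, `temporalCoverage` and `distribution`, can be sketched as standalone functions. The names `itemTemporalCoverage` and `distributions` are hypothetical, and the sketch assumes `item.assets` is an object keyed by asset name.

```javascript
// Hypothetical helper: prefer the dtr:* range, else fall back to datetime.
function itemTemporalCoverage(properties) {
  const start = properties["dtr:start_datetime"];
  const end = properties["dtr:end_datetime"];
  if (start) {
    // A missing end is written as ".." to mark an open interval.
    return [start, end].map(x => x || "..").join("/");
  }
  return properties.datetime;
}

// Hypothetical helper: one distribution entry per asset.
function distributions(assets = {}) {
  return Object.values(assets).map(asset => ({
    contentUrl: asset.href,
    fileFormat: asset.type,
    name: asset.title
  }));
}
```

As in the mapping, the range is used only when a start is present; an Item with only `dtr:end_datetime` falls back to `datetime`.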

This implementation is live (with pre-rendered HTML) at https://planet.stac.cloud. I've submitted the sitemap, so hopefully Google, including Dataset Search, will index it more thoroughly in the coming days. At that point we can see how well this mapping renders.

Meanwhile, the [OpenLink Structured Data Sniffer](https://chrome.google.com/webstore/detail/openlink-structured-data/egdaiaihbdoiibopledjahjaihbmjhdj/related?hl=en) extension for Chrome will extract JSON-LD to allow inspection.
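
For a scripted check without the extension, here is a rough sketch that pulls JSON-LD out of a page's HTML. It assumes the JSON-LD is embedded in `<script type="application/ld+json">` elements; `extractJsonLd` is a hypothetical helper, and the regular expression is a simplification that will not cope with every valid HTML document.

```javascript
// Hypothetical helper: collect parsed JSON-LD blocks from an HTML string.
function extractJsonLd(html) {
  // Simplified match for <script type="application/ld+json">...</script>.
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (err) {
      // Skip blocks that are not valid JSON.
    }
  }
  return blocks;
}
```

Passing it the pre-rendered HTML of a catalog or item page should return one object per embedded block, which makes it easy to compare against the mappings above.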

Thoughts?

Refs #285 
