As part of my work on STAC Browser, I've just merged preliminary JSON-LD support intended to facilitate indexing, searching, and display by Google Dataset Search.
I've tried to follow their guidelines, mapping Catalogs and Collections to schema.org DataCatalogs and Items to Datasets.
Catalog / Collection → DataCatalog
{
"@context": "https://schema.org/",
"@type": "DataCatalog",
// required
name: catalog.title,
description: catalog.description, // as HTML
// recommended
identifier: catalog.properties["sci:doi"] || catalog.id,
citation: catalog.properties["sci:citation"], // if available
keywords: catalog.keywords,
isBasedOn: catalog.url, // canonical STAC catalog URL (JSON)
version: catalog.version,
url: <STAC Browser URL>,
// if available
workExample: catalog.properties["sci:publications"].map(p => ({
identifier: p.doi,
citation: p.citation
})),
// if license is "proprietary"
license: catalog.links.find(x => x.rel === "license").href,
// if license is SPDX-compatible
license: `https://spdx.org/licenses/${catalog.license}.html`,
// if a spatial extent is available
spatialCoverage: {
"@type": "Place",
geo: {
"@type": "GeoShape",
box: catalog.extent.spatial.join(" ")
}
},
// if a temporal extent is available
temporalCoverage: catalog.extent.temporal.map(x => x || "..").join("/"),
// if a parent catalog is defined:
isPartOf: {
"@type": "DataCatalog",
name: parent.title || parent.id, // if available
isBasedOn: parent.url,
url: <STAC Browser URL>
},
// for each child catalog:
hasPart: {
"@type": "DataCatalog",
name: child.title,
isBasedOn: child.url,
url: <STAC Browser URL>
},
// for each referenced item:
dataset: {
identifier: item.id, // if available; requires loading the Item
name: item.properties.title || item.id, // if available; requires loading the Item
isBasedOn: item.url,
url: <STAC Browser URL>
}
}
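The conditional fields above (license, spatial extent, temporal extent) can be sketched as small helpers. This is a minimal sketch, not STAC Browser's actual code: the function names `licenseUrl`, `spatialCoverage`, and `temporalCoverage` are hypothetical, and it assumes any license value other than "proprietary" is an SPDX identifier.

```javascript
// Hypothetical helpers illustrating the conditional fields in the mapping above.

// Resolve a schema.org `license` URL from a STAC catalog or collection.
function licenseUrl(catalog) {
  if (!catalog.license) return undefined;
  if (catalog.license === "proprietary") {
    // Use the rel="license" link; it may be missing, so guard against that.
    const link = (catalog.links || []).find(x => x.rel === "license");
    return link ? link.href : undefined;
  }
  // Assumption: every other value is an SPDX identifier.
  return `https://spdx.org/licenses/${catalog.license}.html`;
}

// Join the bbox values with spaces, as in the mapping above.
function spatialCoverage(bbox) {
  if (!Array.isArray(bbox) || bbox.length === 0) return undefined;
  return {
    "@type": "Place",
    geo: { "@type": "GeoShape", box: bbox.join(" ") }
  };
}

// Join [start, end] with "/", writing ".." for an open-ended bound.
function temporalCoverage(interval) {
  if (!Array.isArray(interval) || interval.length === 0) return undefined;
  return interval.map(x => x || "..").join("/");
}
```

Returning `undefined` for missing inputs means the corresponding key is dropped by `JSON.stringify`, which matches the "if available" comments in the mapping.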
providers are mapped according to roles (when multiple roles are specified, the provider is duplicated):
licensor → copyrightHolder
producer → producer
processor → contributor
host → provider
and rendered as:
{
// ...
[mapped role]: {
description: provider.description, // if available
name: provider.name,
url: provider.url // if available
}
}
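The role mapping could be implemented roughly as follows. This is a sketch under assumptions: `mapProviders` and `ROLE_TO_PROPERTY` are hypothetical names, and collecting entries into arrays (so that two providers sharing a role do not overwrite each other) is my choice rather than something stated above.

```javascript
// Role-to-property table taken from the list above.
const ROLE_TO_PROPERTY = {
  licensor: "copyrightHolder",
  producer: "producer",
  processor: "contributor",
  host: "provider"
};

// Hypothetical helper: returns an object keyed by schema.org property,
// each holding an array of provider entries.
function mapProviders(providers) {
  const out = {};
  for (const provider of providers || []) {
    // A provider with several roles is emitted once per mapped role.
    for (const role of provider.roles || []) {
      const key = ROLE_TO_PROPERTY[role];
      if (!key) continue; // skip roles without a mapping
      const entry = { name: provider.name };
      if (provider.description) entry.description = provider.description;
      if (provider.url) entry.url = provider.url;
      (out[key] = out[key] || []).push(entry);
    }
  }
  return out;
}
```

The result can be spread into the DataCatalog object, e.g. `{ ...base, ...mapProviders(catalog.providers) }`.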
Item → Dataset
{
"@context": "https://schema.org/",
"@type": "Dataset",
// required
name: item.properties.title || item.id,
description: item.properties.description, // if available
// recommended
identifier: item.properties["sci:doi"] || item.id,
citation: item.properties["sci:citation"], // if available
keywords: collection.keywords || rootCatalog.keywords, // inherit collection / root catalog keywords, if available
// if license is "proprietary"
license: [...item.links, ...collection.links, ...rootCatalog.links].find(x => x.rel === "license").href, // first license link, searching item, then collection, then root catalog
// if license is SPDX-compatible
license: `https://spdx.org/licenses/${item.properties["item:license"] || collection.license || rootCatalog.license}.html`,
isBasedOn: item.url, // canonical STAC item URL (JSON)
url: <STAC Browser URL>,
// if available
workExample: item.properties["sci:publications"].map(p => ({
identifier: p.doi,
citation: p.citation
})),
image: item.assets.thumbnail,
// for associated collections + parent catalogs
includedInDataCatalog: {
isBasedOn: c.href,
url: <STAC Browser URL>
},
spatialCoverage: {
"@type": "Place",
geo: {
"@type": "GeoShape",
box: item.bbox.join(" ")
}
},
temporalCoverage: item.properties["dtr:start_datetime"]
? [
item.properties["dtr:start_datetime"],
item.properties["dtr:end_datetime"]
]
.map(x => x || "..")
.join("/")
: item.properties.datetime,
// for each asset in item.assets
distribution: {
contentUrl: asset.href,
fileFormat: asset.type,
name: asset.title
}
};
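The two per-Item pieces with the most branching, `temporalCoverage` and `distribution`, might look like this in isolation. The helper names `itemTemporalCoverage` and `itemDistributions` are hypothetical; the logic follows the mapping above, with the added assumption that `fileFormat` and `name` are omitted when an asset has no `type` or `title`.

```javascript
// Hypothetical helper: use the dtr:* range when a start is present,
// otherwise fall back to the Item's single datetime.
function itemTemporalCoverage(properties) {
  const start = properties["dtr:start_datetime"];
  const end = properties["dtr:end_datetime"];
  if (start) {
    // ".." marks an open-ended bound, as in the catalog mapping.
    return [start, end].map(x => x || "..").join("/");
  }
  return properties.datetime;
}

// Hypothetical helper: one distribution entry per asset.
function itemDistributions(assets) {
  return Object.values(assets || {}).map(asset => {
    const entry = { contentUrl: asset.href };
    if (asset.type) entry.fileFormat = asset.type;
    if (asset.title) entry.name = asset.title;
    return entry;
  });
}
```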
This implementation is live (with pre-rendered HTML) at https://planet.stac.cloud. I've submitted the sitemap, so hopefully Google, including Dataset Search, will index it more fully in the coming days, at which point we can see how well this mapping renders.
Meanwhile, the OpenLink Structured Data Sniffer extension for Chrome will extract JSON-LD to allow inspection.
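For anyone inspecting the output by hand: JSON-LD is normally embedded in a page as a `<script type="application/ld+json">` element. The sketch below shows one way to serialize a mapped object into such a tag for pre-rendered HTML. `jsonLdScriptTag` is a hypothetical name, and I'm not claiming this is how STAC Browser does it.

```javascript
// Hypothetical helper: wrap a JSON-LD object in a script tag for static HTML.
function jsonLdScriptTag(doc) {
  // Escape "<" so a "</script>" inside a string value cannot close the tag early.
  // "\u003c" is a valid JSON escape, so parsers still read the original text.
  const json = JSON.stringify(doc).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Keys whose values are `undefined` are dropped by `JSON.stringify`, so optional fields that were not available simply do not appear in the output.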
Thoughts?
Refs #285