Description
I am trying to use the library to create a detached RO-Crate. As a minimal example, I tried to create a crate that contains no files. I would expect the minimal JSON-LD to look similar to this (obviously not valid, since it has no license etc., but it serves as an example):
```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "https://example.com/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": { "@id": "https://example.com/" },
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" }
    },
    {
      "@id": "https://example.com/",
      "@type": "Dataset",
      (...)
    }
  ]
}
```
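For comparison, the same skeleton can be built in plain Python without the library. This is only a sketch: `detached_skeleton` is a hypothetical helper, the placeholder `(...)` properties are left out, and deriving the descriptor id from the root id is an assumption made for illustration, not a rule from the spec.

```python
import json

RO_CRATE_1_1 = "https://w3id.org/ro/crate/1.1"


def detached_skeleton(root_id):
    """Hypothetical helper: minimal detached-crate JSON-LD for `root_id`.

    The descriptor id is derived by appending ro-crate-metadata.json to the
    root id (an illustrative assumption).
    """
    metadata_id = root_id.rstrip("/") + "/ro-crate-metadata.json"
    return {
        "@context": RO_CRATE_1_1 + "/context",
        "@graph": [
            {
                "@id": metadata_id,
                "@type": "CreativeWork",
                # the descriptor must point at the root data entity
                "about": {"@id": root_id},
                "conformsTo": {"@id": RO_CRATE_1_1},
            },
            {"@id": root_id, "@type": "Dataset"},
        ],
    }


print(json.dumps(detached_skeleton("https://example.com/"), indent=2))
```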
When the crate object is created from scratch with `ROCrate()`, the ids of the metadata and root dataset entities are already set to the default values `ro-crate-metadata.json` and `./`. Because ids are not mutable, the only way I found to set appropriate ids was to recreate the entities:
```python
from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Metadata

crate = ROCrate()
crate.add(RootDataset(crate, "https://example.com/"))
crate.add(Metadata(crate, "https://example.com/ro-crate-metadata.json"))
```
However, I noticed some problems with this approach:
Issue 1
The metadata entity's `about` property does not get updated to point to the new root dataset entity. This can be fixed by passing an `about` property when creating the `Metadata` entity, but I think a better approach would be for the library to handle this internally for the two mandatory entities, `RootDataset` and `Metadata`.
Issue 2
The original `RootDataset` and `Metadata` entities are not actually replaced in the `ROCrate`. Instead, the new entities are added alongside them. The reason is that the old and new entities do not resolve to the same hash value, so the old entities are never evicted from the internal map. The resulting JSON-LD therefore looks like this, even after setting `about` manually:
```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "datePublished": "2024-11-29T14:39:49+00:00"
    },
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {"@id": "./"},
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
    },
    {
      "@id": "https://example.com/",
      "@type": "Dataset",
      "datePublished": "2024-11-29T14:39:49+00:00"
    },
    {
      "@id": "https://example.com/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {"@id": "https://example.com/"},
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
    }
  ]
}
```
I consider this a bug, since there is dedicated code inside the `add` method to handle `RootDataset` and `Metadata`, but it fails to replace the existing entities as intended.
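The leftover entities can be reproduced with a toy id-keyed map. `ToyEntityMap` is hypothetical and does not reproduce ro-crate-py's actual internals; it only illustrates the general pattern: re-pointing the root reference without removing the previous entry under its old key leaves both roots in the graph, while popping the old key first (`evict_on_replace=True`) leaves only the new one.

```python
class ToyEntityMap:
    """Toy model (not ro-crate-py): entities keyed by @id plus a root pointer."""

    def __init__(self, evict_on_replace):
        self.evict_on_replace = evict_on_replace
        self.entities = {}
        self.root_id = None
        # default root, as created by a fresh crate
        self.add_root({"@id": "./", "@type": "Dataset"})

    def add_root(self, entity):
        if self.evict_on_replace and self.root_id is not None:
            # remove the previous root by ITS key, not the new entity's key
            self.entities.pop(self.root_id, None)
        self.root_id = entity["@id"]
        self.entities[entity["@id"]] = entity

    def graph(self):
        return list(self.entities.values())


new_root = {"@id": "https://example.com/", "@type": "Dataset"}

buggy = ToyEntityMap(evict_on_replace=False)
buggy.add_root(dict(new_root))
print([e["@id"] for e in buggy.graph()])  # ['./', 'https://example.com/']

fixed = ToyEntityMap(evict_on_replace=True)
fixed.add_root(dict(new_root))
print([e["@id"] for e in fixed.graph()])  # ['https://example.com/']
```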
Unless I am approaching detached RO-Crates completely wrong, the only way I found to build a valid detached RO-Crate was to delete the leftover entities manually. This led to the following "hacky" approach:
```python
from pprint import pprint

from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Metadata

crate = ROCrate()
# get a reference to the old root and metadata entities
entities_to_delete = [crate.root_dataset, crate.metadata]
# replace root and metadata entities with entities that have the correct identifiers
crate.add(RootDataset(crate, "https://example.com/"))
crate.add(Metadata(crate, "https://example.com/ro-crate-metadata.json", properties={"about": crate.root_dataset}))
# delete old entities
crate.delete(*entities_to_delete)
# generate JSON-LD for the detached crate
pprint(crate.metadata.generate())
```
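To check that the workaround leaves no stale defaults behind, the dict printed above can be inspected with a small helper. `check_detached` is hypothetical: it only looks at `@id`s and `about` links, assumes that a detached crate should contain none of the default ids, and is not a substitute for a real RO-Crate validator.

```python
def check_detached(jsonld, root_id):
    """Hypothetical sanity check for a detached crate's JSON-LD dict."""
    graph = jsonld["@graph"]
    ids = [e["@id"] for e in graph]
    problems = []
    # leftovers from the ROCrate() defaults should be gone
    for stale in ("./", "ro-crate-metadata.json"):
        if stale in ids:
            problems.append(f"stale default entity {stale!r}")
    # exactly one entity should describe the intended root
    descriptors = [e for e in graph if e.get("about") == {"@id": root_id}]
    if len(descriptors) != 1:
        problems.append(f"expected 1 descriptor about {root_id!r}, found {len(descriptors)}")
    if root_id not in ids:
        problems.append(f"root {root_id!r} missing")
    return problems


# the leftover graph from Issue 2, reduced to @id/@type/about
leftover = {"@graph": [
    {"@id": "./", "@type": "Dataset"},
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "https://example.com/", "@type": "Dataset"},
    {"@id": "https://example.com/ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "https://example.com/"}},
]}
print(check_detached(leftover, "https://example.com/"))
# ["stale default entity './'", "stale default entity 'ro-crate-metadata.json'"]
```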
Fixing these two issues would make it much easier to work with detached RO-Crates in general.
On a side note: it would be nice to have a way to instantiate the RO-Crate with the correct ids directly, but I think it's fine as it is if some documentation is added about how to build a detached crate, so that not everyone has to figure all of the above out on their own.