Local and endpoint migration for renamed and removed entities. #554

Merged (11 commits) on Sep 14, 2021
51 changes: 51 additions & 0 deletions docs/MajorVersionMigration.md
@@ -0,0 +1,51 @@
# Major Version Migration

When non-backward compatible changes are introduced into `gist` during major upgrades,
the release package will include queries to facilitate the migration of existing ontologies
and data to conform to the new version of `gist`.

For changes that are not amenable to automatic migration, queries will be provided that
report the use of deleted or modified classes and properties so that mitigation measures
can be initiated.

Every major version of `gist` (starting with 10.0.0) will add a sub-directory under the
`migration` directory containing the queries and two migration scripts:
1. `migrate_local.yaml`, which applies changes to locally stored RDF data, and
2. `migrate_endpoint.yaml`, which modifies RDF data in a triple store.

The migration scripts rely on the [Ontology Toolkit](https://pypi.org/project/onto-tool/), which is a
Python-based open source tool provided by Semantic Arts. It requires Python version 3.8 or greater
to be installed.

## Migrating RDF Data in Local Files

Once you have the Ontology Toolkit installed, issue the following command from the directory where
you cloned [gist](https://github.com/semanticarts/gist):
```shell
onto_tool bundle -v input INPUT-DIR \
                 -v output OUTPUT-DIR \
                 -v report REPORT-DIR migration/v10.0/migrate_local.yaml
```
where _INPUT-DIR_ is the directory in which your RDF data is located, _OUTPUT-DIR_ is the directory where
updated RDF should be written, and _REPORT-DIR_ is the directory where reports on any issues found
during migration are stored. The tool also lists the issues during execution. Output and report
directories are created as needed, but any existing files in them will be overwritten.

Note that only `.ttl` (Turtle) and `.owl` (RDF/XML) files located directly in _INPUT-DIR_ are transformed.
Subdirectories are not traversed; each one requires its own tool invocation.
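
Because subdirectories need separate invocations, a small wrapper can run the bundle once per directory. The following is only a sketch under stated assumptions: `migrate_tree` and the `ONTO_TOOL` override are hypothetical names introduced here, not part of the Ontology Toolkit, and the sketch assumes the tool creates nested output and report directories as needed.

```shell
# ONTO_TOOL is a hypothetical override for dry runs, e.g. ONTO_TOOL=echo.
ONTO_TOOL="${ONTO_TOOL:-onto_tool}"

# Usage: migrate_tree INPUT-ROOT OUTPUT-ROOT REPORT-ROOT
# Runs the local migration once for INPUT-ROOT and once for each directory
# beneath it, mirroring the directory layout under the output and report roots.
migrate_tree() {
    in_root=${1%/}
    out_root=${2%/}
    report_root=${3%/}
    find "$in_root" -type d | while IFS= read -r dir; do
        rel=${dir#"$in_root"}   # empty for the root, "/sub/dir" otherwise
        "$ONTO_TOOL" bundle \
            -v input "$dir" \
            -v output "$out_root$rel" \
            -v report "$report_root$rel" \
            migration/v10.0/migrate_local.yaml </dev/null
    done
}
```

Run it from the directory where you cloned gist, for example `migrate_tree data migrated reports`. Keep the output and report roots outside the input root, otherwise `find` may pick up freshly written directories. Setting `ONTO_TOOL=echo` first prints the commands instead of running them.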

## Migrating Data in a Triple Store

Once you have the Ontology Toolkit installed, issue the following command from the directory where
you cloned [gist](https://github.com/semanticarts/gist):
```shell
onto_tool bundle -v user USER -v password PWD \
                 -v endpoint ENDPOINT-URI \
                 [ -v update_endpoint UPDATE-URI ] \
                 -v report REPORT-DIR migration/v10.0/migrate_endpoint.yaml
```
where _ENDPOINT-URI_ is the address of your SPARQL endpoint, and _USER_ and _PWD_ are the credentials
required to access it. Only Basic HTTP authentication is handled at this time. The bracketed argument is
optional: if your triple store has a separate endpoint for UPDATE queries (e.g. Stardog), provide it as
_UPDATE-URI_ and omit the brackets. Reports on any issues found during migration are stored in _REPORT-DIR_.
The report directory will be created as needed, but any existing files in it will be overwritten.
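
The optional `update_endpoint` argument can be handled in a wrapper that appends it only when a separate UPDATE address is supplied. This is a minimal sketch: `migrate_endpoint` and the `ONTO_TOOL` override are hypothetical names introduced for illustration, while the `-v` variables are the ones shown in the command above.

```shell
# ONTO_TOOL is a hypothetical override for dry runs, e.g. ONTO_TOOL=echo.
ONTO_TOOL="${ONTO_TOOL:-onto_tool}"

# Usage: migrate_endpoint USER PWD ENDPOINT-URI REPORT-DIR [UPDATE-URI]
migrate_endpoint() {
    user=$1
    pwd_value=$2
    endpoint=$3
    report=$4
    update=${5:-}
    # Start with the arguments that are always required.
    set -- bundle -v user "$user" -v password "$pwd_value" -v endpoint "$endpoint"
    # Append the optional argument only when a separate UPDATE endpoint was given.
    if [ -n "$update" ]; then
        set -- "$@" -v update_endpoint "$update"
    fi
    "$ONTO_TOOL" "$@" -v report "$report" migration/v10.0/migrate_endpoint.yaml
}
```

As with the plain command, a password passed on the command line may be visible to other users of the machine through the process list, so prefer a throwaway or narrowly scoped account where that matters.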
6 changes: 6 additions & 0 deletions docs/ReleaseNotes.md
@@ -4,6 +4,12 @@ gist Release Notes
Release 10.0.0
-----

This is a major release that includes several changes that break compatibility with previous versions
of `gist`. See the [migration guide](./MajorVersionMigration.md) for documentation on updating existing
`gist`-based ontologies and instance data.

### Major Updates

- Renamed 62 object and datatype properties to reflect newly-established conventions. Includes corresponding updates to the [gist style guide](https://github.com/semanticarts/gist/blob/master/docs/gistStyleGuide.md). Issues [188](https://github.com/semanticarts/gist/issues/188), [507](https://github.com/semanticarts/gist/issues/507).
- Renamed `MimeType` to `MediaType` to be consistent with [IANA guidelines](https://www.iana.org/assignments/media-types/media-types.xhtml)
and [RFC6838](https://tools.ietf.org/html/rfc6838). Issue [#434](<https://github.com/semanticarts/gist/issues/434>).
29 changes: 29 additions & 0 deletions migration/v10.0/action_rename_classes.rq
@@ -0,0 +1,29 @@
prefix gist: <https://ontologies.semanticarts.com/gist/>

DELETE {
    graph ?graph {
        ?ref ?refProp ?oldClass .
        ?oldClass ?defProp ?defObj .
    }
}
INSERT {
    graph ?graph {
        ?ref ?refProp ?newClass .
        ?newClass ?defProp ?defObj .
    }
}
where {
    values (?oldClass ?newClass) {
        # Issue #434
        (gist:MimeType gist:MediaType)
        # Issue #483
        (gist:BuildingAddress gist:StreetAddress)
    }
    graph ?graph {
        {
            ?ref ?refProp ?oldClass
        } UNION {
            ?oldClass ?defProp ?defObj
        }
    }
}
23 changes: 23 additions & 0 deletions migration/v10.0/action_rename_classes_default.rq
@@ -0,0 +1,23 @@
prefix gist: <https://ontologies.semanticarts.com/gist/>

DELETE {
    ?ref ?refProp ?oldClass .
    ?oldClass ?defProp ?defObj .
}
INSERT {
    ?ref ?refProp ?newClass .
    ?newClass ?defProp ?defObj .
}
where {
    values (?oldClass ?newClass) {
        # Issue #434
        (gist:MimeType gist:MediaType)
        # Issue #483
        (gist:BuildingAddress gist:StreetAddress)
    }
    {
        ?ref ?refProp ?oldClass
    } UNION {
        ?oldClass ?defProp ?defObj
    }
}
Collaborator
Why do we have both `action_rename_classes.rq` and `action_rename_classes_default.rq`, and similarly for properties?

Collaborator Author
We do renaming in two steps:

  1. First, rename everything that is in a named graph, retaining the modified triples in their original graphs.
  2. Then rename everything that remains, which would be in the default graph.

We do it in this manner because the default graph does not have a universally recognized reference URI, and in some triple stores it doesn't have a URI at all. Also, when processing local file data, there are no named graphs, so only the default queries (which don't specify a graph) are used.

Contributor
A local `.trig` or N-Quads file can have named graphs. I assume the migration works correctly for them as well.

Collaborator Author
@Jamie-SA There is no current support in onto-tool for `.trig` or `.nq` inputs. Do you view this as a deal breaker for this functionality? There is no way I can code and test this before Monday.

Contributor
It is not a deal breaker for me; I had assumed the library you use to read/write would support all of the main formats. But if not, I wouldn't hold this up because of it.

I have switched almost entirely to `.trig` because of its support for named graphs.

Collaborator Author
The library the toolkit uses (rdflib) does support it, but I have not enabled it. There is an existing issue (semanticarts/ontology-toolkit#65), which I will give some attention after this release.

102 changes: 102 additions & 0 deletions migration/v10.0/action_rename_properties.rq
@@ -0,0 +1,102 @@
prefix gist: <https://ontologies.semanticarts.com/gist/>

DELETE {
    graph ?graph {
        ?s ?oldProp ?o .
        ?ref ?refProp ?oldProp .
        ?oldProp ?defProp ?defObj .
    }
}
INSERT {
    graph ?graph {
        ?s ?newProp ?o .
        ?ref ?refProp ?newProp .
        ?newProp ?defProp ?defObj .
    }
}
where {
    values (?oldProp ?newProp) {
        # Issue #171
        (gist:decimalValue gist:numericValue)
        # Issue #126
        (gist:networkConnection gist:links)
        (gist:hasFromNode gist:linksFrom)
        (gist:hasToNode gist:linksTo)
        # Issue #483
        (gist:hasStreetAddress gist:hasAddress)
        # Issue 188
        (gist:directlyPrecedes gist:precedesDirectly)
        (gist:hasUoM gist:hasUnitOfMeasure)
        (gist:multiplicand gist:hasMultiplicand)
        (gist:multiplier gist:hasMultiplier)
        (gist:affectedBy gist:isAffectedBy)
        (gist:allocatedBy gist:isAllocatedBy)
        (gist:denominator gist:hasDenominator)
        (gist:numerator gist:hasNumerator)
        (gist:categorizedBy gist:isCategorizedBy)
        (gist:actualStart gist:hasActualStart)
        (gist:actualEnd gist:hasActualEnd)
        (gist:expressedIn gist:isExpressedIn)
        (gist:plannedStart gist:hasPlannedStart)
        (gist:plannedEnd gist:hasPlannedEnd)
        (gist:triggeredBy gist:isTriggeredBy)
        (gist:hasJurisdiction gist:hasJurisdictionOver)
        (gist:governedBy gist:isGovernedBy)
        (gist:recognizedBy gist:isRecognizedBy)
        (gist:characterizedAs gist:isCharacterizedAs)
        (gist:start gist:hasStart)
        (gist:end gist:hasEnd)
        (gist:fromPlace gist:comesFromPlace)
        (gist:toPlace gist:goesToPlace)
        (gist:geoContains gist:containsGeographically)
        (gist:timeZoneStandardUsed gist:usesTimeZoneStandard)
        (gist:identifiedBy gist:isIdentifiedBy)
        (gist:permanentGeoOccupies gist:occupiesGeographicallyPermanently)
        (gist:offspringOf gist:hasBiologicalParent)
        (gist:fromAgent gist:comesFromAgent)
        (gist:toAgent gist:goesToAgent)
        (gist:memberOf gist:isMemberOf)
        (gist:hasGetter gist:hasRecipient)
        (gist:directlyPrecededBy gist:followsDirectly)
        (gist:offspringOf gist:hasBiologicalParent)
        (gist:occursAt gist:occursIn)
        (gist:madeUpOf gist:isMadeUpOf)
        (gist:convertToBase gist:baseConversionFactor)
        (gist:renderedOn gist:isRenderedOn)
        (gist:basisFor gist:isBasisFor)
        (gist:hasTag gist:tagText)
        (gist:connectedTo gist:isConnectedTo)
        (gist:about gist:isAbout)
        (gist:actual gist:hasActual)
        (gist:aspectOf gist:isAspectOf)
        (gist:basedOn gist:isBasedOn)
        (gist:convertToStandard gist:standardConversionFactor)
        (gist:describedIn gist:isDescribedIn)
        (gist:directPartOf gist:isDirectPartOf)
        (gist:directSubTaskOf gist:isDirectSubtaskOf)
        (gist:directlyRecognizedBy gist:isRecognizedDirectlyBy)
        (gist:geoContainedIn gist:isGeographicallyContainedIn)
        (gist:geoOccupiedBy gist:isGeographicallyOccupiedBy)
        (gist:parentOf gist:hasBiologicalOffspring)
        (gist:partOf gist:isPartOf)
        (gist:planned gist:hasPlanned)
        (gist:subTaskOf gist:isSubTaskOf)
        (gist:viableRange gist:hasViableRange)
        (gist:permanentGeoOccupiedBy gist:isGeographicallyPermanentlyOccupiedBy)
        (gist:recordedOn gist:isRecordedAt)
        (gist:sameTimeAs gist:isSameTimeAs)
        (gist:geoOccupies gist:occupiesGeographically)
        (gist:unitSymbolHTML gist:unitSymbolHtml)
        (gist:lastModifiedOn gist:wasLastModifiedAt)
        (gist:offsetToUniversal gist:hasOffsetToUniversal)
    }
    graph ?graph {
        {
            ?s ?oldProp ?o
        } UNION {
            ?ref ?refProp ?oldProp
        } UNION {
            ?oldProp ?defProp ?defObj
        }
    }
}
96 changes: 96 additions & 0 deletions migration/v10.0/action_rename_properties_default.rq
@@ -0,0 +1,96 @@
prefix gist: <https://ontologies.semanticarts.com/gist/>

DELETE {
    ?s ?oldProp ?o .
    ?ref ?refProp ?oldProp .
    ?oldProp ?defProp ?defObj .
}
INSERT {
    ?s ?newProp ?o .
    ?ref ?refProp ?newProp .
    ?newProp ?defProp ?defObj .
}
where {
    values (?oldProp ?newProp) {
        # Issue #171
        (gist:decimalValue gist:numericValue)
        # Issue #126
        (gist:networkConnection gist:links)
        (gist:hasFromNode gist:linksFrom)
        (gist:hasToNode gist:linksTo)
        # Issue #483
        (gist:hasStreetAddress gist:hasAddress)
        # Issue 188
        (gist:directlyPrecedes gist:precedesDirectly)
        (gist:hasUoM gist:hasUnitOfMeasure)
        (gist:multiplicand gist:hasMultiplicand)
        (gist:multiplier gist:hasMultiplier)
        (gist:affectedBy gist:isAffectedBy)
        (gist:allocatedBy gist:isAllocatedBy)
        (gist:denominator gist:hasDenominator)
        (gist:numerator gist:hasNumerator)
        (gist:categorizedBy gist:isCategorizedBy)
        (gist:actualStart gist:hasActualStart)
        (gist:actualEnd gist:hasActualEnd)
        (gist:expressedIn gist:isExpressedIn)
        (gist:plannedStart gist:hasPlannedStart)
        (gist:plannedEnd gist:hasPlannedEnd)
        (gist:triggeredBy gist:isTriggeredBy)
        (gist:hasJurisdiction gist:hasJurisdictionOver)
        (gist:governedBy gist:isGovernedBy)
        (gist:recognizedBy gist:isRecognizedBy)
        (gist:characterizedAs gist:isCharacterizedAs)
        (gist:start gist:hasStart)
        (gist:end gist:hasEnd)
        (gist:fromPlace gist:comesFromPlace)
        (gist:toPlace gist:goesToPlace)
        (gist:geoContains gist:containsGeographically)
        (gist:timeZoneStandardUsed gist:usesTimeZoneStandard)
        (gist:identifiedBy gist:isIdentifiedBy)
        (gist:permanentGeoOccupies gist:occupiesGeographicallyPermanently)
        (gist:offspringOf gist:hasBiologicalParent)
        (gist:fromAgent gist:comesFromAgent)
        (gist:toAgent gist:goesToAgent)
        (gist:memberOf gist:isMemberOf)
        (gist:hasGetter gist:hasRecipient)
        (gist:directlyPrecededBy gist:followsDirectly)
        (gist:offspringOf gist:hasBiologicalParent)
        (gist:occursAt gist:occursIn)
        (gist:madeUpOf gist:isMadeUpOf)
        (gist:convertToBase gist:baseConversionFactor)
        (gist:renderedOn gist:isRenderedOn)
        (gist:basisFor gist:isBasisFor)
        (gist:hasTag gist:tagText)
        (gist:connectedTo gist:isConnectedTo)
        (gist:about gist:isAbout)
        (gist:actual gist:hasActual)
        (gist:aspectOf gist:isAspectOf)
        (gist:basedOn gist:isBasedOn)
        (gist:convertToStandard gist:standardConversionFactor)
        (gist:describedIn gist:isDescribedIn)
        (gist:directPartOf gist:isDirectPartOf)
        (gist:directSubTaskOf gist:isDirectSubtaskOf)
        (gist:directlyRecognizedBy gist:isRecognizedDirectlyBy)
        (gist:geoContainedIn gist:isGeographicallyContainedIn)
        (gist:geoOccupiedBy gist:isGeographicallyOccupiedBy)
        (gist:parentOf gist:hasBiologicalOffspring)
        (gist:partOf gist:isPartOf)
        (gist:planned gist:hasPlanned)
        (gist:subTaskOf gist:isSubTaskOf)
        (gist:viableRange gist:hasViableRange)
        (gist:permanentGeoOccupiedBy gist:isGeographicallyPermanentlyOccupiedBy)
        (gist:recordedOn gist:isRecordedAt)
        (gist:sameTimeAs gist:isSameTimeAs)
        (gist:geoOccupies gist:occupiesGeographically)
        (gist:unitSymbolHTML gist:unitSymbolHtml)
        (gist:lastModifiedOn gist:wasLastModifiedAt)
        (gist:offsetToUniversal gist:hasOffsetToUniversal)
    }
    {
        ?s ?oldProp ?o
    } UNION {
        ?ref ?refProp ?oldProp
    } UNION {
        ?oldProp ?defProp ?defObj
    }
}
28 changes: 28 additions & 0 deletions migration/v10.0/detect_new_domain_restrictions.rq
@@ -0,0 +1,28 @@
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sh: <http://www.w3.org/ns/shacl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
prefix gist: <https://ontologies.semanticarts.com/gist/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# validate domain (Person U Organization) on hasCommunicationAddress
CONSTRUCT {
    ?report a sh:ValidationReport ;
        sh:conforms false ;
        sh:result
            [
                a sh:ValidationResult ;
                sh:focusNode ?failedEntity ;
                sh:resultMessage "Domain restriction on gist:hasCommunicationAddress violated.";
                sh:resultSeverity sh:Warning ;
                sh:sourceConstraintComponent <urn:constraint:hasCommunicationAddress-domain>
            ] .
}
WHERE {
    graph ?g1 { ?failedEntity gist:hasCommunicationAddress ?addr . }
    FILTER NOT EXISTS {
        { ?failedEntity a gist:Person } UNION { ?failedEntity a gist:Organization }
    }

    bind(<urn:new-domain-validation-report> as ?report)
}
28 changes: 28 additions & 0 deletions migration/v10.0/detect_new_domain_restrictions_default.rq
@@ -0,0 +1,28 @@
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sh: <http://www.w3.org/ns/shacl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
prefix gist: <https://ontologies.semanticarts.com/gist/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# validate domain (Person U Organization) on hasCommunicationAddress
CONSTRUCT {
    ?report a sh:ValidationReport ;
        sh:conforms false ;
        sh:result
            [
                a sh:ValidationResult ;
                sh:focusNode ?failedEntity ;
                sh:resultMessage "Domain restriction on gist:hasCommunicationAddress violated." ;
                sh:resultSeverity sh:Warning ;
                sh:sourceConstraintComponent <urn:constraint:hasCommunicationAddress-domain>
            ] .
}
WHERE {
    { ?failedEntity gist:hasCommunicationAddress ?addr . }
    FILTER NOT EXISTS {
        { ?failedEntity a gist:Person } UNION { ?failedEntity a gist:Organization }
    }

    bind(<urn:new-domain-validation-report> as ?report)
}