Generator-level nondeterminism leaks past RDFC-1.0 canonicalization #3516

@amc-corey-cox

While verifying #3407 I found three sources of process-level nondeterminism that survive canonicalize_rdf_graph because they occur upstream of it. None of them block #3407: all three are pre-existing on main and just newly visible.

  1. SHACL: _build_ignored_properties (shaclgen.py:476) emits a Python set() as an rdf:first/rdf:rest list, so element order varies with PYTHONHASHSEED. The resulting graphs are not RDF-isomorphic across runs, so RDFC-1.0 cannot normalize them. The fix is one line: sort before emitting.

  2. RDFGen: injects a linkml:generation_date timestamp; the JSON-LD parse step also picks up dict-ordering noise.

  3. Metamodel OWL: bibo:status testing annotations serialize to scheme-less <testing> IRIs. pyoxigraph rejects these, so canonicalize_rdf_graph falls back to rdflib, which is non-deterministic. There are 32 such triples in the metamodel.
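
To illustrate item 1, here is a minimal standard-library sketch of why sorting before emitting should fix the list order. The property names and the `emit_order` helper are hypothetical stand-ins, not the actual shaclgen.py code; the snippet only shows that iterating a set of strings depends on PYTHONHASHSEED while `sorted()` does not.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the ignored-properties set (not taken from shaclgen.py).
# The child process prints the elements either in raw set order or sorted.
SNIPPET = (
    "import sys\n"
    "props = {'rdf:type', 'ex:name', 'ex:age', 'ex:knows', 'ex:email'}\n"
    "items = sorted(props) if sys.argv[1] == 'sorted' else list(props)\n"
    "print(','.join(items))\n"
)


def emit_order(mode: str, seed: int) -> str:
    """Run SNIPPET in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET, mode],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


def distinct_orders(mode: str, seeds=range(1, 9)) -> set:
    """Collect the distinct element orders seen across several hash seeds."""
    return {emit_order(mode, seed) for seed in seeds}


if __name__ == "__main__":
    print("raw set orders:", len(distinct_orders("raw")))
    print("sorted orders: ", len(distinct_orders("sorted")))
```

The raw mode will typically report more than one distinct order, while the sorted mode always reports exactly one. Because list order is part of the rdf:first/rdf:rest structure, only the sorted variant yields the same graph on every run.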

Each reproduces by running the relevant generator 3× in fresh processes and diffing.
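
A rough sketch of that reproduction loop, using only the standard library. The helper names are hypothetical, and pinning a different PYTHONHASHSEED per run is my own assumption to make hash-order variance show up reliably; leaving the variable unset should behave similarly, since string hashing is randomized per process by default.

```python
import hashlib
import os
import subprocess
import sys


def output_digests(cmd: list, runs: int = 3) -> list:
    """Run cmd in `runs` fresh processes and return a SHA-256 digest of each stdout."""
    digests = []
    for seed in range(1, runs + 1):
        # Assumption: a distinct hash seed per run, to surface set/dict ordering noise.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        proc = subprocess.run(cmd, capture_output=True, check=True, env=env)
        digests.append(hashlib.sha256(proc.stdout).hexdigest())
    return digests


def is_deterministic(cmd: list, runs: int = 3) -> bool:
    """True if every run produced byte-identical stdout."""
    return len(set(output_digests(cmd, runs))) == 1


if __name__ == "__main__":
    # Pass the generator command line as arguments, for example a gen-shacl
    # invocation on a schema file of your choice (the path is a placeholder).
    command = sys.argv[1:] or [sys.executable, "-c", "print('stable')"]
    print("deterministic" if is_deterministic(command) else "output differs across runs")
```

Comparing digests only says whether the runs differ. To see where they differ, write each run's stdout to a file and diff the files.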

Labels: bug, generator-misc
