-
Notifications
You must be signed in to change notification settings - Fork 581
dataset re-work for 7.0 release #1814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
black the other new tests
Comments on changes:WORK-IN-PROGRESS: Re-working of "identifier-as-context" PR and removing ConjunctiveGraph, re-positioned as a contribution to release 7.0 as the removal of ConjunctiveGraph comprehensively breaks the public API. I've been working on this more complete implementation since January, hence the large number of changes. Actually there's relatively little impact on core library files, what's different from the last rebased PR is a few changed lines here and there in the parsers and serializers to get a coherent implementation. The bulk of the changes are additional tests and test fixture data.
Isomorphism is an unresolved issue, as are operators. The W3 declined to publish a formal semantics for Datasets and the pertinent issue was resolved as "won't fix"
I've retained my attempted implementation of operators so that reviewers can more easily get an idea of the representational issues without having to code them up but atm I'm inclined to believe that RDFLib should not support operators for Datasets (because it seems semantically vacuous to me) and that the implementation of Dataset operators can be removed. Isomorphic functionality is currently sematically suspect, having been casually extended to use Datasets. There are only a couple of feasible changes that can be made without wrecking the implementation of the RGDA1 graph digest algorithm and I'm currently thrashing through the issues attendant on QuotedGraphs, formulae and the consequent implications for Dataset conicalization and isomorphism. One issue presenting difficulty is that |
|
@gjhiggins not sure exactly what your plan is to merge this, but I would be very happy to slice a bunch of things off from this into seperate PRs - for example most of the changes to change tests from ConjunctiveGraph to DataSet would possibly make sense independently of this PR. |
| if isbnode(o): | ||
| o = _blank | ||
| gcopy.add((s, p, o)) | ||
| gcopy.add((s, p, o, DATASET_DEFAULT_GRAPH_ID if c is None else c)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still in draft, just thinking, would it not be better to associate None with the default Graph ID? That would make the interface a bit easier to use I think.
|
@gjhiggins happy to do rebases/merges on this, just let me know either here or on gitter and I will rebase with priority. Also happy to fix any other CI problems with this. |
Uneasy about the migration to DatasetThe more I think about it, the more uncomfortable I am about this move as it necessitates imposing some arbitrary decisions about the (undefined) semantics of named graphs. Just last night @aucampia and I had to resolve a mismatch in expectations about the syntactic-vs-semantic content of named graphs in the context of checking the accuracy of round-tripping. For a quad-aware store (i.e. all of the current RDFLib stores) the
In practice, it can result in test failures, e.g. the mismatch between a triple parsed in the context of a document base: (
rdflib.term.URIRef('http://example.org/subject'),
rdflib.term.URIRef('http://example.org/predicate'),
rdflib.term.URIRef('http://example.org/object'),
rdflib.term.URIRef('example:variants/simple_triple')
)and one created as a result of parsing the same triple but without a document location: (
rdflib.term.URIRef('http://example.org/subject'),
rdflib.term.URIRef('http://example.org/predicate'),
rdflib.term.URIRef('http://example.org/object'),
None
)Given two Datasets with quad content as above, resolving the question of whether the two Datasets are isomorphic is arbitrary - either the names of graphs are treated as i) semantic tokens, in which case the datasets are not considered to be isomorphic or ii) merely syntactic tokens in which case the datasets can be considered to be isomorphic. Currently, graph identifiers are ignored by The W3 declined to publish a formal semantics for Datasets and whilst there was discussion of the implications, the issue of merging datasets was closed as "can't fix" (my interpretation):
If one can't provide semantically-sound support for UNION, then that must extend to all of the other set-theoretic operators. My concern here is that if Datasets can't be merged without external knowledge then it would seem that computing isomorphism must be similarly constrained - or am I missing something? |
Summary of changes
Complete replacement of ConjunctiveGraph by Dataset , imposition of identifier-as-context throughout. Commits split into tolerable subsections, core changes confined to 5837d01
Checklist
test_variants.py)./examplesfor new features. (in progress)CHANGELOG.md). (TBD)so maintainers can fix minor issues and keep your PR up to date.
Three (likely, trivial) test failures with
test_variant.py