Lack of standard RDF canonicalization #26

Open
dbooth-boston opened this issue Dec 7, 2018 · 2 comments
Labels
Category: related standards (For RDF-related standards), standards (Standardization should address this)

Comments

@dbooth-boston
Collaborator

Canonicalization
is the ability to represent RDF in a consistent, predictable
serialization. It is essential for diff and digital signatures.
Developers expect to be able to diff two files, and source
control systems rely on being able to do so. It is easy with
most other data representations. Why not RDF? Answer: Blank
nodes. Unrestricted blank nodes cause RDF canonicalization
to be a "hard problem", equivalent in complexity to the graph
isomorphism problem.[6]
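
To make the blank node problem concrete, here is a minimal Python sketch. It is not any standard algorithm: the triple tuples, the `ex:` prefixed names, and the helper functions are all made up for illustration. It shows two graphs that differ only in their arbitrary blank node labels, so a plain sorted comparison reports a difference, and a brute-force check that tries every relabeling, which is factorial in the number of blank nodes.

```python
from itertools import permutations

# Triples as (subject, predicate, object) strings; blank nodes start with "_:".
graph_a = {
    ("_:x", "ex:knows", "_:y"),
    ("_:x", "ex:name", '"Alice"'),
    ("_:y", "ex:name", '"Bob"'),
}
# Same graph, but the serializer happened to pick different blank node labels.
graph_b = {
    ("_:b1", "ex:knows", "_:b0"),
    ("_:b1", "ex:name", '"Alice"'),
    ("_:b0", "ex:name", '"Bob"'),
}

def blank_nodes(graph):
    """Sorted list of blank node labels used as subject or object."""
    return sorted({t for s, _, o in graph for t in (s, o) if t.startswith("_:")})

def relabel(graph, mapping):
    """Apply a blank node renaming to every triple."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}

def isomorphic(g1, g2):
    """Brute force: try every bijection between the blank nodes."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False
    return any(relabel(g1, dict(zip(b1, perm))) == g2 for perm in permutations(b2))

def canonical_form(graph):
    """Brute-force canonical form: smallest sorted triple list over all relabelings."""
    nodes = blank_nodes(graph)
    names = [f"_:c{i}" for i in range(len(nodes))]
    return min(sorted(relabel(graph, dict(zip(nodes, perm)))) for perm in permutations(names))

print(sorted(graph_a) == sorted(graph_b))                  # textual comparison
print(isomorphic(graph_a, graph_b))                        # structural comparison
print(canonical_form(graph_a) == canonical_form(graph_b))  # canonical comparison
```

The brute-force search is only workable for a handful of blank nodes, which is the practical face of the "hard problem" mentioned above.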

IDEA: JSON-LD canonicalization

There has been some good recent progress on canonicalization
in JSON-LD: https://json-ld.github.io/normalization/spec/ .
However, the current JSON-LD canonicalization draft (called
"normalization") is focused only on the digital signatures use
case, and needs improvement to better address the diff use case,
in which small, localized graph changes should result in small,
localized differences in the canonicalized graph.
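
The diff requirement can be illustrated with a toy comparison. This is a sketch under strong assumptions and is not the JSON-LD algorithm: every blank node has only ground (non-blank) properties, no two blank nodes share the same properties, and the function names are hypothetical. One labeling scheme numbers blank nodes in a global order, the other derives each label from a hash of that node's own triples.

```python
import difflib
import hashlib

def make_graph(names):
    """One blank node per person, each with a type triple and a name triple."""
    graph = []
    for i, name in enumerate(names):
        node = f"_:tmp{i}"  # arbitrary input label
        graph.append((node, "rdf:type", "ex:Person"))
        graph.append((node, "ex:name", f'"{name}"'))
    return graph

def signature(graph, node):
    """The node's outgoing triples with its own label left out."""
    return sorted((p, o) for s, p, o in graph if s == node)

def label_sequential(graph):
    """Number blank nodes c0, c1, ... in the global order of their signatures."""
    nodes = sorted({s for s, _, _ in graph}, key=lambda n: signature(graph, n))
    return {n: f"_:c{i}" for i, n in enumerate(nodes)}

def label_by_hash(graph):
    """Derive each label from a hash of that node's own signature only."""
    mapping = {}
    for n in {s for s, _, _ in graph}:
        digest = hashlib.sha256(repr(signature(graph, n)).encode()).hexdigest()
        mapping[n] = f"_:h{digest[:8]}"
    return mapping

def serialize(graph, labeler):
    """Relabel blank nodes, then emit sorted N-Triples-like lines."""
    mapping = labeler(graph)
    return sorted(f"{mapping[s]} {p} {o} ." for s, p, o in graph)

def changed_lines(before, after):
    """Count added and removed lines in a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

old = make_graph(["Bob", "Carol", "Dave"])
new = make_graph(["Alice", "Bob", "Carol", "Dave"])  # one person added

for labeler in (label_sequential, label_by_hash):
    n = changed_lines(serialize(old, labeler), serialize(new, labeler))
    print(labeler.__name__, n)
```

With global numbering, adding one node shifts the labels of every node that sorts after it, so unrelated lines show up in the diff. With locally derived labels only the new node's lines change. The local scheme stops being this simple once blank nodes refer to other blank nodes, which is where the hard cases arise.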

More discussion and analysis of canonicalization:
w3c/strategy#116
w3c/strategy#116 (comment)

IDEA: RDF canonicalization

http://aidanhogan.com/docs/skolems_blank_nodes_www.pdf
http://aidanhogan.com/docs/rdf-canonicalisation.pdf
https://github.com/iherman/canonical_rdf
https://lists.w3.org/Archives/Public/www-archive/2018Oct/0011.html

dbooth-boston added the "Category: related standards" label on Dec 8, 2018
dbooth-boston added and removed the "standards" label on Mar 10, 2019
@chiarcos

chiarcos commented Mar 4, 2021

This involves two problems: blank nodes and order. If the first can be solved, what works for diff in practice is to transform your data to N-Triples (nt), sort it, serialize it to Turtle (just for readability), and then diff the Turtle versions. This is not necessarily convenient and maybe not intuitive, and there may be better tooling, but it doesn't require the languages involved to change. There may be more clever ways, in particular ways that also support streams and not just data dumps, but with this in mind, this issue can IMHO be closed and further discussed under issue #19. The signature aspect is addressed by JSON-LD canonicalization.
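
The sort-and-diff recipe above can be sketched in a few lines of Python. This is a minimal illustration, assuming the data is already in N-Triples, contains no blank nodes, and was written with consistent whitespace. The Turtle step is skipped because it is only for readability. The function names and example IRIs are invented for the sketch.

```python
import difflib

def sorted_nt(nt_text):
    """Return the statements of an N-Triples document, de-duplicated and sorted."""
    lines = (line.strip() for line in nt_text.splitlines())
    return sorted({line for line in lines if line and not line.startswith("#")})

def nt_diff(old_text, new_text):
    """Unified diff of two N-Triples documents, independent of statement order."""
    return list(difflib.unified_diff(
        sorted_nt(old_text), sorted_nt(new_text),
        fromfile="old.nt", tofile="new.nt", lineterm="", n=0,
    ))

old = """
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
"""
new = """
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/carol> <http://xmlns.com/foaf/0.1/name> "Carol" .
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""

for line in nt_diff(old, new):
    print(line)
```

Because each N-Triples statement sits on its own line, sorting removes the ordering problem, and the diff then shows only the statement that was actually added.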

@dbooth-boston
Collaborator Author

@chiarcos , I agree that if the blank node problem is solved then RDF canonicalization will be easy. But I think it is worth keeping this issue open for two reasons: 1. someone noting the lack of RDF canonicalization will not necessarily know to look at the blank node issue; and 2. even when the blank node issue is solved, a canonicalization standard still needs to be defined.

Also, note that although JSON-LD canonicalization is an excellent step in the right direction, the original algorithm did not address the diff use case, in which a small change to the source graph should yield only a small change in the resulting canonicalization. Discussion about canonical RDF suggests that changes to the algorithm were being considered, but as of this writing I do not know whether the proposed algorithm has been upgraded to address the diff use case.
