Efficient binary RDF format #18

Closed
@aaronc

Description

To store RDF data on-chain, we need an efficient binary format that is tied to the schema module (which defines global schemas for RDF data). This format will:

  • enable efficient verification that the data conforms to the global RDF schema
  • enable efficient verification of the graph hash
  • save storage space on-chain

Should:

  • implement the format only for string node names and for data properties that have been registered in the schema, referencing each property by its PropertyID from the schema module (Property schemas #17)
  • have the serializer write out nodes and properties in normalized form (i.e. alphabetical order, no blank nodes), return a "normalized" graph instance to the caller, and return the graph hash
  • have the deserializer verify that the graph was serialized in normalized form and return the computed graph hash
  • write thorough tests, including generative tests
  • write thorough docs including grammar of format
  • add CHANGELOG entry
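
As a rough illustration of the deserializer's normalized-form check, here is a minimal Python sketch. It is not part of the proposed implementation. The function name `check_normalized`, the in-memory shape of the decoded graph, and the rule that node names and property IDs must be strictly increasing (so no duplicates) are all assumptions made for the example. "Alphabetical" is interpreted here as Unicode code point order.

```python
def check_normalized(nodes):
    """Raise ValueError if a decoded graph is not in normalized order.

    `nodes` is a hypothetical decoded form: a list of
    (node_name, [(property_id, value_bytes), ...]) in the order read.
    """
    prev_name = None
    for name, props in nodes:
        # Assumption: node names must be strictly increasing (no duplicates).
        if prev_name is not None and name <= prev_name:
            raise ValueError(f"node {name!r} is out of order or duplicated")
        prev_name = name

        prev_id = None
        for prop_id, _value in props:
            # Assumption: property IDs are strictly increasing within a node.
            if prev_id is not None and prop_id <= prev_id:
                raise ValueError(
                    f"property {prop_id} on node {name!r} is out of order or duplicated"
                )
            prev_id = prop_id
```

A deserializer could run a check like this while streaming, so that a non-normalized input is rejected before the graph hash is returned.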

DEV NOTES:
The grammar for the data should be roughly as follows:

File = FileVersion Node*
FileVersion = <varint encoding of file format version>
Node = NodeID Property*
NodeID = 0x0 <node-name-string>
Property = PropertyID PropertyValue
PropertyID = 0x0 <integer property id from schema module>
PropertyValue = <binary encoding of property value based on schema type>
  • a special "un-named" root node is allowed in every graph
  • the classes for a node (currently unsupported) should be serialized at the start of every node
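
To make the grammar concrete, here is a minimal write-side sketch in Python. Several details are assumptions the grammar above leaves open: varints are unsigned LEB128, node names are UTF-8 with a varint length prefix, property IDs are varints, property values arrive already encoded as bytes, and the graph hash is SHA-256 over the serialized bytes. The un-named root node and node classes are not handled. Note that the grammar as written uses the same 0x0 prefix for both NodeID and PropertyID; the sketch follows it literally and does not try to resolve how a reader would tell them apart.

```python
import hashlib

NODE_TAG = b"\x00"      # the 0x0 prefix on NodeID, as written in the grammar
PROPERTY_TAG = b"\x00"  # the 0x0 prefix on PropertyID, as written in the grammar


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint (assumed; the grammar only says "varint")."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Length-prefixed UTF-8 (assumed encoding for <node-name-string>)."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def serialize(version: int, nodes: dict) -> tuple:
    """Serialize {node_name: {property_id: value_bytes}} in normalized order.

    Returns (serialized_bytes, graph_hash). The hash function is an assumption.
    """
    out = bytearray(encode_varint(version))           # FileVersion
    for name in sorted(nodes):                        # nodes in alphabetical order
        out += NODE_TAG + encode_string(name)         # NodeID
        for prop_id in sorted(nodes[name]):           # properties ordered by ID
            out += PROPERTY_TAG + encode_varint(prop_id)  # PropertyID
            out += nodes[name][prop_id]               # PropertyValue (pre-encoded)
    data = bytes(out)
    return data, hashlib.sha256(data).digest()
```

Because the nodes and properties are sorted before writing, two graphs with the same content produce the same bytes and therefore the same hash regardless of insertion order, e.g. `serialize(1, {"b": {2: b"\x01"}, "a": {}})` and `serialize(1, {"a": {}, "b": {2: b"\x01"}})`.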
