Data Module

STATUS: PARTIAL

Overview

This module provides very basic support for hashing and timestamping data on the blockchain. Its purpose is to let any piece of data be tracked on the blockchain, for a fee, and be proven to have existed at or before a given block height. We expect this module to be largely supplanted by more domain-specific functionality and/or enhanced with robust, opt-in schema-validation support in the future.

Motivation and Rationale

Requirements

  • It should be possible to store arbitrary data on the blockchain for a fee
  • It should be possible to track arbitrary off-chain data by hash on the blockchain, thereby producing a proof of timestamp
  • On-chain and off-chain data should be included in the index available to oracles
  • There must be a robust way of dealing with hash collisions, especially for off-chain data whose content is opaque

Transaction Messages and Types

Store RDF graph on-chain

import sdk "github.com/cosmos/cosmos-sdk/types"

type MsgStoreGraph struct {
  // RDF graph data in canonical N-Triples text format. Blank nodes are not allowed!
  NTriples string `json:"ntriples"`
  // Expected hash of the graph. The transaction is rejected if this hash can't be verified.
  URDNA2015_BLAKE2B_256_Hash []byte `json:"urdna2015_blake2b_256_hash"`
  // Account that signs the transaction
  Signer sdk.AccAddress `json:"signer"`
}

N-Triples was chosen as a starting point because it is easy to parse and self-contained. **Blank nodes are not allowed in on-chain graphs!** This restriction makes it easy to verify that the dataset is canonicalized and that the hash matches, without running the full canonicalization algorithm on-chain. The N-Triples data passed in must already be in canonical form; because blank nodes are not allowed, this essentially means that the triples are sorted (with no duplicate triples).
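The on-chain check described above can be sketched in Go. This is a minimal, hypothetical sketch (`VerifyGraph` and `HashFunc` are illustrative names, not part of the module), assuming the hash is taken over the UTF-8 bytes of the whole canonical N-Triples document and that terms are separated by single spaces, as in canonical N-Triples. Because the Go standard library has no BLAKE2b, the hash function is a parameter and SHA-256 stands in for BLAKE2b-256 in the example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"strings"
)

// HashFunc abstracts the digest. The module specifies BLAKE2b-256 (for example
// golang.org/x/crypto/blake2b.Sum256); SHA-256 is used in main only as a
// stdlib stand-in so the sketch runs without extra dependencies.
type HashFunc func([]byte) []byte

// VerifyGraph (hypothetical) checks that ntriples is blank-node-free, sorted,
// duplicate-free and newline-terminated, then compares its hash to expected.
func VerifyGraph(ntriples string, expected []byte, hash HashFunc) error {
	if !strings.HasSuffix(ntriples, "\n") {
		return errors.New("canonical N-Triples must end with a newline")
	}
	lines := strings.Split(strings.TrimSuffix(ntriples, "\n"), "\n")
	for i, line := range lines {
		// Canonical form: subject SP predicate SP object SP "."
		// IRIs cannot contain spaces, so the first two spaces split the terms.
		terms := strings.SplitN(line, " ", 3)
		if len(terms) != 3 || !strings.HasSuffix(terms[2], " .") {
			return fmt.Errorf("line %d: malformed triple", i+1)
		}
		if strings.HasPrefix(terms[0], "_:") || strings.HasPrefix(terms[2], "_:") {
			return fmt.Errorf("line %d: blank nodes are not allowed", i+1)
		}
		// Byte order on UTF-8 strings equals code point order, so a strictly
		// increasing sequence means sorted with no duplicates.
		if i > 0 && line <= lines[i-1] {
			return fmt.Errorf("line %d: triples must be sorted and unique", i+1)
		}
	}
	if !bytes.Equal(hash([]byte(ntriples)), expected) {
		return errors.New("hash mismatch")
	}
	return nil
}

func sha256Sum(b []byte) []byte {
	sum := sha256.Sum256(b)
	return sum[:]
}

func main() {
	graph := "<http://example.org/a> <http://example.org/p> \"x\" .\n" +
		"<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n"
	fmt.Println(VerifyGraph(graph, sha256Sum([]byte(graph)), sha256Sum)) // prints <nil>
}
```

Splitting on the first two spaces isolates subject, predicate, and object without a full parser; a production implementation would still want a real N-Triples parser to reject malformed terms and literals.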

NOTE: JSON-LD was not chosen for on-chain use because the design of `@context` explicitly requires JSON-LD processors to fetch off-chain data over HTTP, which is non-deterministic.

In the future, we would like to support a compact binary format.

Should data schemas (e.g. SHACL, ShEx, JSON Schema) be tracked and/or verified on-chain?

It might be useful to track the format on-chain without verifying it, since a given piece of data could satisfy multiple schemas. My current thinking is that this kind of validation can be done off-chain, with on-chain attestations about the result. - ARC

Track off-chain RDF dataset

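type MsgTrackDataset struct {
  // URDNA2015 BLAKE2b-256 hash of the off-chain RDF dataset
  URDNA2015_BLAKE2B_256_Hash []byte `json:"urdna2015_blake2b_256_hash"`
  // Optional URL from which the dataset can be retrieved
  Url string `json:"url,omitempty"`
  Signer sdk.AccAddress `json:"signer"`
}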

Should off-chain data stores have their own on-chain registration and data tracking, so that tracked data can reference the service from which it can be retrieved by hash instead of a URL?

For example, if the service ID is known, the data can simply be fetched with an HTTP GET request to <service-base-uri>/<hash>.

Track arbitrary off-chain data

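// HashAlgorithm identifies the digest algorithm used to compute a tracked hash.
type HashAlgorithm int

const (
  BLAKE2B_256 HashAlgorithm = 0
  SHA256 HashAlgorithm = 1
)

type MsgTrackData struct {
  // Hash of the off-chain data, computed with Algorithm
  Hash []byte `json:"hash"`
  Algorithm HashAlgorithm `json:"algorithm"`
  // Optional URL from which the data can be retrieved
  Url string `json:"url,omitempty"`
  Signer sdk.AccAddress `json:"signer"`
}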

Store arbitrary data on-chain

We may want to support this use case eventually, but for now it is not supported because it is questionable whether we should encourage storing data on-chain that other on-chain infrastructure can't interpret.

Allow tracking off-chain data that has a salt/nonce added

Allow multiple URLs to be provided for off-chain data, and allow possible ways of dealing with hash collisions

Possibly make URLs for off-chain data optional

Support tracking the Merkle roots of off-chain data stores

This should probably be coordinated with the IBC specification.

Identifiers

On-chain graphs

On-chain graphs are identified by the URI formed by encoding the URDNA2015_BLAKE2B_256 hash of the graph and prefixing it with xrn://<block-number>/g/.

Off-chain datasets

Off-chain datasets are identified by the URI formed by encoding the URDNA2015_BLAKE2B_256 hash of the dataset and prefixing it with xrn://<block-number>/ds/.

Off-chain raw data

Off-chain raw data is identified by the URI formed by encoding the BLAKE2b-256 hash of the data and prefixing it with xrn://<block-number>/dt/.

On-chain raw data (not currently supported)

On-chain raw data is identified by the URI formed by encoding the BLAKE2b-256 hash of the data and prefixing it with xrn://<block-number>/da/.

Indexing and Queries

PostgreSQL

CREATE TABLE "data" (
  uri text NOT NULL PRIMARY KEY,
  tx bytea NOT NULL REFERENCES tx,
  graph jsonb
  --raw_data bytea
);

COMMENT ON COLUMN "data".graph IS 'The JSON-LD expanded form representation of an on-chain graph';

-- COMMENT ON COLUMN "data".raw_data IS 'Raw data bytes for on-chain raw data';

RDF

Schema

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX xrn: <http://regen.network/schema#>

xrn:urdna2015Blake2b256Hash a rdf:Property ;
  rdfs:range xsd:base64Binary .

On-chain graphs

On-chain graphs are indexed in the RDF store in a named graph identified by the graph's identifier URI. They are annotated in the default graph as follows (where xrn://12345/g/1xq52sutm is an example graph URI):

PREFIX xrn: <http://regen.network/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<xrn://12345/g/1xq52sutm>
  xrn:tx <xrn://12345/tx/abcdef1234567> ;
  xrn:urdna2015Blake2b256Hash "sdgbhABN38dsfgn23t="^^xsd:base64Binary .
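For illustration, an indexer could emit these annotation triples in N-Triples form. This is a sketch, not the indexer's actual code: `GraphAnnotation` is a hypothetical helper, it assumes the graph and transaction URIs are valid IRIs that need no escaping, and it writes the hash as a literal typed `xsd:base64Binary`, matching the `rdfs:range` declared in the schema. The digest passed in `main` is an all-zero placeholder.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

const (
	xrnSchema = "http://regen.network/schema#"
	xsdBase64 = "http://www.w3.org/2001/XMLSchema#base64Binary"
)

// GraphAnnotation returns the default-graph triples for an indexed on-chain
// graph as N-Triples. Base64 output contains no characters needing escapes.
func GraphAnnotation(graphURI, txURI string, hash []byte) string {
	var b strings.Builder
	fmt.Fprintf(&b, "<%s> <%stx> <%s> .\n", graphURI, xrnSchema, txURI)
	fmt.Fprintf(&b, "<%s> <%surdna2015Blake2b256Hash> \"%s\"^^<%s> .\n",
		graphURI, xrnSchema, base64.StdEncoding.EncodeToString(hash), xsdBase64)
	return b.String()
}

func main() {
	hash := make([]byte, 32) // placeholder digest, all zeros
	fmt.Print(GraphAnnotation("xrn://12345/g/1xq52sutm", "xrn://12345/tx/abcdef1234567", hash))
}
```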