
Introduce module in trie-db for generating/verifying trie proofs. #45

Merged: 2 commits into master, Jan 8, 2020

Conversation

jimpo (Contributor) commented Dec 10, 2019

Generation and verification of compact proofs for Merkle-Patricia tries. These have the benefit over the compact trie encoding of omitting the values from the proof data. Apart from being a more standard logical separation for a proof interface, this can result in bandwidth savings if the verifier already has the values. For example, they may request a proof that the value at a key is the same as it is under another trie root.

Using this module, it is possible to generate a logarithmic-space proof of inclusion or non-inclusion of certain key-value pairs in a trie with a known root. The proof contains enough information for the verifier to reconstruct the subset of trie nodes required to look up the keys. The trie nodes are not included in their entirety, as data that the verifier can compute for themselves is omitted. In particular, the values of included keys and the hashes of other trie nodes in the proof are omitted.

The proof is a sequence of the subset of nodes in the trie traversed while performing lookups on all keys. The trie nodes are listed in pre-order traversal order with some values and internal hashes omitted. In particular, values on leaf nodes, child references on extension nodes, values on branch nodes corresponding to a key in the statement, and child references on branch nodes corresponding to another node in the proof are all omitted. The proof is verified by iteratively reconstructing the trie nodes using the values proven as part of the statement and the hashes of other reconstructed nodes. Since the nodes in the proof are arranged in pre-order traversal order, the reconstruction can be done efficiently using a stack.
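The stack-based reconstruction described above can be illustrated with a small self-contained sketch. This is illustrative only, not the trie-db API: the node format, the stand-in "hash", and the `children_in_proof` counter are all assumptions for the example.

```rust
// Illustrative sketch only (not the trie-db API). Nodes arrive in
// pre-order; each records how many of its children are themselves in
// the proof. Omitted child hashes are recomputed from those children,
// so a node is completed (and hashed) once all its in-proof children are.

#[derive(Clone)]
struct ProofNode {
    payload: Vec<u8>,         // node data with child hashes omitted
    children_in_proof: usize, // how many children follow in the proof
}

// Stand-in "hash": payload concatenated with child hashes. A real
// implementation would hash with e.g. blake2 or keccak.
fn node_hash(payload: &[u8], child_hashes: &[Vec<u8>]) -> Vec<u8> {
    let mut out = payload.to_vec();
    for h in child_hashes {
        out.extend_from_slice(h);
    }
    out
}

// Reconstruct the root hash from a pre-order node sequence using a stack.
fn reconstruct_root(nodes: &[ProofNode]) -> Option<Vec<u8>> {
    let mut stack: Vec<(ProofNode, Vec<Vec<u8>>)> = Vec::new();
    for node in nodes {
        stack.push((node.clone(), Vec::new()));
        // Pop every node whose children are all resolved and hand its
        // hash to its parent (or return it if it is the root).
        while let Some((top, hashes)) = stack.last() {
            if hashes.len() < top.children_in_proof {
                break;
            }
            let (top, hashes) = stack.pop().unwrap();
            let hash = node_hash(&top.payload, &hashes);
            match stack.last_mut() {
                Some((_, parent_hashes)) => parent_hashes.push(hash),
                None => return Some(hash), // root reconstructed
            }
        }
    }
    None // incomplete proof: some node is still waiting for children
}
```

A verifier would compare the returned root hash against the known trie root.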

Fixes paritytech/substrate#3782.

@rphmeier rphmeier merged commit b283aad into master Jan 8, 2020
@rphmeier rphmeier deleted the jimpo/trie-proof branch January 8, 2020 15:50
hujw77 commented Nov 4, 2020

Is there a plan to expose this API in substrate-state-machine crate?

cheme (Contributor) commented Nov 4, 2020

There was paritytech/substrate#4938, but I am not sure when I will resume work on it (I recently updated the branch, though I am also considering a different approach to integrating it).

hujw77 commented Nov 4, 2020

I wrote a Solidity version of the verification, and I am working on Substrate's storage proofs. Could you give me the specification of Substrate's storage? Thanks!

cheme (Contributor) commented Nov 4, 2020

https://github.com/w3f/polkadot-spec/blob/master/host-spec/c02-state.tm (releases: https://github.com/w3f/polkadot-spec/releases) gives you the basis for the trie state encoding. (Reading the gossamer or Kagome implementations can also be interesting; a different language and code base can be easier for some.)
Then the current (non-compact) proofs are just a SCALE-encoded Vec of encoded nodes, as produced by the derived encoding of https://github.com/paritytech/substrate/blob/f7a8b1001d1819b7a887ae36d6beae84617499d8/primitives/trie/src/storage_proof.rs#L29 .
The nodes can be stored in any order.

For a compact proof, it is the same thing except that the nodes need to be ordered in a specific way, and redundant encoded hashes are replaced by the encoding of a zero-length inline node (Some(empty_child)).
Redundant encoded hashes are those that can be recalculated when rebuilding the trie (any child hash that points to a node that is already in the proof).

There is also a variant of the compact proof for checking whether a set of storage values changed, where we additionally omit writing the values in the encoded nodes (basically the code in https://github.com/paritytech/trie/tree/master/trie-db/src/proof).

But mostly I don't think there is a written spec of these compact proofs apart from the code documentation.
Also, I would probably wait a bit before re-implementing them (there is no guarantee they will make it into Substrate, even though I think it would be very good).
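The substitution described above can be shown with a toy sketch. This is illustrative only: the 0x00 marker stands in for the real zero-length inline node encoding, which is codec-specific, and the helper name is hypothetical.

```rust
// Illustrative sketch only, not the real codec. When writing a branch
// node into a compact proof, a child reference whose node is itself
// included in the proof is replaced by a stand-in "empty inline node"
// marker, because the verifier recomputes that hash while rebuilding
// the trie; other child references keep their hash verbatim.

const EMPTY_INLINE: u8 = 0x00; // stand-in for the zero-length inline node

// Each entry is (child hash, is the child node included in the proof?).
fn encode_child_refs(refs: &[(Vec<u8>, bool)]) -> Vec<Vec<u8>> {
    refs.iter()
        .map(|(hash, in_proof)| {
            if *in_proof {
                vec![EMPTY_INLINE] // hash omitted: recomputed on rebuild
            } else {
                hash.clone() // hash kept: child is not in the proof
            }
        })
        .collect()
}
```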

hujw77 commented Nov 4, 2020

Trie node encoding specification

Note that for the following definitions, | denotes concatenation

Branch encoding:
NodeHeader | Extra partial key length | Partial Key | Value
NodeHeader is a byte such that:
most significant two bits of NodeHeader: 10 if branch w/o value, 11 if branch w/ value
least significant six bits of NodeHeader: if len(key) > 62, 0x3f, otherwise len(key)
Extra partial key length is included if len(key) > 62 and consists of the remaining key length
Partial Key is the branch's key
Value is: Children Bitmap | SCALE Branch node Value | Hash(Enc(Child[i_1])) | Hash(Enc(Child[i_2])) | ... | Hash(Enc(Child[i_n]))

Leaf encoding:
NodeHeader | Extra partial key length | Partial Key | Value
NodeHeader is a byte such that:
most significant two bits of NodeHeader: 01
least significant six bits of NodeHeader: if len(key) > 62, 0x3f, otherwise len(key)
Extra partial key length is included if len(key) > 62 and consists of the remaining key length
Partial Key is the leaf's key
Value is the leaf's SCALE encoded value
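One plausible reading of the header encoding quoted above can be sketched as follows. This is a hypothetical helper, not code from gossamer or Substrate, and the multi-byte length continuation (a run of 255 bytes followed by a final byte below 255) is an assumption.

```rust
// Hypothetical sketch of the node header described above. `kind_bits`
// is the two most significant bits: 0b01 leaf, 0b10 branch without
// value, 0b11 branch with value. Returns the header byte followed by
// any extra partial-key-length bytes.
fn node_header(kind_bits: u8, partial_key_len: usize) -> Vec<u8> {
    let mut out = Vec::new();
    if partial_key_len < 63 {
        // Length fits in the six least significant bits.
        out.push((kind_bits << 6) | partial_key_len as u8);
    } else {
        // Six bits saturate at 0x3f; the remaining length follows as
        // bytes of 255 terminated by a final byte below 255 (assumed).
        out.push((kind_bits << 6) | 0x3f);
        let mut rem = partial_key_len - 63;
        while rem >= 255 {
            out.push(255);
            rem -= 255;
        }
        out.push(rem as u8);
    }
    out
}
```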

This is the trie node specification; I found it in gossamer and https://github.com/w3f/polkadot-spec/blob/master/host-spec/c02-state.tm
I have some questions:

  1. Are the current proof (non-compact) and the compact proof the same encoding?
  2. I can understand the logic of the compact proof, but I still do not understand the logic or algorithm of the current proof. Can you show me code or something I can dive into for the current proof?
    Thank you again!

cheme (Contributor) commented Nov 4, 2020

  1. Are the current proof (non-compact) and the compact proof the same encoding?

At the trie node level it is the same (the compact form uses the 'empty inline node', which cannot otherwise occur, as a way to add information without changing the encoding).

Then at the proof level:
Non-compact is a set of nodes.
Compact is an ordered set of nodes.
But in the end both are represented as a list of encoded nodes (Vec<Vec<u8>>).
In the compact case, the order of the nodes defines the structure of the trie, and an 'empty inline node' indicates that the child is in the proof and its hash should be calculated from it.

One other small difference is that a compact proof cannot contain nodes from different tries, so when child tries are used by the proof, compact needs a different encoding where the proof is split by trie (in the PR I did, this was named Full, while Flat was a single-trie proof).

  2. I can understand the logic of the compact proof, but I still do not understand the logic or algorithm of the current proof. Can you show me code or something I can dive into for the current proof?

The set of nodes is put in a hash map keyed by the hash of each encoded node.
Then to verify the proof, we just run the process being checked (it can be any runtime call or key-value access) over a trie that uses this hash map as its encoded-node backend.

So if we check a single key access, we start from the root: fetch the encoded root node from the hash map, decode it, get the child hash for this key, fetch the child's encoded node, decode it, and repeat
until either
we get our value,
or we fail to fetch a child (incomplete proof), or we reach a branch where the child hash is not defined (missing value).
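The walk just described can be sketched with a toy node type. This is illustrative only: real Substrate nodes are SCALE-encoded and hashed with a real hash function, and keys are nibble paths with partial keys; here a "hash" is just an id and the types are made up for the example.

```rust
use std::collections::HashMap;

type Hash = u64; // stand-in for a real node hash

// Toy node: an optional value plus (nibble, child hash) edges.
struct Node {
    value: Option<Vec<u8>>,
    children: Vec<(u8, Hash)>,
}

#[derive(Debug, PartialEq)]
enum LookupError {
    IncompleteProof, // a needed node is missing from the proof
    MissingKey,      // the trie provably has no entry for this key
}

// Walk from the root, fetching each node from the proof map by hash.
fn lookup(
    proof: &HashMap<Hash, Node>,
    root: Hash,
    key: &[u8],
) -> Result<Option<Vec<u8>>, LookupError> {
    let mut hash = root;
    for nibble in key {
        let node = proof.get(&hash).ok_or(LookupError::IncompleteProof)?;
        match node.children.iter().find(|(n, _)| n == nibble) {
            Some((_, child)) => hash = *child,
            None => return Err(LookupError::MissingKey),
        }
    }
    let node = proof.get(&hash).ok_or(LookupError::IncompleteProof)?;
    Ok(node.value.clone())
}
```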

In https://github.com/paritytech/substrate/blob/f7a8b1001d1819b7a887ae36d6beae84617499d8/primitives/state-machine/src/lib.rs#L816
'create_proof_check_backend' just instantiates a trie backend that, instead of using RocksDB for node storage, uses a hash map built from the encoded nodes in the proof (https://github.com/paritytech/substrate/blob/833fe6259115625f61347c8413bab29fded31210/primitives/state-machine/src/proving_backend.rs#L291).
Then 'read_child_proof_check_on_proving_backend' just runs the 'get' ('storage') operation for every key.

hujw77 commented Nov 5, 2020

I implemented a Solidity version of both verifications. Advice welcome!
https://github.com/HuJingwei/merkle-proof
cheme (Contributor) commented Nov 5, 2020

Amazing 👍
It has been a long time since I wrote or read Solidity; I am wondering what the threshold is (in number of node accesses) at which using a mapping, instead of iterating over https://github.com/HuJingwei/merkle-proof/blob/15d12e9b708f087fccecd1efbf47d07f9fcea7a9/src/SimpleMerkleProof.sol#L54 in getNodeData, becomes cheaper (maybe it never is; I do not really remember the costs).

hujw77 commented Nov 5, 2020

That is good advice, but mappings can only be stored in the storage data location. I will test whether a mapping in storage may be cheaper.

cheme (Contributor) commented Nov 5, 2020

mappings can only be stored in the storage

Oh, that sounds bad; we would probably need to keep the array of hashes for the gas refund, but I still remember storage ops being much more expensive. Not sure testing is worth it :)
Sorting the array for a faster search would probably be another way to optimize it, if needed at some point.

Linked issue: More efficient storage proofs when verifier already knows values