This repository has been archived by the owner on Aug 19, 2020. It is now read-only.

Runtime Storage #38

Merged
merged 26 commits into from
May 8, 2020

danforbes
Contributor

Closes #5

@danforbes danforbes force-pushed the runtime-storage branch 3 times, most recently from 5860324 to be692c3 Compare April 14, 2020 08:34
@joepetrowski joepetrowski added the A3 - In Progress Not ready for review yet. label Apr 14, 2020
@danforbes danforbes force-pushed the runtime-storage branch 2 times, most recently from 1aa1627 to e2fba2d Compare April 15, 2020 06:49
@danforbes
Contributor Author

@thiolliere can you please review this?

Contributor

@gui1117 gui1117 left a comment

I feel like explaining what each method in the trait does is a bit redundant with the trait documentation.
I'm not sure of the scope of this doc, but I feel like part of it is redundant with the decl_storage and trait docs.

Also, in my opinion the storage doc should be more about explaining the trie and then just saying:

  • a value is an abstraction which stores one value at a certain path in the trie
  • a map stores a value at some_path++hasher(key)
  • a double map stores a value at some_path++hasher1(key1)++hasher2(key2)

I think the cost of each method is then easier to understand.

But I'm not very familiar with how the developer hub is organized, so these are just humble suggestions.

By the way, I think Shawn's deep dive was a very good explanation of the trie: https://www.shawntabrizi.com/substrate/substrate-storage-deep-dive/
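
The three key layouts listed in this comment can be sketched in plain Rust. This is only an illustration of the concatenation, not Substrate's implementation: `toy_hash` is a hypothetical FNV-1a stand-in for the real hashers, and `value_key`, `map_key` and `double_map_key` are names invented for this sketch.

```rust
/// Toy 64-bit FNV-1a hash. An assumption for illustration only; it stands in
/// for a real storage hasher and is not what Substrate uses.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in data {
        h ^= u64::from(*byte);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_le_bytes()
}

/// A storage value lives at one fixed path.
fn value_key(prefix: &[u8]) -> Vec<u8> {
    prefix.to_vec()
}

/// A map entry lives at `prefix ++ hasher(key)`.
fn map_key(prefix: &[u8], key: &[u8]) -> Vec<u8> {
    let mut out = prefix.to_vec();
    out.extend_from_slice(&toy_hash(key));
    out
}

/// A double map entry lives at `prefix ++ hasher1(key1) ++ hasher2(key2)`.
/// Both hashers are the same toy hash here to keep the sketch short.
fn double_map_key(prefix: &[u8], key1: &[u8], key2: &[u8]) -> Vec<u8> {
    let mut out = map_key(prefix, key1);
    out.extend_from_slice(&toy_hash(key2));
    out
}
```

Under these assumptions every map entry gets its own path, which is why reading one entry touches only that entry, and every double map path for a given `key1` shares a common prefix.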

### Storage Maps

Storage Maps are implemented as hash maps, which is a pattern that should be familiar to most developers. In order to
Contributor

Storage Maps are implemented as hash maps

maybe it is only me, but for me a hash map is a data structure which handles collisions. At least if it is presented this way, it should be clear that collisions are not handled.

But actually, instead of using the term "hash map" I would rather say that it is a map where the keys are hashed.

Contributor Author

Thank you for this clarification. Please let me know if you think my solution was sufficient.

Contributor

hmm I see, I'm not actually sure that what I proposed was better. For example,

Storage Maps are implemented as maps with hashed keys, which is a pattern that should be familiar to most developers.

wouldn't have made sense to me, I think, if I didn't know it already, and I'm not sure how to do better. (For me the clearest explanation is still that a storage map inserts the encoded value at the path hash(encode(key)).)

Contributor Author

Tried to clarify this in the docs 👍

Storage Maps are implemented as hash maps, which is a pattern that should be familiar to most developers. In order to
give blockchain engineers increased control over the way in which these data structures are stored, Substrate allows
developers to select the hashing algorithm that is used to generate map keys. Map data structures are ideal for managing
sets of items whose elements will be accessed randomly, as opposed to iterating over them sequentially in their entirety.
Contributor

I think it would be better to show how it is implemented in the underlying trie, to make the advantages and disadvantages easier to understand.

Basically the value is put in the trie at path == some_storage_prefix++hashed_key.
So each value is stored in an individual node, and reading or writing a value only decodes or encodes one raw value, which is stored in one trie node.

Also, the performance of the map will depend on how big the map is, because reading or writing one value recomputes path == some_prefix++hashed_key, and the hashed_key part can be more or less costly to compute. https://www.shawntabrizi.com/substrate/transparent-keys-in-substrate/

Contributor Author

There is a separate document that goes over how storage is implemented. This document is meant to explain Substrate's storage interfaces and help blockchain runtime developers understand when and how to use them.

Contributor

That document will not explain how the frame_support::storage abstractions are implemented on top of storage, will it? So I would actually explain their implementation here.

Contributor Author

I believe the purpose of the advanced document is to cover implementation and the purpose of this document (the runtime document) is to cover the interface. @joepetrowski can you provide your input?


[Storage Maps expose an API](https://substrate.dev/rustdocs/master/frame_support/storage/trait.StorageMap.html#required-methods)
that is similar to that of Storage Values. The selected methods referenced below all use a single key; use two keys for
[Storage Double Maps](https://substrate.dev/rustdocs/master/frame_support/storage/trait.StorageDoubleMap.html#required-methods):
Contributor

Here also I think we should say what using these two keys results in within the trie: for key1 and key2, the value is stored at some_prefix++hasher1(key1)++hasher2(key2),
and so you can iterate over all entries for one key1.
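
A minimal sketch of why this layout allows iteration over one key1, assuming an ordered key-value store. A `BTreeMap` stands in for the trie, `toy_hash` is a hypothetical FNV-1a stand-in for the real hashers, and `double_key` and `iter_key1` are names invented for this illustration.

```rust
use std::collections::BTreeMap;

/// Toy 64-bit FNV-1a hash. An assumption for illustration only.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in data {
        h ^= u64::from(*byte);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_le_bytes()
}

/// Full path of a double map entry: `prefix ++ hash(key1) ++ hash(key2)`.
fn double_key(prefix: &[u8], key1: &[u8], key2: &[u8]) -> Vec<u8> {
    let mut out = prefix.to_vec();
    out.extend_from_slice(&toy_hash(key1));
    out.extend_from_slice(&toy_hash(key2));
    out
}

/// Collect every value stored under `prefix ++ hash(key1)`.
/// Because the store is ordered, all such entries are contiguous.
fn iter_key1(db: &BTreeMap<Vec<u8>, u32>, prefix: &[u8], key1: &[u8]) -> Vec<u32> {
    let mut start = prefix.to_vec();
    start.extend_from_slice(&toy_hash(key1));
    db.range(start.clone()..)
        .take_while(|(k, _)| k.starts_with(&start))
        .map(|(_, v)| *v)
        .collect()
}
```

Values come back in the order of the hashed key2 bytes, not in insertion order, which mirrors the "no particular order" caveat for map iteration.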

Contributor Author

This is information I also think would belong in the advanced document.

The Substrate storage API provides iterable map implementations. Because maps are often used to track unbounded sets of
data (account balances, for example) it is especially likely to exceed block production time by iterating over maps in
their entirety within the runtime. Furthermore, because maps are comprised of more layers of indirection than native
lists, they are significantly more costly than lists to iterate over with respect to time. Depending on
Contributor

I think we can just explain what the "more layers of indirection than native lists" are: basically, iterating over a map is just iterating over all keys after the prefix storage_prefix, so this trie iteration is more costly than iterating over a native list because it is a trie.

Contributor Author

I felt like that was a step too far for this document and that topics like that belong in the advanced storage documentation.

Contributor

The above document just explains how the trie works, not how the frame_support storage abstractions are implemented using the trie, no?
As a user, in order to understand the cost of those abstractions more precisely, an explanation of their implementation on the trie seems useful. (Maybe it belongs somewhere else in the docs, but I still feel that saying a storage map just stores all its values after a prefix, and that iterating over it means iterating the trie after that prefix, is quite straightforward and quickly understandable.)

Because maps are often used to track unbounded sets of
data it is especially likely to exceed block production time by iterating over maps in
their entirety within the runtime

This is just a trade-off; some storage maps in the runtime are actually quite bounded. But we use storage maps because they are more often accessed by individual value than as a whole. Using a storage map is thus more efficient because accessing one value is cheaper, but a map is sometimes still iterated over because it is not too big.

Contributor Author

Tried to clarify this in the docs 👍

#### Methods

[Iterable Storage Maps expose the following methods](https://substrate.dev/rustdocs/master/frame_support/storage/trait.IterableStorageMap.html#required-methods)
in addition to the other map methods:
Contributor

Their methods are slightly different for map and double map.

Contributor Author

Tried to clarify this in the docs 👍

##### `translate(fn)`

Use the provided function to translate all elements of the map, in no particular order. To remove an element from the
map, return `None` from the translation function.
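
As a rough illustration of the `translate` semantics quoted above, here is a simplified sketch over an in-memory `BTreeMap`. It is an assumption-laden model, not the `frame_support` API: the real method operates on runtime storage and its exact signature differs between map and double map, as the comments below note. The free function `translate` here is hypothetical.

```rust
use std::collections::BTreeMap;

/// Apply `f` to every entry. `Some(new)` replaces the value; `None` removes
/// the entry. Simplified: the value type stays the same in this sketch.
fn translate<F>(map: &mut BTreeMap<u32, u64>, mut f: F)
where
    F: FnMut(u32, u64) -> Option<u64>,
{
    // Take the old contents out, then rebuild the map from what `f` keeps.
    let old = std::mem::take(map);
    for (key, value) in old {
        if let Some(new_value) = f(key, value) {
            map.insert(key, new_value);
        }
    }
}
```

For example, `translate(&mut m, |_k, v| if v == 0 { None } else { Some(v * 2) })` doubles every non-zero value and drops the zero entries.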
Contributor

They also differ between map and double_map (as a note, I tried to unify the interface in paritytech/substrate#5335).

Contributor Author

Tried to clarify this in the docs 👍

@danforbes
Contributor Author

Thank you @thiolliere! I have tried to address your comments while balancing the goal of this document with respect to the goal of the advanced storage document.

Contributor

@joepetrowski joepetrowski left a comment

Looks good. I think a practical section on best practices would be useful.

@shawntabrizi
Contributor

Is the goal of this PR to actually have all this content on a single page? Or are you just collecting information here?

Jumping between conceptual items around storage, to macro syntax about declaration, to RPC call stuff seems like a lot for one page, and what I would consider to be quite different topics when it comes to storage.

I might tune your focus into two sections with somewhat different goals:

  • Conceptually what is Runtime Storage in Substrate?

    • Goal: Someone will understand from a theoretical perspective how substrate storage works, and the design decisions.
  • Using storage in Substrate runtime code

    • Goal: How do you work with the macro, the generated types, etc.? What are all the nuances of the syntax, the things you can and can't do, etc.?

Someone who does not really want to make the best decisions but just wants code to work should read the second doc. Someone who just wants to learn conceptually what's going on should read the first. A "real dev" would read both and be able to make the right decisions using the combined information.

If you break down Substrate Storage conceptually, then a topic like the RPC queries becomes a lot more natural: "We are just querying a database key, no other abstractions of storage leak through the RPC".

It also informs the user of the advantages of a single value storage item versus a storage map, something that is probably the most important topic to take away for the average dev. They should be thinking at this level.

Then when it comes to the ergonomics of the language, some users just want to know what to type to make the thing compile, and that would go into all the syntax level details I see here.

As it is written, I think it is hard to ingest and get a clear message imo.

@danforbes danforbes force-pushed the runtime-storage branch 3 times, most recently from df43692 to f818588 Compare April 28, 2020 13:26
@danforbes
Contributor Author

@thiolliere and @shawntabrizi - I have made some structural changes that I hope will address your comments. @joepetrowski - per our conversation yesterday, I think this is now ready for a final review.

@danforbes danforbes changed the title [WIP] Runtime storage Runtime Storage Apr 29, 2020
@danforbes danforbes added A0 - Please Review The PR is ready for review. K1 - Runtime Info about the runtime. K6 - Advanced/Other Doesn't fit other K-labels. T3 - Enhancement A current page needs more info. and removed A3 - In Progress Not ready for review yet. labels Apr 29, 2020
@danforbes
Contributor Author

@joepetrowski - any suggestions for enhancing the section on genesis configuration?

Comment on lines 247 to 253
### Genesis Config

TODO
You can define
[an optional `GenesisConfig`](https://substrate.dev/rustdocs/master/frame_support/macro.decl_storage.html#genesisconfig)
struct in order to initialize Storage Items in the genesis block of your blockchain.

## Storage Cache
// TODO
Contributor

@thiolliere what info do you think should go here?

Contributor

GenesisConfig is generated by the decl_storage macro; it initializes storage items for the genesis block of the blockchain.

Its logic is defined by:

  • the config and build attributes given when declaring an individual storage item.
  • the add_extra_genesis information, which allows you to add fields to the GenesisConfig struct, plus a build function that builds storage using those fields.

(To be more precise: for each storage item we first execute the build closure if it exists; if there is no build attribute but there is a config attribute, we put the value found in the config into storage. Then we execute the add_extra_genesis::build function.)
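
The ordering described in this comment can be modelled in a few lines of plain Rust. This is a sketch of the stated order only, not the code that `decl_storage` generates: `Item`, `Storage` and `build_genesis` are hypothetical names, and storage is reduced to a map from item name to a `u64`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for genesis storage: item name -> value.
type Storage = BTreeMap<&'static str, u64>;

/// Hypothetical stand-in for one storage item's genesis attributes.
struct Item {
    name: &'static str,
    /// Value supplied through a `config` attribute, if declared.
    config: Option<u64>,
    /// Closure supplied through a `build` attribute, if declared.
    build: Option<fn() -> u64>,
}

fn build_genesis(items: &[Item], extra_build: fn(&mut Storage)) -> Storage {
    let mut storage = Storage::new();
    for item in items {
        // `build` takes precedence; `config` is used only when there is no `build`.
        let initial = match (item.build, item.config) {
            (Some(build), _) => Some(build()),
            (None, Some(value)) => Some(value),
            (None, None) => None,
        };
        if let Some(value) = initial {
            storage.insert(item.name, value);
        }
    }
    // The extra genesis build step runs last, after all per-item initialization.
    extra_build(&mut storage);
    storage
}
```

Because the extra build step runs last in this model, it can observe or overwrite anything the per-item steps wrote.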

Depending on [the hashing algorithm](#Transparent-Hashing-Algorithms) that you select to generate a
map's keys, you may be able to iterate across its keys and values. Because maps are often used to
track unbounded sets of data (account balances, for example) it is especially likely to exceed block
production time by iterating over maps in their entirety within the runtime. Furthermore, because


Furthermore, because
accessing the elements of a map requires more pointer dereferencing than accessing the elements of a
native list, maps are significantly more costly than lists to iterate over with respect to time.

Can you explain this a bit more? I am not sure where it comes from, and until now my understanding of the state was that the most important factors were encoding + db access. In-memory dereferencing seems negligible to me.

(I agree that iterating maps is expensive, I am not sure if your reason is correct.)

Contributor Author

So, this line used to read as follows:

Furthermore, because maps are comprised of more layers of indirection than native lists...

I agree with you that the change didn't necessarily make things more clear. I guess what I meant above was that you don't need to dereference memory pointers, per se, but you do have to dereference the pointer to the element in the map...I think I'm trying to describe the "db access" part that you mention above. I will think about alternate ways to phrase this but would definitely appreciate any suggestions 👍

Contributor Author

Okay, this now reads:

...accessing the elements of a map requires more database reads than accessing the elements of a native list...

I hope you agree the simplest approach was the best one in this case 😅

wrong. Being efficient within the runtime of a blockchain is an important first principle of
Substrate and this information is designed to help you understand _all_ of Substrate's storage
capabilities and use them in a way that respects the important first principles around which they
were designed.


would be nice to mention here when and which maps are iterable at the end of the day.

Contributor Author

This is a great point and I'm starting to wonder if we should even document non-transparent hashers since they are now deprecated. Should we just behave as if all hashers are transparent and all maps are iterable?

Contributor Author

Asked @shawntabrizi about this offline and he agrees that we should take the simpler approach: only document that which is not deprecated 👍

Contributor Author

Tried to clarify this in the docs 👍

@kianenigma kianenigma left a comment

I only had a skim and the text reads very well to me. As Shawn mentioned, I think this is a lot of info and it should be organised well so the user can follow it easily. Great work 👍

@gui1117
Contributor

gui1117 commented May 4, 2020

Sorry I didn't realize this earlier, but it seems non-transparent hashers are deprecated, so:

/// Supported hashers (ordered from least to best security):
///
/// * identity - Just the unrefined key material. Use only when it is known to be a secure hash
/// already. The most efficient and iterable over keys.
/// * twox_64_concat - TwoX with 64bit + key concatenated. Use only when an untrusted source
/// cannot select and insert key values. Very efficient and iterable over keys.
/// * blake2_128_concat - Blake2 with 128bit + key concatenated. Slower but safe to use in all
/// circumstances. Iterable over keys.
///
/// Deprecated hashers, which do not support iteration over keys include:
/// * twox_128 - TwoX with 128bit.
/// * twox_256 - TwoX with with 256bit.
/// * blake2_128 - Blake2 with 128bit.
/// * blake2_256 - Blake2 with 256bit.
///

Thus it may be better to write that storage maps are iterable and double maps are also iterable, and maybe add a note that this is not the case if you use a deprecated hasher (opaque_twox_128, opaque_... ...)
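
The difference between the `_concat` hashers and the deprecated ones in the list above comes down to whether the original key is appended after the hash. A minimal sketch, assuming a toy 8-byte hash in place of TwoX or Blake2; `toy_hash`, `concat_hashed_key`, `opaque_hashed_key` and `recover_key` are hypothetical names for this illustration.

```rust
const HASH_LEN: usize = 8;

/// Toy 64-bit FNV-1a hash. An assumption for illustration only.
fn toy_hash(data: &[u8]) -> [u8; HASH_LEN] {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in data {
        h ^= u64::from(*byte);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_le_bytes()
}

/// "Concat" style: `hash(key) ++ key`, so the key can be read back.
fn concat_hashed_key(key: &[u8]) -> Vec<u8> {
    let mut out = toy_hash(key).to_vec();
    out.extend_from_slice(key);
    out
}

/// Opaque style: only `hash(key)`; the key itself is not stored in the path.
fn opaque_hashed_key(key: &[u8]) -> Vec<u8> {
    toy_hash(key).to_vec()
}

/// Recover a (non-empty) key by skipping the fixed-size hash, if anything follows it.
fn recover_key(hashed: &[u8]) -> Option<&[u8]> {
    if hashed.len() > HASH_LEN {
        Some(&hashed[HASH_LEN..])
    } else {
        None
    }
}
```

With the concat form, walking the stored paths yields the keys as well as the values; with the opaque form only the values are reachable, which is why those maps cannot be iterated over by key.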

@danforbes danforbes force-pushed the runtime-storage branch 2 times, most recently from 2718bd1 to 5411956 Compare May 5, 2020 01:06
@danforbes
Contributor Author

@shawntabrizi, @thiolliere, @kianenigma - I have no more changes to make at this time based on your comments. Let me know if I addressed your concerns properly or if there are additional concerns you have.

Contributor

@gui1117 gui1117 left a comment

looks good to me

| [TwoX 64 Concat](https://crates.parity.io/frame_support/struct.Twox64Concat.html) | | X |
| [Identity](https://crates.parity.io/frame_support/struct.Identity.html) | | |
| [Blake2 128](https://crates.parity.io/frame_support/struct.Blake2_128.html) **DEPRECATED** | X | |
| [TwoX 128](https://crates.parity.io/frame_support/struct.Twox128.html) **DEPRECATED** | | |
Contributor

there are actually 4 deprecated hashers, but I don't mind if we don't mention them:

twox_128 - TwoX with 128bit.
twox_256 - TwoX with with 256bit.
blake2_128 - Blake2 with 128bit.
blake2_256 - Blake2 with 256bit.

[the `sc_chain_spec::ChainSpec` trait](https://crates.parity.io/sc_chain_spec/trait.ChainSpec.html).
For a complete and concrete example of using Substrate's genesis storage configuration capabilities,
refer to the `decl_storage` macro in
[the Society pallet](https://github.com/paritytech/substrate/blob/master/frame/society/src/lib.rs)
Contributor

Would it be better to point to the example pallet?

Contributor Author

@danforbes danforbes May 8, 2020

I don't believe that the example pallet does any genesis configuration.


// all storage writes go here; no throwing code below this line

// all event emissions go here
Contributor

events can be interleaved with storage writes
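
A sketch of the "verify first, write last" shape that the quoted comments describe, with events interleaved with writes as this comment allows. It is plain Rust over an in-memory map, not a dispatchable; `State`, `Event` and `transfer` are hypothetical names, and the balance logic is invented for the example.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Event {
    Withdrawn(u32, u64),
    Deposited(u32, u64),
}

struct State {
    balances: BTreeMap<u32, u64>,
    events: Vec<Event>,
}

fn transfer(state: &mut State, from: u32, to: u32, amount: u64) -> Result<(), &'static str> {
    // All fallible checks happen first; nothing has been written yet.
    if from == to {
        return Err("self transfer");
    }
    let from_balance = *state.balances.get(&from).ok_or("unknown sender")?;
    let new_from = from_balance.checked_sub(amount).ok_or("insufficient balance")?;
    let to_balance = state.balances.get(&to).copied().unwrap_or(0);
    let new_to = to_balance.checked_add(amount).ok_or("overflow")?;

    // No fallible code below this line. Writes and events may be interleaved.
    state.balances.insert(from, new_from);
    state.events.push(Event::Withdrawn(from, amount));
    state.balances.insert(to, new_to);
    state.events.push(Event::Deposited(to, amount));
    Ok(())
}
```

The point of the ordering is that an early `Err` return leaves both the balances and the event list exactly as they were.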

initializing storage, all of which have entry points in the `decl_storage` macro. These mechanisms
all result in the creation of a `GenesisConfig` data type that implements
[the `sp_runtime::BuildModuleGenesisStorage` trait](https://crates.parity.io/sp_runtime/trait.BuildModuleGenesisStorage.html)
and will be added to the storage item's module (e.g.
Contributor

I'm not sure I read this correctly: what does it mean for the runtime GenesisConfig to be added to the storage item's module?

Maybe we could make it clearer that there is an overarching GenesisConfig type for the runtime, which is made by construct_runtime by joining the specified pallet genesis configs (specified with Config).

Dan Forbes and others added 3 commits May 7, 2020 19:07
Contributor

@joepetrowski joepetrowski left a comment

🚀

@joepetrowski joepetrowski added A9 - Buy That Person a Beer! Author deserves a beer and removed A0 - Please Review The PR is ready for review. labels May 8, 2020
@joepetrowski joepetrowski merged commit a214b8b into master May 8, 2020
@joepetrowski joepetrowski deleted the runtime-storage branch May 8, 2020 09:37
Labels
A9 - Buy That Person a Beer! Author deserves a beer K1 - Runtime Info about the runtime. K6 - Advanced/Other Doesn't fit other K-labels. T3 - Enhancement A current page needs more info.
5 participants