Block building within the same wasm memory? #10557
This is not necessarily a feature request or a call to action; I'm just writing down thoughts on this topic.
Whenever Substrate imports a block, it calls a runtime API function exposed as execute_block. Under the hood, that call initializes the block, picks and runs each and every extrinsic, and then finalizes the block, all within the same wasm instance. This means that memory is persistent across these stages: simply speaking, changes to memory made in on_initialize will be visible in on_finalize.
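For illustration, here is a rough sketch of that flow (the real implementation lives in frame-executive; the types and helpers below are illustrative stand-ins, not the actual code):

```rust
// Rough sketch of block import; types and helpers are stand-ins.
struct Header;
struct Extrinsic;
struct Block {
    header: Header,
    extrinsics: Vec<Extrinsic>,
}

// All of this runs within a single wasm instance, so anything written
// to memory in initialize_block is still visible in finalize_block.
fn execute_block(block: Block) {
    initialize_block(&block.header); // runs the on_initialize hooks
    for xt in block.extrinsics {
        apply_extrinsic(xt); // dispatches the call
    }
    finalize_block(); // runs the on_finalize hooks
}

fn initialize_block(_header: &Header) {}
fn apply_extrinsic(_xt: Extrinsic) {}
fn finalize_block() {}
```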
In contrast to this, while building a block, each stage is its own runtime call: initializing the block is one call, applying an extrinsic is another.
This means that FRAME or other runtime code cannot assume that memory is persistent between the calls.
There are a number of reasons why persistent memory may be desirable:
- in my experience working with contracts, there were several times when I wished the memory was preserved between calls.
- Get rid of junk in storage proofs (#9170), which is another case; it was probably solved via other means, but still.
- passing data between on_initialize and on_finalize without touching storage. It happens fairly often that we need to do something in on_finalize, for example, remove N elements from a list that satisfy some condition. However, using on_finalize requires on_initialize to return the weight to be consumed by on_finalize, implying that on_initialize needs to check how many items satisfy the predicate, thus already computing N. Yet even though on_initialize did the work, on_finalize has to run the same code again to decide which elements to prune. It would be good if on_initialize could just communicate to on_finalize which items it needs to remove (see the sketch after this list).
- this also ties back to the idea of adding ephemeral storage to the Substrate runtime, as some sort of host support via a custom child trie. It seems that storing the values in memory could help solve this issue in a more elegant way, without introducing an ephemeral trie. Not sure if all use-cases can be covered by it, though.
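To make the on_initialize/on_finalize point concrete, here is a minimal sketch of what persistent memory would enable. Everything here is hypothetical: it assumes wasm memory survives between the two runtime calls, PENDING_REMOVALS and the storage helpers are made up, and a real no_std runtime would need something other than thread_local:

```rust
use std::cell::RefCell;

// Hypothetical in-memory stash; only works if wasm memory persists
// between the on_initialize and on_finalize runtime calls.
thread_local! {
    static PENDING_REMOVALS: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

fn on_initialize() -> u64 {
    // Scan once: find the indices of all items satisfying the predicate.
    let to_remove: Vec<u32> = read_list()
        .iter()
        .enumerate()
        .filter(|(_, item)| should_prune(item))
        .map(|(i, _)| i as u32)
        .collect();
    let weight = weight_for_removals(to_remove.len() as u32);
    // Stash the result in memory instead of recomputing it later.
    PENDING_REMOVALS.with(|p| *p.borrow_mut() = to_remove);
    weight
}

fn on_finalize() {
    // Reuse the stashed result; no second scan needed.
    let to_remove = PENDING_REMOVALS.with(|p| std::mem::take(&mut *p.borrow_mut()));
    remove_items(&to_remove);
}

// Hypothetical storage helpers, stubbed out for the sketch.
fn read_list() -> Vec<Vec<u8>> { Vec::new() }
fn should_prune(_item: &[u8]) -> bool { false }
fn weight_for_removals(n: u32) -> u64 { 10_000 * n as u64 }
fn remove_items(_indices: &[u32]) {}
```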
This also ties back to the issue of wasm instance spawning overhead. During the recent work on #10244 we found out that right now the per-call instance overhead is at least ≈50µs. If we take our target of 1000 tps for Polkadot, then with a parablock every 12 seconds we get 12k transactions per parablock; thus in Cumulus, 12,000 × 50µs = 600ms would be spent on wasm instance spawning overhead alone, a good chunk of time by any means. While this is not critical, and possibly we will get bottlenecked somewhere else, it is still something to keep in mind.
One approach to tackle this is to simply keep an instance alive between the runtime API calls, instead of creating a new one each time.
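As a toy illustration of the difference, here is a sketch using wasmtime directly (not Substrate's executor; assumes a recent wasmtime, and the counter module is made up):

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // A made-up module: a mutable global that each call increments.
    let module = Module::new(
        &engine,
        r#"(module
            (global $counter (mut i32) (i32.const 0))
            (func (export "bump") (result i32)
                global.get $counter
                i32.const 1
                i32.add
                global.set $counter
                global.get $counter))"#,
    )?;

    // Per-call instantiation: state resets every time.
    for _ in 0..2 {
        let mut store = Store::new(&engine, ());
        let instance = Instance::new(&mut store, &module, &[])?;
        let bump = instance.get_typed_func::<(), i32>(&mut store, "bump")?;
        assert_eq!(bump.call(&mut store, ())?, 1); // always 1
    }

    // Reused instance: memory and globals persist across calls.
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let bump = instance.get_typed_func::<(), i32>(&mut store, "bump")?;
    assert_eq!(bump.call(&mut store, ())?, 1);
    assert_eq!(bump.call(&mut store, ())?, 2); // state carried over
    Ok(())
}
```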
A similar issue was discussed between me and @rphmeier, where we came up with an idea: instead of external iteration we could use internal iteration. That is, the runtime controls the block filling. As a strawman: the block-building interface would be a single call into the runtime. That call would do the initialization of the block, then fetch and apply the extrinsics, and then finalize. The fetching part is the most interesting here: the runtime calls a specific non-deterministic host function which returns the next transaction. That probably comes with a whole can of trade-offs that should be thought through, such as how we handle timeouts and so on.
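A strawman of that control flow might look as follows, with host_next_extrinsic standing in for the hypothetical non-deterministic host function (none of these names exist today):

```rust
// Strawman only: all names here are hypothetical, not an existing API.
type Extrinsic = Vec<u8>;
struct Block {
    extrinsics: Vec<Extrinsic>,
}

// Host-provided and non-deterministic: returns the next transaction
// from the pool, or None when the pool is drained or the authoring
// deadline has been reached.
fn host_next_extrinsic() -> Option<Extrinsic> { None }

// Single runtime entry point that drives the whole block-building loop
// internally, instead of the node calling back in for every extrinsic.
fn build_block() -> Block {
    on_initialize();
    let mut extrinsics = Vec::new();
    while let Some(xt) = host_next_extrinsic() {
        // An invalid transaction is skipped; the loop keeps going.
        if apply_extrinsic(&xt).is_ok() {
            extrinsics.push(xt);
        }
    }
    finalize_block(extrinsics)
}

fn on_initialize() {}
fn apply_extrinsic(_xt: &Extrinsic) -> Result<(), ()> { Ok(()) }
fn finalize_block(extrinsics: Vec<Extrinsic>) -> Block {
    Block { extrinsics }
}
```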
One problem that prevents us from making memory persistent between the stages of block execution is that some extrinsics can panic. Although this is not normal, it can potentially happen. When it does, we want to make sure that the already existing DoS vector is not amplified by implementation details. Giving each runtime call a fresh wasm instance is handy because the executor can simply destroy it and the block authorship module can just move on. If we were to preserve memory between calls, we would need to figure out how to recover from such a situation quickly.
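One possible recovery policy under instance reuse, sketched with stand-in types (an assumption about how it could work, not current behaviour): on a trap, discard the cached instance and re-create it before the next call.

```rust
// Stand-in types; not Substrate's executor API.
struct Trap;
struct WasmInstance;

impl WasmInstance {
    fn new() -> Self {
        WasmInstance
    }
    fn call(&mut self, _method: &str, _data: &[u8]) -> Result<Vec<u8>, Trap> {
        Ok(Vec::new())
    }
}

// Reuses one instance across calls, but falls back to a fresh one as
// soon as a call traps (e.g. an extrinsic panicked): after a trap the
// memory may be in an arbitrary state, so it cannot be trusted.
struct CachedExecutor {
    instance: WasmInstance,
}

impl CachedExecutor {
    fn call(&mut self, method: &str, data: &[u8]) -> Result<Vec<u8>, Trap> {
        self.instance.call(method, data).map_err(|trap| {
            self.instance = WasmInstance::new();
            trap
        })
    }
}
```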
This now brings us to classic ideas like paritytech/polkadot-sdk#370. With the integration of wasmtime we figured out that we should probably employ mmap/CoW techniques. More recently, we have been thinking about resurrecting the CoW approach to drive down the wasm spawning latency. Perhaps the very same mechanism could allow us to implement the last part of paritytech/polkadot-sdk#370 efficiently?