Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove post runtime digests in the runtime before executing a block #63

Open
bkchr opened this issue Oct 21, 2021 · 12 comments
Open

Remove post runtime digests in the runtime before executing a block #63

bkchr opened this issue Oct 21, 2021 · 12 comments
Labels
I4-refactor Code needs refactoring. I5-enhancement An additional feature request.

Comments

@bkchr
Copy link
Member

bkchr commented Oct 21, 2021

There are digests that are added after a block was build by the runtime. This includes digests like Seals that are added by the individual consensus implementations. Currently we rely on having the client code checking these seals and removing them from the header before we execute a block. This ensures that the generated header is equal to the one that is passed in as part of the block.

While this works for standalone blockchains, it does not work for parachains. As a parachains also need to validate a block as part of the block validation on the relay chain, they need to have this Seal removing code inside the runtime. For AURA this is done for example here. This BlockExecutor can be "stacked", as we do it currently with the BlockImports and then we would call these stacked block executors (while Executive being the inner most one) when we need to execute a block.

CC @andresilva

@remzrn
Copy link

remzrn commented Oct 26, 2021

Hello, linking this issue since I saw the comment that you are referring to in the source code while looking at another issue that prevents Edgeware (Aura chain) to sync the history. I think those two could be related.

@bkchr
Copy link
Member Author

bkchr commented Mar 22, 2022

Currently we rely on the client side to remove this seal. While that works, it is also shitty because people often run into these problems where they try to execute a block and it fails. It mainly fails because of the mismatch in the number of digests, which is expected because the runtime isn't putting the seal onto the block. However, with this pr it would only be about executing the block and the runtime would take care of checking and removing the seal.

We would need to make all the code backwards compatible, because the old runtimes don't support the seal alongside the block. We could do this for example by bumping the Core runtime api version to signal this change.

@bkchr
Copy link
Member Author

bkchr commented Mar 23, 2022

Okay, after thinking again about this. We should probably not bump the version of the Core api. Instead we can just bump the respective runtime api versions of Babe and Aura.

@wigy-opensource-developer

Okay, so the main goal of this issue is that blocks reach the runtime unaltered, making some tooling much simpler.

  1. The least amount of changes I can see is that CheckedHeader::Checked would contain a post-runtime header (seal not removed from it), and a decorator is implemented for the ExecuteBlock trait that removes all DigestItem::Seal instances from the digest log independently of the consensus ID.
  2. A second step is not just to remove, but recheck the seal in the runtime - dependent on the consensus of course. Aura and Babe pallets would be changed to handle the unaltered block header, others could still stick to the decorator from step 1.
  3. An even bigger scope would be to remove the seal checking from the client. But that would strongly interfere with equivocation checking and spam filtering in the client before executing a block, so we decided this is not part of this issue.

At the end of this fix, seals will be checked both in the client and - if Babe/Aura is used - also in the runtime. And cumulus will not need the aura_ext pallet, but it can use the updated aura one.

@bkchr
Copy link
Member Author

bkchr commented Apr 12, 2022

2. Aura and Babe pallets would be changed to handle the unaltered block header, others could still stick to the decorator from step 1.

I don't get this. What you mean by "handle the unaltered block header"? When would that happen?

@bkchr
Copy link
Member Author

bkchr commented Apr 12, 2022

See the following post as some sort of brain dump:

  • We will need to make the clients backwards compatible. That means every block that was produced before this change was enacted on chain, would need to remove the seal from header before executing the block in the runtime. Every block that was produced after the change was enacted, would require to still have the seal as part of the header.
  • We would check the seal and any other kind of checks that are required to ensure that the seal is valid. Something like this
  • BABE primary threshold calculation will require some love. Aka it will need to be rewritten to integer arithmetic using sp-arithmetic
  • BABE will not be that important because it will never be supported by parachains. This is mainly about making Parachains and Solochains use the same code paths. Parachains already require this special code to verify the Aura seals as part of the validation function.
  • As Sassafras will be supported by Parachains, we will need to add the support for this as well. So, we could do this directly here in Substrate.
  • People also often stumble over a failing execute_block call because they forget to pop any seals that are not produced by the runtime. However, it makes no sense to not have the runtime having control over this, because it is already aware of the consensus it is using and thus can do the appropriate seal check etc.
  • To find out when this change landed on any runtime, we should bump the version of the runtime apis of Babe/Aura to detect this.
  • We will need to update the Polkadot spec to mention this.

@bkchr
Copy link
Member Author

bkchr commented Apr 12, 2022

@wigy-opensource-developer please before continuing, prepare some benchmark that compares the seal verification of Aura from the wasm runtime vs using the client code. This doesn't need to be anything fancy for the beginning. Just some dirty code that runs the verification in a loop and you post the numbers in here.

@wigy-opensource-developer

Do you see a good shortcut compared to building a test client with a test runtime, getting it to author some blocks, create another test client with a slightly different test runtime, where I can revert the last block easier and then measure the time spent in validating the blocks generated by the first test client?

I could not find smaller scope existing tests for pallets like Aura and Babe. So even if it is not fancy, fulfilling all dependencies for this test is quite some coding.

Alternatively, I can extract static methods from the pallets and exercise benchmarks on them, but their performance depends on the WASM executor, so it needs to be embedded there somehow.

@bkchr
Copy link
Member Author

bkchr commented Apr 13, 2022

https://github.com/paritytech/substrate/blob/de8b34c814089e9561cbc1cdb4399bfc91d4cd46/test-utils/runtime/src/lib.rs#L777-L780

https://github.com/paritytech/substrate/blob/de8b34c814089e9561cbc1cdb4399bfc91d4cd46/primitives/api/test/benches/bench.rs#L59-L65

Something like this. You clearly will not need to build a full block. In the end it is just about creating a signature and verifying the signature in the runtime. Maybe we already have some numbers for this as part of the benchmarking. @ggwpez do you know if these numbers?

@ggwpez
Copy link
Member

ggwpez commented Apr 20, 2022

We have both, just the verification function or full blocks (PS: I forgot to explain the full blocks below).

  1. The baseline sr25519_verification benchmark can be run either with the native or the WASM executor.

WASM:

./target/production/substrate benchmark pallet --chain=dev --steps=50 --repeat=20 --pallet="frame-benchmarking" --extrinsic='sr25519_verification' --execution=wasm --wasm-execution=compiled

or native:

./target/production/substrate benchmark pallet --chain=dev --steps=50 --repeat=20 --pallet="frame-benchmarking" --extrinsic='sr25519_verification' --execution=native

It returns the following:

  • WASM 47.2 µs
  • Native 46.8 µs

It's nearly the same 🤔, kinda weird.
At least it align with the output of benchmark machine which reports 666 KiB/s for sr25519 verify with 32 byte input.
32 byte / 666 KiB/s = 32 byte / (666 * 2¹⁰) byte/s = 46.9 µs

@andresilva
Copy link
Contributor

The WASM code will call into a host function to do the actual signature verification, so the small difference should be just the overhead of dispatching from the runtime into a host function. I guess in this case it is negligible. The only thing that I expect could cause a performance difference is moving the BABE slot assignment computation from floats to using sp_arithmetic. We can measure that later.

@bkchr
Copy link
Member Author

bkchr commented Apr 21, 2022

  • WASM 47.2 µs

  • Native 46.8 µs

That is perfect! Ty @ggwpez for your help :)

The rest is as @andresilva explained.

jonathanudd pushed a commit to jonathanudd/polkadot-sdk that referenced this issue Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I4-refactor Code needs refactoring. I5-enhancement An additional feature request.
Projects
Status: backlog
Development

No branches or pull requests

7 participants