Tracking: Commit blocks to state using a separate task #4937
Description
Motivation
Zebra takes 10-15 minutes to commit some blocks to the state while checkpointing, around blocks 1,718,000 to 1,772,000.
The slow blocks are different on different runs.
This is unacceptable performance, because:
- it's much slower than zcashd
- Zebra will appear to hang for 15 minutes, which is a usability and security issue
- it causes warnings in the Zebra logs
- if it's remotely triggerable, it could be a denial of service risk
Diagnosis
Zebra queues up to 1200 blocks, then commits them all in the same state request, after the missing block arrives. This can take up to 10 seconds per block.
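To illustrate why one state request can end up committing a long run of blocks, here is a minimal sketch of an out-of-order queue. It is not Zebra's actual queue: `QueuedBlocks` and `queue_and_drain` are hypothetical names, and blocks are tracked by height alone rather than by parent hash.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a block: only its height matters in this sketch.
type Height = u32;

/// Blocks that arrived out of order and are waiting for the missing block.
struct QueuedBlocks {
    /// The height of the next block that can be committed.
    next_height: Height,
    /// Heights of blocks that have arrived but cannot be committed yet.
    waiting: BTreeSet<Height>,
}

impl QueuedBlocks {
    fn new(next_height: Height) -> Self {
        QueuedBlocks {
            next_height,
            waiting: BTreeSet::new(),
        }
    }

    /// Queue a block, then return every block that is now ready to commit,
    /// in height order. If the missing block has not arrived, this is empty.
    fn queue_and_drain(&mut self, height: Height) -> Vec<Height> {
        self.waiting.insert(height);

        let mut ready = Vec::new();
        // Once the missing block is present, every consecutive queued block
        // becomes ready in the same call.
        while self.waiting.remove(&self.next_height) {
            ready.push(self.next_height);
            self.next_height += 1;
        }
        ready
    }
}
```

In this sketch, queuing heights 3 and 2 returns nothing, and queuing height 1 then returns all three blocks at once. With up to 1200 queued blocks, the request that delivers the missing block pays for every commit.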
Design
Add a block commit task to the state, which runs in a separate thread. The task should be between the block queue and the block verifier.
We'll need to move the shared mutable chain state into the block commit task, so we will also need to redirect StateService read requests to the concurrent ReadStateService.
Here is a diagram of the new state design:
https://docs.google.com/drawings/d/1FXpAUlenDAjl8nkftrypdAPsj0jr-Ut9gZlSP57nuyc/edit
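The design above could be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not the planned implementation: `ChainState`, `spawn_block_commit_task`, and the shared tip height are all hypothetical, and blocks are reduced to their heights.

```rust
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a block: only its height matters in this sketch.
type Height = u32;

/// Hypothetical mutable chain state, owned exclusively by the commit thread.
#[derive(Default)]
struct ChainState {
    committed: Vec<Height>,
}

/// Spawn the block commit task on its own thread.
///
/// Returns a sender for blocks, a shared read-only view of the tip height
/// for concurrent readers, and a handle that yields the final state.
fn spawn_block_commit_task() -> (
    mpsc::Sender<Height>,
    Arc<RwLock<Option<Height>>>,
    thread::JoinHandle<ChainState>,
) {
    let (block_tx, block_rx) = mpsc::channel::<Height>();
    let tip = Arc::new(RwLock::new(None));
    let tip_for_task = Arc::clone(&tip);

    let handle = thread::spawn(move || {
        // The mutable state lives here, so callers never block on a commit.
        let mut state = ChainState::default();

        // This loop ends when every sender has been dropped.
        for height in block_rx {
            state.committed.push(height);
            *tip_for_task.write().expect("tip lock poisoned") = Some(height);
        }
        state
    });

    (block_tx, tip, handle)
}
```

The sender side returns immediately after queuing a block, while readers only take a short read lock on the tip. That separation is the point of the sketch: slow commits no longer stall the code that handles read requests.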
Implementation Plan
Stop Accessing Mutable Chain State
Set Up Channels
Set Up Block Commit Task
- Add a new block commit task with unused channels
- Add channels to send blocks to the task
  - Add a channel that handles finalized state CommitFinalizedBlock requests
  - Add a channel that handles non-finalized state CommitBlock requests
  - We want two channels so we can wait for the last finalized block before committing the first non-finalized block (by height)
  - The current implementation of this has a bug: Avoid temporary failures verifying the first non-finalized block #5125
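One way the two-channel ordering could work is sketched below. The names `run_commit_task`, `Committed`, and `last_finalized` are assumptions made for this example, and the real task would receive full requests rather than bare heights.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical stand-in for a block: only its height matters in this sketch.
type Height = u32;

/// A record of which channel each committed block came from.
#[derive(Debug, PartialEq)]
enum Committed {
    Finalized(Height),
    NonFinalized(Height),
}

/// Commit blocks from two channels, in two phases.
///
/// `last_finalized` is the (assumed) height of the final checkpointed block.
fn run_commit_task(
    finalized_rx: Receiver<Height>,
    non_finalized_rx: Receiver<Height>,
    last_finalized: Height,
) -> Vec<Committed> {
    let mut log = Vec::new();

    // Phase 1: only read the finalized channel, until the last finalized
    // block has been committed (or the channel is closed).
    for height in finalized_rx.iter() {
        log.push(Committed::Finalized(height));
        if height == last_finalized {
            break;
        }
    }

    // Phase 2: non-finalized blocks that arrived early have been waiting in
    // their own channel, and are only committed now.
    for height in non_finalized_rx.iter() {
        log.push(Committed::NonFinalized(height));
    }

    log
}
```

With a single shared channel, a non-finalized block that arrived early would have to be rejected or re-queued. With two channels it simply waits in its own queue until the finalized phase is done.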
Error Handling & Testing
- Handle panics in the block commit task by panicking in the service
- Testing: what new tests do we want?
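As a rough sketch of the panic handling item, the service could keep the task's join handle and re-raise any panic it finds. `Service` and `check_commit_task` are hypothetical names for this example, and a real service would likely run this check whenever it is polled.

```rust
use std::panic;
use std::thread::JoinHandle;

/// Hypothetical service that owns the handle of the block commit thread.
struct Service {
    commit_task: Option<JoinHandle<()>>,
}

impl Service {
    /// If the commit task has exited by panicking, panic in the service too.
    ///
    /// This never blocks: it only joins a thread that has already finished.
    fn check_commit_task(&mut self) {
        let finished = self
            .commit_task
            .as_ref()
            .map_or(false, |handle| handle.is_finished());

        if finished {
            let handle = self
                .commit_task
                .take()
                .expect("commit task handle was just checked");

            // `join` returns the panic payload as an error, which is
            // re-raised on the service's own thread.
            if let Err(payload) = handle.join() {
                panic::resume_unwind(payload);
            }
        }
    }
}
```

Re-raising with `resume_unwind` keeps the original panic payload, so the failure shows up in the service instead of being silently lost in a background thread.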
Optional tasks:
Optional Cleanup Tasks
Bug fixes:
Refactors:
- Make pending_utxos.respond() async using a channel, so we can use ReadRequest::ChainUtxo in AwaitUtxo
Renames & Formatting:
- Rename every instance of address* or transparent_* to address_*
- Put the Request and Response enums in a consistent order
In Scope
- Non-finalized state
- Finalized state
- Running the task in a separate thread
Out of Scope
We don't think we'll need to make these changes as part of this change:
- Scale lookahead limit based on upcoming checkpoint sizes #5101
- Check for downloaded hashes in a batch #5103 (this reduces the number of state requests from the syncer)
These are definitely out of scope:
- Other state refactors
- Other performance improvements
- Note commitment tree performance improvements