Stop batch verification lagging and failing entire batches #4729
Motivation
The batch verifier can lag and lose verification results, which fails entire blocks even if their proofs and signatures are valid. We should redesign it so it can't lag.
If we can't do that, the dropped verifications should fail immediately, rather than waiting for the block verification timeout:
2022-06-30T17:51:02.943983Z ERROR{net="Main"}:sync:try_to_sync:extend_tips:zebra_consensus::primitives::redpallas: batch verification receiver lagged and lost verification results
2022-06-30T17:51:57.423166Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718380) network_upgrade=Nu5 remaining_sync_blocks=3119 time_since_last_state_block=1m
...
2022-06-30T17:54:57.425405Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718380) network_upgrade=Nu5 remaining_sync_blocks=3122 time_since_last_state_block=4m
2022-06-30T17:55:57.426619Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718381) network_upgrade=Nu5 remaining_sync_blocks=3121 time_since_last_state_block=0s
2022-06-30T17:56:47.835664Z WARN{net="Main"}:sync:try_to_sync:zebrad::components::sync: error downloading and verifying block e=Invalid { error: Block(Transaction(InternalDowncastError("downcast to known transaction error type failed, original error: Elapsed(())"))), height: Height(1718445), hash: block::Hash("0000000000f0f61e6f42984784ad367711c0b3e704e840797606314426dc2a90") }
https://github.com/ZcashFoundation/zebra/runs/7135923149?check_suite_focus=true#step:6:644
Designs
We can either:
- replace the batch verifier's broadcast channel with a watch channel, and create a new watch channel for each batch
- create a new broadcast channel for each batch with a channel size of 1, so it only ever has one result in it
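
As a rough illustration of the "one channel per batch" idea, here is a minimal sketch using only the Rust standard library. `BatchResult` and `run_batch` are hypothetical names, not Zebra code, and the `bool` per item stands in for real proof or signature verification. The slot keeps only the latest value, which is the property a watch channel (for example `tokio::sync::watch`) would provide in an async codebase, so a slow reader cannot lag behind and lose the result.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical per-batch result slot (not a Zebra type).
/// It stores only the latest value, so a slow reader cannot fall behind.
#[derive(Clone)]
struct BatchResult {
    slot: Arc<(Mutex<Option<bool>>, Condvar)>,
}

impl BatchResult {
    fn new() -> Self {
        BatchResult {
            slot: Arc::new((Mutex::new(None), Condvar::new())),
        }
    }

    /// Store the batch outcome and wake every waiting request.
    fn publish(&self, batch_ok: bool) {
        let (lock, cvar) = &*self.slot;
        *lock.lock().unwrap() = Some(batch_ok);
        cvar.notify_all();
    }

    /// Block until the outcome is available, then return it.
    /// A reader that arrives after `publish` still sees the value.
    fn wait(&self) -> bool {
        let (lock, cvar) = &*self.slot;
        let mut guard = lock.lock().unwrap();
        loop {
            if let Some(batch_ok) = *guard {
                return batch_ok;
            }
            guard = cvar.wait(guard).unwrap();
        }
    }
}

/// Hypothetical driver: verifies one batch and reports the outcome to
/// every item's waiter through a slot created just for this batch.
/// `item_validity` is a stand-in for the real verification inputs.
fn run_batch(item_validity: &[bool]) -> Vec<bool> {
    let result = BatchResult::new();

    // One waiter per item, each holding its own handle to the slot.
    let waiters: Vec<_> = item_validity
        .iter()
        .map(|_| {
            let handle = result.clone();
            thread::spawn(move || handle.wait())
        })
        .collect();

    // The batch succeeds only if every item in it is valid.
    let batch_ok = item_validity.iter().all(|&valid| valid);
    result.publish(batch_ok);

    waiters.into_iter().map(|w| w.join().unwrap()).collect()
}
```

Because each batch gets a fresh slot that holds a single value, there is no queue of older results to overflow. Whether the real fix should use a watch channel or a size-1 broadcast channel is still the open design question above.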