perf(l1): speed up snap sync validation with parallelism and deduplication#6191
Conversation
🤖 Kimi Code Review

Review Summary

This PR introduces parallel validation for Ethereum state tries and storage roots, along with optimizations for bytecode validation. Here are the key issues found:

1. Critical Bug in
🤖 Codex Code Review

Findings:

Positives: the unique code-hash dedup and chunked parallelism are good for performance; the new parallel trie validation is a solid speedup with minimal structural change. Tests not run (not requested). If you want, I can run targeted tests or add a validation-mode flag to preserve strong integrity checks while keeping the fast path.

Automated review by OpenAI Codex · custom prompt
Greptile Overview

Greptile Summary

This PR introduces significant performance optimizations to snap sync validation functions by leveraging parallelism and smart filtering. The changes are well-architected and follow sound optimization principles.

Key improvements:

Technical soundness:

Minor issue:

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/common/trie/trie.rs | Added validate_parallel() method and validate_subtree() helper for parallel trie validation by splitting at root branch node - implementation is sound with proper hash checking via get_node_checked() |
| crates/networking/p2p/sync/snap_sync.rs | Optimized validation functions with parallel processing, deduplication, and empty storage filtering - solid optimizations with one minor potential issue in error handling |
| crates/storage/store.rs | Added code_exists() method for efficient existence checks - skips blob reads and RLP decoding, properly checks cache first |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Validator as Validation Function
    participant Rayon as Rayon Thread Pool
    participant Trie as Trie (validate_parallel)
    participant DB as TrieDB
    participant Store as Store (code_exists)
    Note over Validator: validate_state_root
    Validator->>Trie: validate_parallel()
    Trie->>DB: get_node_checked(root)
    DB-->>Trie: root branch node
    Trie->>Rayon: par_iter().try_for_each(16 subtrees)
    loop For each subtree (parallel)
        Rayon->>Trie: validate_subtree(path, ref)
        Trie->>DB: get_node_checked(nodes)
        DB-->>Trie: nodes with hash validation
    end
    Rayon-->>Trie: all subtrees valid
    Trie-->>Validator: Ok(())
    Note over Validator: validate_storage_root
    Validator->>Store: iter_accounts(state_root)
    Store-->>Validator: accounts iterator
    Validator->>Validator: filter(storage_root != EMPTY)
    loop Chunked (4096 per chunk)
        Validator->>Rayon: par_iter().try_for_each(chunk)
        Rayon->>Store: open_locked_storage_trie()
        Store->>Trie: validate()
        Trie->>DB: sequential validation
        DB-->>Trie: validated
        Trie-->>Store: Ok(())
        Store-->>Rayon: Ok(())
    end
    Rayon-->>Validator: all storage tries valid
    Note over Validator: validate_bytecodes
    Validator->>Store: iter_accounts(state_root)
    Store-->>Validator: accounts iterator
    Validator->>Validator: collect unique code hashes (HashSet)
    Validator->>Rayon: par_iter().filter(missing)
    loop For each unique hash (parallel)
        Rayon->>Store: code_exists(hash)
        Store->>Store: check cache
        Store->>DB: get(ACCOUNT_CODES, hash)
        DB-->>Store: exists/not exists
        Store-->>Rayon: bool
    end
    Rayon-->>Validator: missing hashes list
```
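The fan-out at the top of the diagram (split at the root branch node, validate each of the 16 child subtrees independently, fail on the first error) can be sketched with scoped std threads in place of rayon. This is a toy model, not the PR's code: `validate_subtree` and the `u32` "node hashes" are illustrative stand-ins for the real `Trie` types.

```rust
use std::thread;

// Toy stand-in for a subtree: a list of node "hashes" (u32 here).
// The real Trie::validate_parallel() fans out over the 16 children
// of the root branch node; these names are hypothetical.
fn validate_subtree(nodes: &[u32]) -> Result<(), String> {
    // Pretend "0" marks a corrupted node hash.
    if nodes.contains(&0) {
        return Err("hash mismatch in subtree".to_string());
    }
    Ok(())
}

fn validate_parallel(subtrees: &[Vec<u32>]) -> Result<(), String> {
    thread::scope(|s| {
        let handles: Vec<_> = subtrees
            .iter()
            .map(|st| s.spawn(move || validate_subtree(st)))
            .collect();
        // Propagate the first error, like try_for_each in the PR.
        for h in handles {
            h.join().expect("validation thread panicked")?;
        }
        Ok(())
    })
}

fn main() {
    let good: Vec<Vec<u32>> = (1..=16).map(|i| vec![i, i + 16]).collect();
    assert!(validate_parallel(&good).is_ok());

    let mut bad = good.clone();
    bad[3].push(0); // inject a "corrupted" node into one subtree
    assert!(validate_parallel(&bad).is_err());
    println!("ok");
}
```

Each subtree's sequential DFS is untouched; only the top-level split is parallel, which is why the speedup is bounded by the 16-way branch fan-out.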
```rust
use rayon::prelude::*;
let missing: Vec<_> = unique_hashes
    .par_iter()
    .filter(|code_hash| !store.code_exists(**code_hash).unwrap_or(false))
```
Using `unwrap_or(false)` silently treats database errors as "code doesn't exist", which could mask real issues during validation.
```suggestion
.filter(|code_hash| !store.code_exists(**code_hash).expect("failed to check code existence"))
```
🤖 Claude Code Review

PR #6191 Review: Speed up snap sync validation with parallelism and deduplication

Overall Assessment

This is a well-motivated optimization PR that improves debug validation performance without affecting production builds. The approach is sound, but there are several correctness and safety issues that need addressing.

Critical Issues

1. Thread Safety Violation in
Pull request overview

This PR speeds up ethrex's post-snap-sync correctness validations (state trie, storage tries, and bytecode presence) that run under `debug_assert!` in release-with-debug-assertions builds, reducing end-to-end sync validation time while preserving the extra integrity checks.

Changes:
- Added parallel trie validation by splitting root branch subtrees and validating them concurrently with rayon.
- Optimized storage trie validation by skipping empty storage roots and replacing `par_bridge()` with chunked collection + `par_iter()`.
- Optimized bytecode validation by deduplicating code hashes and adding a `Store::code_exists()` helper for existence checks.
Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| crates/storage/store.rs | Adds `code_exists()` helper used by bytecode validation. |
| crates/networking/p2p/sync/snap_sync.rs | Switches state root validation to parallel mode; optimizes storage and bytecode validation workflows. |
| crates/common/trie/trie.rs | Introduces `Trie::validate_parallel()` and subtree validator implementation. |
| crates/common/trie/Cargo.toml | Adds rayon dependency for parallel trie validation. |
| Cargo.lock | Locks new rayon dependency. |
```rust
// Check DB without reading the full value
Ok(self
    .backend
    .begin_read()?
    .get(ACCOUNT_CODES, code_hash.as_bytes())?
    .is_some())
```
code_exists claims to avoid "blob value reads", but it calls begin_read().get(ACCOUNT_CODES, ...), which on the RocksDB backend is configured with set_enable_blob_files(true) for ACCOUNT_CODES and will still fetch the value (often from blob files). If the goal is an existence-only check without reading bytecode blobs, consider checking ACCOUNT_CODE_METADATA (fixed-size, non-blob CF) first and only falling back to ACCOUNT_CODES if metadata is missing (older DBs).
```rust
// Check cache first
if self
    .account_code_cache
    .lock()
    .map_err(|_| StoreError::LockError)?
    .get(&code_hash)?
    .is_some()
{
    return Ok(true);
```
code_exists always takes the account_code_cache mutex even though cache hits are likely rare during bulk validation. When called from a rayon par_iter (as in validate_bytecodes), this mutex can serialize the hot path and reduce parallel speedups. Consider either skipping the code cache for code_exists, or restructuring the cache to allow concurrent reads (e.g., RwLock + a read-only lookup method).
```suggestion
// Opportunistically check cache first without blocking other threads.
match self.account_code_cache.try_lock() {
    Ok(cache) => {
        if cache.get(&code_hash)?.is_some() {
            return Ok(true);
        }
    }
    Err(std::sync::TryLockError::Poisoned(_)) => {
        // Preserve previous behavior on poisoned lock.
        return Err(StoreError::LockError);
    }
    Err(std::sync::TryLockError::WouldBlock) => {
        // Contended: skip cache and fall back to DB check.
    }
}
```
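The `try_lock` fallback suggested above can be demonstrated in isolation. This minimal sketch uses a `Vec` guarded by a `std::sync::Mutex` as a stand-in for the account-code cache; the real `Store` types are not involved.

```rust
use std::sync::{Mutex, TryLockError};

fn main() {
    // Stand-in for the account-code cache.
    let cache = Mutex::new(vec![1u64, 2, 3]);

    // Uncontended: try_lock succeeds and we can probe the cache.
    match cache.try_lock() {
        Ok(guard) => assert!(guard.contains(&2)),
        Err(_) => unreachable!("no other holder exists here"),
    }

    // Simulate contention: std::sync::Mutex is non-reentrant, so a
    // try_lock while the lock is held fails with WouldBlock.
    let _held = cache.lock().unwrap();
    match cache.try_lock() {
        Ok(_) => panic!("lock should be held"),
        Err(TryLockError::WouldBlock) => {
            // This is where code_exists would skip the cache and
            // query the database instead of blocking.
        }
        Err(TryLockError::Poisoned(_)) => panic!("not poisoned"),
    }
    println!("ok");
}
```

The design trade-off: a contended probe gives up cache hits in exchange for never serializing the rayon workers on the mutex.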
```rust
use rayon::prelude::*;
let missing: Vec<_> = unique_hashes
    .par_iter()
    .filter(|code_hash| !store.code_exists(**code_hash).unwrap_or(false))
```
unwrap_or(false) converts any StoreError from code_exists into "missing" without logging the underlying error, which can make diagnosing DB/corruption issues harder (and can misattribute failures). Consider handling the Err case explicitly (e.g., log the error and treat validation as failed) instead of silently defaulting to false.
```suggestion
.filter(|code_hash| {
    match store.code_exists(**code_hash) {
        Ok(exists) => !exists,
        Err(err) => {
            error!(
                "Error while checking existence of code hash {:x}: {:?}",
                code_hash, err
            );
            // Treat store errors as validation failures (equivalent to "missing")
            true
        }
    }
})
```
```rust
    account_state.storage_root,
)
.expect("couldn't open storage trie")
.validate()
```
```rust
let db = &*self.db;
let root_node = self
    .root
    .get_node_checked(db, Nibbles::default())?
```
Note: get_node_checked won't detect a hash mismatch if the root node was already loaded.
This shouldn't matter right now given we usually call this on a freshly opened trie, but it might be a good idea to verify.
```rust
    start_path: Nibbles,
    start_ref: NodeRef,
) -> Result<(), TrieError> {
    let mut expected_count: isize = 1;
```
It shouldn't be necessary to keep this. validate() only had this because the iterator has no way of distinguishing "ended because of error" and "ended because all nodes were visited".
Force-pushed 22da23c to f4a8095
…ation

Optimize the three `debug_assert!` validation functions that run after snap sync to verify trie integrity. These are used during testing with release-with-debug-assertions builds to validate optimization correctness.

validate_state_root:
- Add `validate_parallel()` to `Trie` that splits at the root branch node and validates each of the 16 subtrees independently via rayon (~4-8x)

validate_storage_root:
- Skip accounts with empty storage tries (EMPTY_TRIE_HASH) — ~90% of accounts are EOAs with nothing to validate
- Replace `par_bridge()` with chunked collect + `par_iter` to eliminate per-item mutex contention from the sequential iterator adapter

validate_bytecodes:
- Deduplicate code hashes via HashSet before checking — ~28:1 ratio of contracts to unique bytecodes on mainnet
- Add `Store::code_exists()` for key-only existence checks, avoiding full blob reads and RLP decoding
- Parallelize with rayon `par_iter` on the deduplicated set
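The chunked collect + parallelize pattern from the commit above can be sketched without rayon: drain the sequential account iterator into fixed-size chunks, then fan each chunk out to worker threads. `process` is a hypothetical placeholder for per-account storage-trie validation, and std scoped threads stand in for rayon's `par_iter`.

```rust
use std::thread;

// 4096 in the PR; a tiny value here to keep the demo readable.
const CHUNK: usize = 4;

// Hypothetical stand-in for validating one account's storage trie.
fn process(item: u32) -> u32 {
    item * 2
}

fn main() {
    let accounts: Vec<u32> = (0..10).collect();
    let mut results = Vec::new();

    for chunk in accounts.chunks(CHUNK) {
        // Within a chunk, fan out with scoped threads. Unlike
        // par_bridge(), workers never contend on the source iterator:
        // the chunk was materialized up front.
        let chunk_results: Vec<u32> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|&a| s.spawn(move || process(a)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        results.extend(chunk_results);
    }

    assert_eq!(results, (0..10).map(|x| x * 2).collect::<Vec<_>>());
    println!("validated {} accounts", results.len());
}
```

The point of the chunking is exactly what the commit message states: `par_bridge()` pulls items from the sequential iterator behind a lock, so materializing a chunk first lets the workers run lock-free.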
- Remove unused top-level `rayon::iter::ParallelIterator` import (clippy error)
- Update sp1/risc0/zisk/openvm Cargo.lock files for rayon dep in ethrex-trie
- Propagate DB errors in `validate_bytecodes` instead of silently converting them to "code missing" via `unwrap_or(false)` (ElFantasma review)
- Fix `code_exists` docstring: the underlying `get()` still reads the full value from RocksDB; the savings are from skipping RLP decoding and Code struct construction, not blob reads (ElFantasma review)
…import

- Fix cargo fmt in trie.rs
- Update remaining Cargo.lock files (tee/quote-gen, levm/bench, tooling)
- Add ParallelIterator import to not(rocksdb) insert_storages to fix compilation in prover/exec backend CI targets
Force-pushed 3c691a8 to 1a190ab
Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed results were reported for: BubbleSort, ERC20Approval, ERC20Mint, ERC20Transfer, Factorial, FactorialRecursive, Fibonacci, FibonacciRecursive, ManyHashes, MstoreBench, Push, SstoreBench_no_opt.
Benchmark Block Execution Results Comparison Against Main
Motivation

The three post-sync validation functions (`validate_state_root`, `validate_storage_root`, `validate_bytecodes`) run via `debug_assert!` in release-with-debug-assertions builds to verify trie integrity after snap sync. We use these to validate that optimization changes don't break sync correctness, but they take too long — slowing down the testing feedback loop.

Neither geth nor nethermind performs a full trie traversal after snap sync (they trust healing completion), so these are ethrex-specific correctness checks we want to keep but make faster.
Description

`validate_state_root` — parallel subtree validation
- Add `Trie::validate_parallel()` that fetches the root branch node and validates each of its 16 subtrees independently via rayon
- Add `validate_subtree()` helper for iterative DFS with hash checking

`validate_storage_root` — skip empty + eliminate par_bridge overhead
- Skip accounts whose `storage_root == EMPTY_TRIE_HASH` before processing (~90% of accounts are EOAs with empty storage)
- Replace `par_bridge()` with chunked `collect()` + `par_iter()` (4096 accounts per chunk) to eliminate the per-item mutex contention inherent to `par_bridge`'s sequential-to-parallel adapter

`validate_bytecodes` — dedup + existence check + parallelize
- Deduplicate code hashes into a `HashSet` first — on mainnet ~70M contracts share only ~2.5M unique bytecodes (28:1 ratio)
- Add `Store::code_exists()` that checks key existence without reading the full bytecode blob or RLP decoding (avoids BlobDB value reads)
- Parallelize with `par_iter()` over the deduplicated set

How to Test
Build with `cargo build --release --bin ethrex --profile release-with-debug-assertions`, then watch the sync logs for the start/finish pairs:
- `Starting validate_state_root` / `Succesfully validated tree`
- `Starting validate_storage_root` / `Finished validate_storage_root`
- `Starting validate_bytecodes` / `Collected N unique code hashes for validation`
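The dedup-then-check flow described under `validate_bytecodes` can be sketched with plain `u64` values standing in for code hashes and an in-memory set standing in for the `Store::code_exists()` lookup; every name here is illustrative, not the PR's API.

```rust
use std::collections::HashSet;

fn main() {
    // Hypothetical per-account code hashes; many accounts share
    // bytecode (the PR cites ~28:1 contracts-to-unique on mainnet).
    let account_code_hashes: Vec<u64> = vec![1, 2, 1, 3, 2, 2, 1];

    // Deduplicate before hitting the store: 7 lookups become 3.
    let unique: HashSet<u64> = account_code_hashes.iter().copied().collect();
    assert_eq!(unique.len(), 3);

    // Stand-in for Store::code_exists(): a key-only existence check
    // against what the database actually holds.
    let stored: HashSet<u64> = [1, 2].into_iter().collect();
    let mut missing: Vec<u64> = unique
        .iter()
        .copied()
        .filter(|h| !stored.contains(h))
        .collect();
    missing.sort();

    // Validation fails iff any unique hash is absent from the store.
    assert_eq!(missing, vec![3]);
    println!("missing: {:?}", missing);
}
```

Because the existence checks are independent per hash, the deduplicated set is exactly the shape rayon's `par_iter().filter(...)` wants: no shared mutable state, one DB probe per unique hash.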