Conversation
🤖 Kimi Code Review

Review Summary

This PR introduces several performance optimizations focused on reducing redundant computation and memory copies. The changes look solid overall, with good attention to performance bottlenecks. Here are my findings:

Issues Found
Positive Aspects
Suggested Fixes
// In store.rs, make cache invalidation atomic with layer update
let mut trie_cache_guard = trie_cache.lock().map_err(|_| StoreError::LockError)?;
trie_node_cache.clear();
*trie_cache_guard = Arc::new(trie_mut);
// In CachingBackendTrieDB::put_batch, add:
for (key, _) in &key_values {
let prefixed_key = self.inner.make_key(*key);
self.cache.insert(prefixed_key, Vec::new()); // or remove
}
let prefixed_key = self.inner.make_key(key);
if let Some(value) = self.cache.get(&prefixed_key) { ... }

Overall, this is a well-executed performance optimization PR with minor issues that should be addressed before merging.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Claude Code Review

Code Review: PR #6163 - Block Pipeline Performance Optimizations

Overview

This PR targets three performance bottlenecks in the block processing pipeline through targeted optimizations. The approach is solid and well-documented.

Critical Issues

1. Trie Node Cache Invalidation Too Aggressive
🤖 Codex Code Review

Findings
If you want, I can dig into the call paths to confirm whether the L2

Automated review by OpenAI Codex · custom prompt
… per block

The RCU pattern in apply_trie_updates clones the entire TrieLayerCache on every block. The AtomicBloomFilter (~875KB for 1M items at 2% FPR) was the dominant cost of this clone. Wrapping bloom in Arc makes the clone a pointer copy. This is safe because AtomicBloomFilter::insert/contains take &self (atomic operations), and the filter is only replaced (not mutated in place) during rebuild_bloom(), which creates a fresh Arc. The layers HashMap clone remains but is cheap (FxHashMap of Arc<TrieLayer> pointers plus metadata only).
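The effect of moving the bloom filter behind an Arc can be sketched with simplified stand-in types (these are illustrative, not ethrex's actual AtomicBloomFilter or TrieLayerCache definitions): because insert/contains only need &self-based atomics, the filter can be shared, and the per-block RCU clone degrades to a pointer bump.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Simplified stand-in for AtomicBloomFilter: insert/contains take &self
// (atomic bit operations), so the filter can sit behind an Arc and still
// accept inserts without exclusive access.
struct BloomFilter {
    bits: Vec<AtomicU64>,
}

impl BloomFilter {
    fn new(words: usize) -> Self {
        Self { bits: (0..words).map(|_| AtomicU64::new(0)).collect() }
    }
    fn slot(&self, h: u64) -> (usize, u64) {
        ((h as usize / 64) % self.bits.len(), 1u64 << (h % 64))
    }
    fn insert(&self, h: u64) {
        let (i, mask) = self.slot(h);
        // fetch_or makes the update atomic: no &mut self needed
        self.bits[i].fetch_or(mask, Ordering::Relaxed);
    }
    fn contains(&self, h: u64) -> bool {
        let (i, mask) = self.slot(h);
        self.bits[i].load(Ordering::Relaxed) & mask != 0
    }
}

// RCU snapshot of the layer cache. Deriving Clone now copies the Arc
// (a pointer bump), not the ~875KB of bloom bits.
#[derive(Clone)]
struct TrieLayerCache {
    bloom: Arc<BloomFilter>,
    // layers: FxHashMap<H256, Arc<TrieLayer>> would still be cloned,
    // but it holds only Arc pointers + metadata, so the clone is cheap.
}

fn main() {
    let cache = TrieLayerCache { bloom: Arc::new(BloomFilter::new(1024)) };
    let snapshot = cache.clone(); // RCU-style clone, once per block

    // Both handles point at the same filter; an insert through one is
    // visible through the other.
    cache.bloom.insert(42);
    assert!(snapshot.bloom.contains(42));
    assert!(Arc::ptr_eq(&cache.bloom, &snapshot.bloom));
}
```

A rebuild_bloom()-style replacement would swap in a fresh `Arc::new(BloomFilter::new(..))` rather than mutating in place, which is why readers holding the old Arc stay consistent.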
…ss threads

Both the warmer thread and the 16 merkleizer shards traverse the same trie nodes from RocksDB. The TrieLayerCache only caches committed trie updates (diff layers), not trie node reads. This adds a 16-shard LRU cache (4096 entries per shard) at the BackendTrieDB level, shared via Arc across all trie instances. When the warmer reads a trie node, it populates the cache; when the merkleizer reads the same node moments later, it gets a cache hit instead of a full RocksDB read transaction.

Changes:
- Add TrieNodeCache (sharded LRU) and the CachingBackendTrieDB/CachingBackendTrieDBLocked wrappers in trie.rs
- Add a trie_node_cache field to Store, shared across all trie constructors
- Wrap BackendTrieDB with CachingBackendTrieDB in open_state_trie, open_storage_trie, and their locked variants
- Clear the cache in apply_trie_updates after the disk commit to invalidate stale entries
- Make BackendTrieDB::make_key, table_for_key, db, and BackendTrieDBLocked::tx_for_key pub(crate) for the wrapper
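The sharded-cache idea can be sketched as follows. This is a minimal illustration, not the PR's implementation: the real TrieNodeCache presumably uses a proper per-shard LRU, whereas this sketch evicts a whole shard when it fills up. Sharding by key hash keeps lock contention low when 16 merkleizer workers and the warmer hit the cache concurrently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

const SHARDS: usize = 16; // matches the merkleizer's 16 worker threads

// Minimal sketch of a sharded trie node cache shared via Arc across the
// warmer, merkleizer, and execution threads.
struct TrieNodeCache {
    shards: Vec<Mutex<HashMap<Vec<u8>, Vec<u8>>>>,
    per_shard_cap: usize,
}

impl TrieNodeCache {
    fn new(per_shard_cap: usize) -> Arc<Self> {
        Arc::new(Self {
            shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
            per_shard_cap,
        })
    }
    // Pick a shard by key hash so concurrent readers rarely share a lock.
    fn shard(&self, key: &[u8]) -> &Mutex<HashMap<Vec<u8>, Vec<u8>>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % SHARDS]
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.shard(key).lock().unwrap().get(key).cloned()
    }
    fn insert(&self, key: Vec<u8>, node: Vec<u8>) {
        let mut map = self.shard(&key).lock().unwrap();
        if map.len() >= self.per_shard_cap {
            map.clear(); // crude stand-in for per-shard LRU eviction
        }
        map.insert(key, node);
    }
    // Called after apply_trie_updates commits to disk, so no stale
    // pre-commit nodes survive.
    fn clear(&self) {
        for s in &self.shards {
            s.lock().unwrap().clear();
        }
    }
}

fn main() {
    let cache = TrieNodeCache::new(4096);
    // Warmer thread populates…
    cache.insert(b"node-key".to_vec(), b"rlp-node".to_vec());
    // …and the merkleizer hits the cache instead of RocksDB.
    assert_eq!(cache.get(b"node-key"), Some(b"rlp-node".to_vec()));
    cache.clear();
    assert_eq!(cache.get(b"node-key"), None);
}
```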
…validation

validate_receipts_root calls compute_receipts_root, which re-encodes every receipt (bloom computation + RLP) serially after all txs complete. This work can be done incrementally during per-tx execution, overlapping with merkleization.

Changes:
- Add an encoded_receipts field to BlockExecutionResult
- Pre-encode each receipt via encode_inner_with_bloom() immediately after each tx in both execute_block and execute_block_pipeline
- Add compute_receipts_root_from_encoded and validate_receipts_root_from_encoded, which skip re-encoding
- Update all 3 call sites in blockchain.rs to use the pre-encoded variant
- Update the L2 BlockExecutionResult constructors with empty encoded_receipts
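The restructuring can be sketched with stand-in types. Everything here is hypothetical shorthand: real receipts are RLP-encoded with a logs bloom and committed into a Merkle-Patricia trie, while this sketch uses a plain hash as the "root". The point it demonstrates is that encoding once per tx during execution and validating from the stored bytes yields the same root as the old serial re-encoding pass.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for a transaction receipt.
struct Receipt {
    status: bool,
    gas_used: u64,
}

// Stand-in for encode_inner_with_bloom(): bloom + RLP, done once per tx.
fn encode_receipt(r: &Receipt) -> Vec<u8> {
    let mut out = vec![r.status as u8];
    out.extend_from_slice(&r.gas_used.to_be_bytes());
    out
}

// Stand-in for compute_receipts_root_from_encoded(): consumes already
// encoded bytes, so validation does no re-encoding.
fn root_from_encoded(encoded: &[Vec<u8>]) -> u64 {
    let mut h = DefaultHasher::new();
    for e in encoded {
        e.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let receipts = vec![
        Receipt { status: true, gas_used: 21_000 },
        Receipt { status: true, gas_used: 50_000 },
    ];

    // During execution: encode each receipt right after its tx, so the
    // work overlaps with merkleization instead of running serially at
    // validation time.
    let mut encoded_receipts = Vec::new();
    for r in &receipts {
        encoded_receipts.push(encode_receipt(r));
    }

    // At validation: root over the stored bytes, no re-encoding.
    let got = root_from_encoded(&encoded_receipts);

    // Old path (re-encode everything serially) produces the same root.
    let old: Vec<Vec<u8>> = receipts.iter().map(encode_receipt).collect();
    assert_eq!(got, root_from_encoded(&old));
}
```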
…reads

The shared TrieNodeCache caches trie node reads across all trie instances, but during snap sync the trie is mutated directly (account insertion, state healing) without invalidating the cache. This caused request_storage_ranges to read stale trie nodes after state healing, failing with "Could not find account that should have been downloaded or healed". Add Store::clear_trie_node_cache() and call it after account insertion, after storage insertion, and before request_storage_ranges (after state healing).
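The staleness hazard is the classic read-cache-over-a-mutated-store problem, sketched below with hypothetical types (a HashMap stands in for both RocksDB and the node cache). Any write path that bypasses the cache — here, a heal() stand-in for state healing — must be followed by an explicit clear, which is what the added Store::clear_trie_node_cache() calls provide.

```rust
use std::collections::HashMap;

// Minimal sketch (hypothetical types): a read-through cache over a
// backing store goes stale when the store is mutated directly, as
// happens during snap sync.
struct Store {
    disk: HashMap<String, String>,            // stand-in for RocksDB tries
    trie_node_cache: HashMap<String, String>, // read-through node cache
}

impl Store {
    fn get(&mut self, k: &str) -> Option<String> {
        if let Some(v) = self.trie_node_cache.get(k) {
            return Some(v.clone()); // cache hit — may be stale
        }
        let v = self.disk.get(k).cloned()?;
        self.trie_node_cache.insert(k.to_string(), v.clone());
        Some(v)
    }
    // Direct mutation path (account insertion / state healing) writes to
    // disk without touching the cache.
    fn heal(&mut self, k: &str, v: &str) {
        self.disk.insert(k.to_string(), v.to_string());
    }
    fn clear_trie_node_cache(&mut self) {
        self.trie_node_cache.clear();
    }
}

fn main() {
    let mut store = Store { disk: HashMap::new(), trie_node_cache: HashMap::new() };
    store.heal("acct", "v1");
    assert_eq!(store.get("acct").as_deref(), Some("v1")); // now cached

    store.heal("acct", "v2"); // state healing rewrites the node on disk
    assert_eq!(store.get("acct").as_deref(), Some("v1")); // stale hit!

    store.clear_trie_node_cache(); // the fix: invalidate after healing
    assert_eq!(store.get("acct").as_deref(), Some("v2"));
}
```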
…ming

make_key takes Nibbles by value, so key must be cloned before being passed to inner.get() on a cache miss.
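The ownership constraint can be shown with hypothetical signatures (these free functions are illustrative stand-ins, not ethrex's BackendTrieDB methods): because make_key consumes its Nibbles argument, the caller must clone the key up front so it is still available to hand to the inner backend when the cache misses.

```rust
// Hypothetical stand-in for the nibble path type.
#[derive(Clone)]
struct Nibbles(Vec<u8>);

// Stand-in for BackendTrieDB::make_key: takes Nibbles by value and
// prefixes the path to build the cache/DB key.
fn make_key(key: Nibbles) -> Vec<u8> {
    let mut out = vec![0xFF];
    out.extend(key.0);
    out
}

// Always misses in this sketch, forcing the fallthrough to inner_get.
fn cache_lookup(_prefixed: &[u8]) -> Option<Vec<u8>> {
    None
}

// Stand-in for the inner RocksDB read, which also consumes the key.
fn inner_get(key: Nibbles) -> Option<Vec<u8>> {
    Some(make_key(key))
}

fn get(key: Nibbles) -> Option<Vec<u8>> {
    // Clone here: make_key takes ownership, and `key` is still needed
    // for inner_get on a cache miss.
    let prefixed = make_key(key.clone());
    if let Some(v) = cache_lookup(&prefixed) {
        return Some(v);
    }
    inner_get(key)
}

fn main() {
    assert_eq!(get(Nibbles(vec![1, 2, 3])), Some(vec![0xFF, 1, 2, 3]));
}
```

Without the clone, the borrow checker rejects the second use of `key`; on the hit path the clone is paid for nothing, which is the cost the commit title is (apparently) trimming.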
cf73c95 to 1d6f901
Benchmark Block Execution Results Comparison Against Main
Motivation
After profiling ethrex at steady-state on Hoodi (75% total CPU reduction across 3 prior rounds), the remaining bottlenecks include the TrieLayerCache deep clone in the RCU pattern (~875KB bloom filter copied per block), redundant trie node reads from RocksDB by both warmer and merkleizer threads, and receipt encoding done synchronously during validation.
Description
Three optimizations targeting the block pipeline:
- Eliminate the AtomicBloomFilter deep copy that happens every block in apply_trie_updates. Since AtomicBloomFilter::insert(&self) is already atomic, sharing via Arc is safe. The layers HashMap clone remains but is cheap (values are Arc<TrieLayer>).
- Add a sharded trie node read cache (TrieNodeCache) at the BackendTrieDB level, shared across warmer, merkleizer, and execution threads via Arc. When the warmer reads a trie node ahead of execution, the merkleizer gets a cache hit instead of another RocksDB read transaction. Shard count matches the merkleizer's 16 worker threads.
- Pre-encode receipts: receipt encoding (encode_inner_with_bloom) is done once per tx during execution and passed through BlockExecutionResult. Validation uses the pre-encoded bytes directly via validate_receipts_root_from_encoded, avoiding redundant re-encoding.

Benchmark Results
fast scenario
gigablocks scenario
Test plan
ethrex-storage, ethrex-common, ethrex-blockchain, ethrex-vm)