perf(l1): deduplicate trie nodes across layers before writing to RocksDB (#6179)
Conversation
TrieLayerCache::commit() collects nodes from ~128 layers via flat_map into a Vec, producing duplicate entries when the same trie path is modified in multiple layers. Only the latest value matters, but all copies end up in the WriteBatch — the root node alone could be written 128 times. Collect into FxHashMap instead, iterating oldest-to-newest so newer values overwrite older ones. This eliminates duplicates before the WriteBatch is constructed, reducing InlineSkipList::Insert and MemTable::KeyComparator overhead proportionally.
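The dedup step described above can be sketched as follows. This is a simplified stand-in, not the actual `TrieLayerCache` code: it uses std `HashMap` in place of `FxHashMap` (a drop-in alias with a faster hasher), and the layer layout (`Vec` of key/value pairs, newest first) is an assumption for illustration.

```rust
use std::collections::HashMap;

// Layers are held newest -> oldest, as in TrieLayerCache; each layer maps
// trie-node keys to encoded node values. (Simplified stand-in types.)
fn dedup_for_commit(layers_newest_first: Vec<Vec<(Vec<u8>, Vec<u8>)>>) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut nodes: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
    // Reverse to visit oldest -> newest; extend() overwrites on key collision,
    // so the newest value wins.
    for layer in layers_newest_first.into_iter().rev() {
        nodes.extend(layer);
    }
    nodes.into_iter().collect()
}

fn main() {
    let newest = vec![(b"root".to_vec(), b"v2".to_vec())];
    let oldest = vec![
        (b"root".to_vec(), b"v1".to_vec()),
        (b"leaf".to_vec(), b"x".to_vec()),
    ];
    let out = dedup_for_commit(vec![newest, oldest]);
    assert_eq!(out.len(), 2); // duplicate "root" entries collapsed
    assert!(out.contains(&(b"root".to_vec(), b"v2".to_vec()))); // newest wins
    println!("ok");
}
```

The key property is that `HashMap::extend` replaces existing values, so iterating oldest-to-newest makes the last write win without any explicit comparison.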
🤖 Kimi Code Review

Review Summary

The change is correct and valuable - it fixes a performance issue where duplicate trie nodes across layers would create unnecessary RocksDB write operations.

Specific Feedback

File:

```rust
let mut nodes_to_commit = FxHashMap::with_capacity_and_hasher(
    layers_to_commit.iter().map(|l| l.nodes.len()).sum(),
    Default::default(),
);
```

Security/Consensus Impact

✅ No security issues - This is purely a performance optimization that doesn't affect consensus or state correctness. The deduplication preserves the "newest wins" semantics, which is correct for layered state.

Performance

✅ Significant improvement - Eliminates O(n×m) duplicate writes, where n = number of layers and m = duplicate nodes. This is especially impactful for the root node, which gets modified in every layer.

The change is ready to merge.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

Findings (ordered by severity)

If ordering is not relied upon anywhere, the change looks good and is a solid performance win by avoiding duplicate writes.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

Code Review: PR #6179 - Deduplicate trie nodes across layers before writing to RocksDB

Summary

This PR optimizes the `TrieLayerCache::commit()` path.

✅ Positive Aspects
🔍 Correctness Analysis

✅ Correctness: VERIFIED

The change is semantically correct:
🔒 Security & Consensus Implications

✅ No security concerns identified
⚡ Performance Considerations

✅ Performance improvement validated

Minor consideration: The

🦀 Rust Best Practices

✅ Idiomatic Rust
Minor style note: Consider adding a capacity hint for the HashMap:

```rust
let mut nodes_to_commit = FxHashMap::with_capacity_and_hasher(
    layers_to_commit.iter().map(|l| l.nodes.len()).sum(),
    Default::default(),
);
```

This would avoid rehashing during growth, though the impact may be minimal since

🧪 Testing
Recommendation: While the benchmarks validate the performance improvement, consider adding a unit test specifically for the deduplication behavior:
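The suggested test itself was not captured in this excerpt. A hedged sketch of the property such a test could assert, modeling the oldest-to-newest merge directly with std `HashMap` since the `TrieLayerCache` internals aren't shown here:

```rust
use std::collections::HashMap;

// In the crate this would live in a #[test] fn; shown as a plain fn so it
// runs standalone as a binary.
fn commit_deduplicates_across_layers() {
    // Two layers touch the same key; commit must keep only the newest value.
    let oldest = vec![(b"root".to_vec(), b"old".to_vec())];
    let newest = vec![(b"root".to_vec(), b"new".to_vec())];

    let mut merged: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
    for layer in [oldest, newest] {
        // oldest -> newest: later extend() calls overwrite duplicate keys
        merged.extend(layer);
    }
    let out: Vec<_> = merged.into_iter().collect();

    assert_eq!(out, vec![(b"root".to_vec(), b"new".to_vec())]);
}

fn main() {
    commit_deduplicates_across_layers();
    println!("ok");
}
```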
This would prevent regression if the code is refactored.

📝 Minor Suggestions
🎯 Verdict

APPROVED ✅

This is a well-crafted performance optimization with:
The change is low-risk and high-value. The minor suggestions about capacity pre-allocation and testing are nice-to-haves but not blockers.

Automated review by Claude (Anthropic) · custom prompt
Greptile Overview

Greptile Summary

Optimizes `commit()` to deduplicate trie nodes across layers before writing to RocksDB.

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| crates/storage/layering.rs | Optimizes commit() to deduplicate trie nodes across layers using HashMap before returning, reducing WriteBatch size by 20-50% |
| CHANGELOG.md | Added changelog entry for trie node deduplication optimization, but PR number needs correction (#6177 → #6179) |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant TrieLayerCache
    participant HashMap as FxHashMap
    participant RocksDB
    Caller->>TrieLayerCache: commit(state_root)
    TrieLayerCache->>TrieLayerCache: Remove layers following parent chain
    Note over TrieLayerCache: layers_to_commit built newest→oldest
    TrieLayerCache->>HashMap: Create empty FxHashMap
    loop For each layer (reversed: oldest→newest)
        TrieLayerCache->>HashMap: extend(layer.nodes)
        Note over HashMap: Newer values overwrite older duplicates
    end
    TrieLayerCache->>TrieLayerCache: Convert HashMap to Vec
    TrieLayerCache->>Caller: Return Vec<(key, value)>
    Caller->>RocksDB: WriteBatch with deduplicated nodes
    Note over RocksDB: 20-50% fewer entries
```
Benchmark Block Execution Results

Comparison Against Main
Motivation
CPU profiling on the slow EXPB scenario (5000 mainnet blocks, ~30 min) shows `InlineSkipList::Insert` at 3.90% flat and `MemTable::KeyComparator` at 1.17% flat — both driven by WriteBatch size in `TrieLayerCache::commit()`. When 128 layers are committed, `flat_map` collects ALL trie node entries from ALL layers into a `Vec` with no deduplication. A node modified in multiple layers (e.g. the root node modified in all 128 layers) produces 128 duplicate WriteBatch entries. Only the latest value matters.

Description
Change `TrieLayerCache::commit()` to collect nodes into an `FxHashMap` instead of a `Vec`, iterating oldest-to-newest so newer values overwrite older ones. This eliminates duplicate entries before the WriteBatch is constructed, reducing `InlineSkipList::Insert` and `MemTable::KeyComparator` overhead proportionally.

No breaking changes
The return type remains `Vec<(Vec<u8>, Vec<u8>)>` — the HashMap is converted back to Vec before returning. The only behavioral change is fewer entries in the output (duplicates removed).

Benchmark Results (EXPB on ethrex-office-3)
Compared against fresh main baselines (#63-65) run back-to-back on the same machine.
Throughput
Latency — Gigablocks
Latency — Slow (5000 mainnet blocks)
CPU Profile Comparison — Gigablocks
The gigablocks improvement is distributed across functions rather than concentrated in the write path, suggesting reduced overall pressure from smaller WriteBatches.
CPU Profile Comparison — Slow
The slow profile shows modest reductions in the WriteBatch/SkipList path (~2%), indicating the actual duplicate count across 128 layers is lower than the worst case (root node x128). Most leaf nodes are only touched once across the commit window.
Note on gigablocks variance: The gigablocks baselines show natural variance between runs (old baseline #58: 942.98 vs fresh #64: 909.63 = 3.5% difference). The +10.2% improvement should be interpreted with this variance in mind; the slow scenario (+1.7%) is more reliable given its longer duration.
How to Test
- `fast`, `gigablocks`, and `slow` scenarios with CPU profiling
- `InlineSkipList::Insert` and `WriteBatchInternal::InsertInto` should decrease