perf(l1): reduce trie/FKV memtable size and raise L0 compaction trigger #6183
Conversation
Two complementary RocksDB tuning changes:

1. Reduce `write_buffer_size` for the trie and FKV column families from 512MB to 128MB, and `max_write_buffer_number` from 6 to 4. This keeps SkipList trees shallower, reducing the `InlineSkipList::Insert` and `MemTable::KeyComparator` overhead that grows superlinearly with memtable depth.
2. Raise the L0 compaction trigger from 2/4 to 8 (global and per-CF), the slowdown trigger from 10 to 20, and the stop trigger from 16 to 36. This batches more L0 files per compaction event, reducing total compaction work. The higher triggers are needed to accommodate the more frequent flushes from smaller memtables.
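The memory impact of change 1 can be checked with quick arithmetic (a sketch; the 512MB/6-buffer and 128MB/4-buffer figures come from the PR description, the helper name is made up):

```rust
// Per-CF memtable memory budget: write_buffer_size * max_write_buffer_number.
fn budget_mb(write_buffer_size_mb: u64, max_write_buffer_number: u64) -> u64 {
    write_buffer_size_mb * max_write_buffer_number
}

fn main() {
    let before = budget_mb(512, 6); // 3072 MB = 3 GB per CF
    let after = budget_mb(128, 4);  // 512 MB per CF
    // Applies to each of the four affected CFs (account/storage trie + flat KV).
    println!("before: {} MB/CF, after: {} MB/CF", before, after);
    println!("reduction: {:.0}%", 100.0 * (1.0 - after as f64 / before as f64));
}
```

The roughly 83% per-CF reduction matches the figure quoted in the Claude review below.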
🤖 Kimi Code Review

Review Summary

This PR adjusts RocksDB configuration parameters to optimize memory usage and reduce compaction overhead for trie and flat key-value storage. The changes appear reasonable for performance tuning, but I have a few observations:

Issues Found
Recommendations
The changes appear safe from a consensus perspective since they only affect performance characteristics, not data correctness. The memory reduction is substantial and should help with resource usage, though it may need monitoring in production.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

Review
No correctness, security, or Ethereum‑specific logic issues spotted in this diff; it’s configuration-only. If you want, I can suggest a small comment or add a lightweight benchmark note. Next steps (optional):
Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

PR Review: Reduce trie/FKV memtable size and raise L0 compaction trigger

Summary

This PR implements two complementary RocksDB performance optimizations based on profiling data: reducing memtable size to decrease SkipList depth, and raising L0 compaction triggers to reduce compaction overhead. The changes are well-motivated and the implementation is clean.

✅ Strengths
🔍 Detailed Review

crates/storage/backend/rocksdb.rs

Lines 41-43: Global L0 compaction triggers (2→8, 10→20, 16→36)

Status: ✅ Good
Lines 91-93: Per-CF L0 compaction triggers

Status:

```rust
// Line 91 could be removed if all CFs should use global defaults
// cf_opts.set_level_zero_file_num_compaction_trigger(8); // Already set globally
```

Lines 122-123 & 134-135: Trie/FKV memtable reduction (512MB→128MB, 6→4 buffers)

Status: ✅ Good, with monitoring needed

Positive aspects:
Considerations:
Math check:

CHANGELOG.md:5-6

Status: ✅ Good
🔒 Security & Correctness
⚡ Performance Considerations

Expected improvements:
Potential risks:
Mitigation: These are balanced by the complementary changes (higher L0 triggers accommodate more flushes).

📊 Testing Recommendations

Since benchmarks are still running (per PR description), monitor:
🎯 Recommendations

High Priority

None - the changes are sound as-is.

Low Priority (Nice to have)
✅ Final Verdict

LGTM with minor suggestions. The changes are:
The complementary nature of the two optimizations (smaller memtables + higher L0 triggers) shows good systems thinking. The 83% memory reduction is substantial and should help with resource-constrained deployments.

Approval status: ✅ Approved (pending successful benchmark results)

Automated review by Claude (Anthropic) · custom prompt
Pull request overview
Updates RocksDB tuning parameters to reduce write-path CPU overhead in the trie/flat-kv column families and reduce compaction churn by batching more L0 files per compaction.
Changes:
- Increase L0 compaction triggers (trigger/slowdown/stop) to reduce frequency of small L0 compactions.
- Reduce memtable (`write_buffer_size`) and `max_write_buffer_number` for trie + flat-kv CFs to keep SkipList depth smaller.
- Add a Perf changelog entry for the tuning change.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/storage/backend/rocksdb.rs | Adjusts RocksDB L0 compaction thresholds and trie/FKV memtable sizing parameters. |
| CHANGELOG.md | Documents the performance tuning change. |
```rust
opts.set_level_zero_file_num_compaction_trigger(8);
opts.set_level_zero_slowdown_writes_trigger(20);
opts.set_level_zero_stop_writes_trigger(36);
```
The L0 compaction triggers are set both on `opts` and again on every `cf_opts` in the CF descriptor loop. Since `cf_opts` already applies these values for all column families, this duplication can drift over time and makes it harder to tell which settings are intended to be authoritative. Consider defining shared constants and applying them only once (preferably on `cf_opts`, or remove the `opts.set_level_zero_*` calls if they're not needed).
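Copilot's suggestion could look roughly like the sketch below. The constant names and the `apply_l0_triggers` helper are hypothetical, and `Options` here is a stand-in struct (with setter names mirroring the rust-rocksdb API) so the pattern is self-contained:

```rust
// Hypothetical shared constants so the L0 triggers live in one place.
const L0_COMPACTION_TRIGGER: i32 = 8;
const L0_SLOWDOWN_TRIGGER: i32 = 20;
const L0_STOP_TRIGGER: i32 = 36;

// Stand-in for rocksdb::Options; the real type has identically named setters.
#[derive(Default, Debug)]
struct Options {
    l0_file_num_compaction_trigger: i32,
    l0_slowdown_writes_trigger: i32,
    l0_stop_writes_trigger: i32,
}

impl Options {
    fn set_level_zero_file_num_compaction_trigger(&mut self, n: i32) {
        self.l0_file_num_compaction_trigger = n;
    }
    fn set_level_zero_slowdown_writes_trigger(&mut self, n: i32) {
        self.l0_slowdown_writes_trigger = n;
    }
    fn set_level_zero_stop_writes_trigger(&mut self, n: i32) {
        self.l0_stop_writes_trigger = n;
    }
}

// Apply the triggers once, on the per-CF options only, as the review suggests.
fn apply_l0_triggers(cf_opts: &mut Options) {
    cf_opts.set_level_zero_file_num_compaction_trigger(L0_COMPACTION_TRIGGER);
    cf_opts.set_level_zero_slowdown_writes_trigger(L0_SLOWDOWN_TRIGGER);
    cf_opts.set_level_zero_stop_writes_trigger(L0_STOP_TRIGGER);
}
```

With this shape, a future retune edits three constants instead of two call sites.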
Greptile Overview

Greptile Summary

This PR tunes RocksDB configuration for the L1 storage backend by (1) reducing write buffer sizes / max memtables for trie and flat key-value column families to reduce memtable skiplist depth and insert overhead, and (2) increasing L0 compaction triggers (global and per-CF) to batch more L0 files per compaction and reduce compaction churn. Changes are localized to the RocksDB backend's configuration.

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| CHANGELOG.md | Adds a perf changelog entry for 2026-02-11 referencing #6183; no functional impact. |
| crates/storage/backend/rocksdb.rs | Adjusts RocksDB Options/CF options: reduces write buffers for trie/FKV and raises L0 compaction triggers. No correctness issues found, but behavior change increases write stall thresholds and flush frequency. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant App as Ethrex
    participant Backend as RocksDBBackend::open
    participant Rocks as RocksDB
    App->>Backend: open(path)
    Backend->>Backend: Build global Options
    Backend->>Rocks: list_cf(opts, path)
    Rocks-->>Backend: existing CF names
    Backend->>Backend: Union(existing CFs, TABLES)
    loop For each CF
        Backend->>Backend: Build CF Options
        Backend->>Backend: Apply per-CF memtable + L0 trigger tuning
    end
    Backend->>Rocks: open_cf_descriptors(opts, cf_descriptors)
    Rocks-->>Backend: DB handle
    Backend->>Rocks: drop_cf(obsolete CFs)
    Backend-->>App: RocksDBBackend
```
Benchmark Block Execution Results Comparison Against Main
Motivation
Two complementary RocksDB bottlenecks identified from CPU profiles:
1. Deep SkipList trees (Rank 5): Trie and FKV column families have 512MB write buffers × 6 = 3GB per CF. Deep SkipList trees make `InlineSkipList::Insert` (4.02% flat in slow) and `MemTable::KeyComparator` (1.20% flat) expensive. This grows superlinearly with memtable depth.
2. Aggressive L0 compaction (Rank 6): The global L0 compaction trigger is 2 (extremely aggressive), causing many small compaction events. Compaction dominates CPU: 93% fast, 37% gigablocks, 60% slow.
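The depth argument can be sketched numerically. A skiplist's expected height is roughly log base (1/p) of the entry count for promotion probability p; this back-of-envelope sketch is not from the PR, and both the ~100-byte entry size and the branching factor of 4 (p = 1/4, a common RocksDB skiplist setting) are assumptions:

```rust
// Expected skiplist height ~= log_{branching}(n) for n entries.
fn expected_height(n_entries: f64, branching: f64) -> f64 {
    n_entries.ln() / branching.ln()
}

fn main() {
    // Rough entry counts for a full memtable, assuming ~100-byte entries.
    let entries_512mb = 512.0 * 1024.0 * 1024.0 / 100.0;
    let entries_128mb = 128.0 * 1024.0 * 1024.0 / 100.0;
    println!("512MB memtable: ~{:.1} levels", expected_height(entries_512mb, 4.0));
    println!("128MB memtable: ~{:.1} levels", expected_height(entries_128mb, 4.0));
}
```

Under these assumptions, quartering the memtable shaves about one level (log4 of 4) off every search path, on top of keeping each level's linked lists shorter and more cache-friendly.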
Description
Rank 5 — Reduce memtable size:

- `ACCOUNT_TRIE_NODES`/`STORAGE_TRIE_NODES`: `write_buffer_size` 512MB → 128MB, `max_write_buffer_number` 6 → 4
- `ACCOUNT_FLATKEYVALUE`/`STORAGE_FLATKEYVALUE`: same reduction (identical write pattern)

Rank 6 — Raise L0 compaction trigger:

- L0 file compaction trigger 2/4 → 8 (global and per-CF), slowdown trigger 10 → 20, stop trigger 16 → 36
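Expressed against the rust-rocksdb `Options` API, the per-CF tuning looks roughly like the following sketch (not the exact diff; the `tuned_cf_options` helper name is an assumption):

```rust
use rocksdb::Options;

// Sketch of the per-CF tuning for the trie and flat key-value CFs.
fn tuned_cf_options() -> Options {
    let mut cf_opts = Options::default();
    // Rank 5: smaller, fewer memtables to keep skiplists shallow.
    cf_opts.set_write_buffer_size(128 * 1024 * 1024); // 512MB -> 128MB
    cf_opts.set_max_write_buffer_number(4);           // 6 -> 4
    // Rank 6: batch more L0 files per compaction event.
    cf_opts.set_level_zero_file_num_compaction_trigger(8); // was 2 (global) / 4 (per-CF)
    cf_opts.set_level_zero_slowdown_writes_trigger(20);    // was 10
    cf_opts.set_level_zero_stop_writes_trigger(36);        // was 16
    cf_opts
}
```

The two halves are coupled: smaller memtables flush more often, so the higher L0 triggers absorb the extra files without triggering write slowdowns.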
Benchmark Results
All benchmarks run on `ethrex-office-3` with CPU profiling enabled. Baseline: `main` at commit `1a47fb18`.
Latency
CPU Profile Comparison — Fast (runs #63 vs #76)
CPU Profile Comparison — Gigablocks (runs #64 vs #77)
CPU Profile Comparison — Slow (runs #65 vs #78)
Analysis
The optimization targets are clearly confirmed in the slow profile:
The gigablocks scenario shows +13.6% throughput with broad improvements across EVM (-10%), rayon (-9.5%), and read path (-8.4%). This suggests the higher L0 trigger reduces contention between compaction and foreground work.
Fast shows a modest +2.6%, as expected (short runs don't build memtables deep enough for Rank 5 to matter much).
How to Test
Run the EXPB benchmark suite: