
perf(l1): reduce trie/FKV memtable size and raise L0 compaction trigger#6183

Open
ilitteri wants to merge 2 commits into main from perf/rocksdb-memtable-compaction-tuning

Conversation


@ilitteri ilitteri commented Feb 11, 2026

Motivation

Two complementary RocksDB bottlenecks identified from CPU profiles:

  1. Deep SkipList trees (Rank 5): Trie and FKV column families have 512MB write buffers x 6 = 3GB per CF. Deep SkipList trees make InlineSkipList::Insert (4.02% flat in slow) and MemTable::KeyComparator (1.20% flat) expensive. This grows superlinearly with memtable depth.

  2. Aggressive L0 compaction (Rank 6): Global L0 compaction trigger is 2 (extremely aggressive), causing many small compaction events. Compaction dominates CPU: 93% fast, 37% gigablocks, 60% slow.

Description

Rank 5 — Reduce memtable size:

  • ACCOUNT_TRIE_NODES / STORAGE_TRIE_NODES: write_buffer_size 512MB → 128MB, max_write_buffer_number 6 → 4
  • ACCOUNT_FLATKEYVALUE / STORAGE_FLATKEYVALUE: same reduction (identical write pattern)
  • Shallower SkipList at any given time → cheaper inserts and key comparisons

Rank 6 — Raise L0 compaction trigger:

  • Global: compaction trigger 2 → 8, slowdown 10 → 20, stop 16 → 36
  • Per-CF: compaction trigger 4 → 8 (slowdown/stop already at 20/36)
  • Batches more L0 files per compaction event, reducing total compaction work
  • Higher triggers accommodate the more frequent flushes from smaller memtables
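
In the rust-rocksdb API used by the backend, the combined per-CF tuning amounts to roughly the following (a sketch of the intent; the exact placement and structure in crates/storage/backend/rocksdb.rs may differ):

```rust
use rocksdb::Options;

const MB: usize = 1024 * 1024;

// Sketch: tuned options for the trie/FKV column families.
fn tuned_trie_cf_options() -> Options {
    let mut cf_opts = Options::default();
    // Rank 5: shallower memtables (512MB -> 128MB, 6 -> 4 buffers)
    cf_opts.set_write_buffer_size(128 * MB);
    cf_opts.set_max_write_buffer_number(4);
    // Rank 6: batch more L0 files per compaction (trigger 2/4 -> 8)
    cf_opts.set_level_zero_file_num_compaction_trigger(8);
    cf_opts.set_level_zero_slowdown_writes_trigger(20);
    cf_opts.set_level_zero_stop_writes_trigger(36);
    cf_opts
}
```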

Benchmark Results

All benchmarks run on ethrex-office-3 with CPU profiling enabled. Baseline: main at commit 1a47fb18.

Throughput (Mgas/s)

| Scenario | Baseline (main) | Tuned | Change |
|---|---|---|---|
| Fast (200 blocks) | 420.48 | 431.43 | +2.6% |
| Gigablocks (100 synthetic) | 909.63 | 1033.63 | +13.6% |
| Slow (5000 blocks) | 811.04 | 843.31 | +4.0% |

Latency

| Scenario | Baseline avg / p95 | Tuned avg / p95 |
|---|---|---|
| Fast | 67.81ms / 148.88ms | 71.65ms / 173.43ms |
| Gigablocks | 1410ms / 2760ms | 1180ms / 2100ms |
| Slow | 29.48ms / 76.03ms | 27.53ms / 73.65ms |

CPU Profile Comparison — Fast (runs #63 vs #76)

| Function | Baseline | Tuned | Change |
|---|---|---|---|
| Total samples | 38,678ms | 37,917ms | -2.0% |
| CompactionJob::Run | 35,870ms cum | 34,983ms cum | -2.5% |
| VM::execute | 691ms cum | 715ms cum | +3.5% |
| rayon::join | 599ms cum | 590ms cum | -1.5% |
| PosixFile::Read | 4,321ms flat | 4,377ms flat | +1.3% |
| RocksDBReadTx::get | 477ms cum | 501ms cum | +5.0% |
| BlockBasedTable::Get | 397ms cum | 415ms cum | +4.5% |

CPU Profile Comparison — Gigablocks (runs #64 vs #77)

| Function | Baseline | Tuned | Change |
|---|---|---|---|
| Total samples | 49,833ms | 48,119ms | -3.4% |
| CompactionJob::Run | 18,174ms cum | 18,461ms cum | +1.6% |
| VM::execute | 12,370ms cum | 11,090ms cum | -10.3% |
| rayon::join | 10,573ms cum | 9,569ms cum | -9.5% |
| PosixFile::Read | 5,854ms flat | 5,763ms flat | -1.6% |
| RocksDBReadTx::get | 6,297ms cum | 5,769ms cum | -8.4% |
| BlockBasedTable::Get | 5,327ms cum | 4,940ms cum | -7.3% |
| apply_trie_updates | 318ms cum | 299ms cum | -6.0% |

CPU Profile Comparison — Slow (runs #65 vs #78)

| Function | Baseline | Tuned | Change |
|---|---|---|---|
| Total samples | 127,137ms | 125,587ms | -1.2% |
| CompactionJob::Run | 76,185ms cum | 75,729ms cum | -0.6% |
| VM::execute | 13,244ms cum | 12,846ms cum | -3.0% |
| rayon::join | 15,560ms cum | 15,383ms cum | -1.1% |
| PosixFile::Read | 11,178ms flat | 11,454ms flat | +2.5% |
| InlineSkipList::Insert | 5,105ms flat | 3,936ms flat | -22.9% |
| MemTable::KeyComparator | 1,525ms flat | 1,436ms flat | -5.8% |
| RecomputeSpliceLevels | 1,755ms cum | 1,607ms cum | -8.4% |
| apply_trie_updates | 10,368ms cum | 9,089ms cum | -12.3% |
| MemTableInserter::PutCFImpl | 7,570ms cum | 6,216ms cum | -17.9% |
| WriteBatchInternal::Put | 634ms flat | 683ms flat | +7.7% |
| BlockBasedTable::Get | 5,215ms cum | 5,275ms cum | +1.2% |

Analysis

The optimization targets are clearly confirmed in the slow profile:

  • InlineSkipList::Insert dropped 22.9% (5.1s → 3.9s) — shallower memtables mean fewer skiplist levels to traverse during insertion
  • MemTableInserter::PutCFImpl dropped 17.9% (7.6s → 6.2s) — the entire memtable insertion path benefits
  • apply_trie_updates dropped 12.3% (10.4s → 9.1s) — cumulative savings from cheaper insertions

The gigablocks scenario shows +13.6% throughput with broad improvements across EVM (-10%), rayon (-9.5%), and read path (-8.4%). This suggests the higher L0 trigger reduces contention between compaction and foreground work.

Fast shows modest +2.6% as expected (short runs don't build deep enough memtables for Rank 5 to matter much).
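
The before/after memtable budget and the new trigger ordering can be sanity-checked with a tiny standalone program (plain std Rust; all numbers are taken from the description above):

```rust
fn main() {
    const MB: u64 = 1024 * 1024;
    // Before: 512MB write buffers x 6 memtables per trie/FKV column family
    let before = 512 * MB * 6;
    // After: 128MB write buffers x 4 memtables per column family
    let after = 128 * MB * 4;
    assert_eq!(before, 3 * 1024 * MB); // 3GB per CF
    assert_eq!(after, 512 * MB); // 512MB per CF
    // L0 triggers must stay ordered: compaction < slowdown < stop
    let (trigger, slowdown, stop) = (8, 20, 36);
    assert!(trigger < slowdown && slowdown < stop);
    println!("memtable budget per CF: {}MB -> {}MB", before / MB, after / MB);
}
```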

How to Test

Run the EXPB benchmark suite:

```shell
# Fast scenario (200 mainnet blocks)
# Gigablocks scenario (100 synthetic max-gas blocks)
# Slow scenario (5000 mainnet blocks)
```

Two complementary RocksDB tuning changes:

1. Reduce write_buffer_size for trie and FKV column families from 512MB to
   128MB, and max_write_buffer_number from 6 to 4. This keeps SkipList trees
   shallower, reducing InlineSkipList::Insert and MemTable::KeyComparator
   overhead that grows superlinearly with memtable depth.

2. Raise L0 compaction trigger from 2/4 to 8 (global and per-CF), slowdown
   from 10 to 20, stop from 16 to 36. This batches more L0 files per
   compaction event, reducing total compaction work. Higher triggers are
   needed to accommodate the more frequent flushes from smaller memtables.
Copilot AI review requested due to automatic review settings February 11, 2026 18:25
@ilitteri ilitteri requested a review from a team as a code owner February 11, 2026 18:25
@ilitteri ilitteri added performance Block execution throughput and performance in general L1 Ethereum client labels Feb 11, 2026
@ethrex-project-sync ethrex-project-sync bot moved this to In Review in ethrex_l1 Feb 11, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR adjusts RocksDB configuration parameters to optimize memory usage and reduce compaction overhead for trie and flat key-value storage. The changes appear reasonable for performance tuning, but I have a few observations:

Issues Found

  1. Potential Memory Pressure Risk (lines 121-122, 134-135):

    • Reducing write_buffer_size from 512MB to 128MB for trie/flatkeyvalue CFs while lowering max_write_buffer_number to 4 means total memtable memory per CF can still reach 512MB (128MB × 4)
    • However, the reduction in max_write_buffer_number from 6 to 4 actually reduces maximum memtable memory from 3GB to 512MB, which is a significant reduction
    • Concern: This aggressive memory reduction could potentially impact write performance under heavy load
  2. Inconsistent Compaction Trigger Values (lines 41, 91):

    • Global/default compaction trigger is set to 8 (line 41)
    • But specific column families (line 91) also set it to 8
    • This creates redundancy - the global setting already covers all CFs unless overridden
  3. Missing Validation for Extreme Values:

    • The new level_zero_stop_writes_trigger of 36 (lines 42, 93) seems high compared to the slowdown trigger of 20
    • No validation that these ratios make sense (typically stop should be ~2x slowdown, which 36/20=1.8 satisfies)

Recommendations

  1. Add configuration validation to ensure the ratios between L0 triggers are reasonable:

     ```rust
     // Consider adding validation like:
     assert!(stop_writes_trigger > slowdown_writes_trigger);
     assert!(slowdown_writes_trigger > compaction_trigger);
     ```
  2. Document the rationale for these specific values in comments, especially the 4x reduction in write buffer size

  3. Consider making these tunable via configuration file rather than hardcoded, as optimal values may vary by deployment
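
A minimal std-only sketch of the third suggestion, using hypothetical environment variables to override the hardcoded defaults (the variable names are illustrative only; ethrex does not define them):

```rust
use std::env;

// Hypothetical override helper: parse an env var, fall back to the tuned default.
fn env_or_default(var: &str, default: i32) -> i32 {
    env::var(var)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

fn main() {
    // Illustrative names, not part of ethrex configuration.
    let write_buffer_mb = env_or_default("ETHREX_ROCKSDB_WRITE_BUFFER_MB", 128);
    let l0_trigger = env_or_default("ETHREX_ROCKSDB_L0_COMPACTION_TRIGGER", 8);
    println!("write_buffer={}MB l0_trigger={}", write_buffer_mb, l0_trigger);
}
```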

The changes appear safe from a consensus perspective since they only affect performance characteristics, not data correctness. The memory reduction is substantial and should help with resource usage, though it may need monitoring in production.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Review

  • Potential performance regression risk: raising L0 compaction trigger while shrinking memtables can increase L0 file count/compaction lag and read amplification under heavy write load. Consider validating with workload traces or adding a brief rationale in code/comments, especially for trie/flat KV CFs. crates/storage/backend/rocksdb.rs:41-52, crates/storage/backend/rocksdb.rs:91-94, crates/storage/backend/rocksdb.rs:121-137

No correctness, security, or Ethereum‑specific logic issues spotted in this diff; it’s configuration-only.

If you want, I can suggest a small comment or add a lightweight benchmark note.

Next steps (optional):

  1. Run a targeted RocksDB perf regression benchmark for trie-heavy workloads.

Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Reduce trie/FKV memtable size and raise L0 compaction trigger

Summary

This PR implements two complementary RocksDB performance optimizations based on profiling data: reducing memtable size to decrease SkipList depth, and raising L0 compaction triggers to reduce compaction overhead. The changes are well-motivated and the implementation is clean.

✅ Strengths

  1. Evidence-based optimization: Changes are driven by actual CPU profiling data (Rank 5 and Rank 6 bottlenecks)
  2. Complementary fixes: The smaller memtable size (causing more frequent flushes) is balanced by higher L0 compaction triggers
  3. Focused scope: Only modifies performance-critical column families (trie and FKV) rather than global changes
  4. No consensus impact: Performance tuning doesn't affect correctness, state transitions, or EIP compliance

🔍 Detailed Review

crates/storage/backend/rocksdb.rs

Lines 41-43: Global L0 compaction triggers (2→8, 10→20, 16→36)

Status: ✅ Good

  • The 4x increase in compaction trigger is well-justified by the profiling data
  • Ratios are reasonable: compaction(8) → slowdown(20=2.5x) → stop(36=1.8x)
  • Aligns with RocksDB best practices for write-heavy workloads

Lines 91-93: Per-CF L0 compaction triggers

Status: ⚠️ Minor redundancy

  • Line 91 sets compaction_trigger=8 for all CFs, matching the global setting
  • This creates redundancy but no functional issue
  • Suggestion: Consider removing line 91 since it duplicates the global setting (line 41), unless you plan to differentiate CFs in future tuning
```rust
// Line 91 could be removed if all CFs should use global defaults
// cf_opts.set_level_zero_file_num_compaction_trigger(8); // Already set globally
```

Lines 122-123 & 134-135: Trie/FKV memtable reduction (512MB→128MB, 6→4 buffers)

Status: ✅ Good, with monitoring needed

Positive aspects:

  • Reduces total memtable memory from 3GB to 512MB per CF (83% reduction!)
  • Shallower SkipList trees will reduce InlineSkipList::Insert and MemTable::KeyComparator overhead
  • The min_write_buffer_number_to_merge=2 setting is unchanged and appropriate

Considerations:

  • More frequent flushes (every 128MB instead of 512MB) → more L0 files created
  • This is exactly why L0 compaction trigger was raised (good design!)
  • Under extreme write pressure, the 4 buffers × 128MB = 512MB limit could cause slowdowns
  • Recommendation: Monitor write stall metrics in production benchmarks

Math check:

```
Before: 512MB × 6 buffers = 3GB max memtable memory
After:  128MB × 4 buffers = 512MB max memtable memory
Reduction: 83% less memory usage ✓
```

CHANGELOG.md:5-6

Status: ✅ Good

  • Properly documented in the performance section
  • Includes PR reference for traceability

🔒 Security & Correctness

  • No security impact: RocksDB tuning parameters don't affect cryptographic operations, state validation, or consensus rules
  • No data correctness risk: These parameters only affect performance characteristics, not data integrity
  • Crash recovery: WAL settings unchanged, recovery behavior unaffected
  • Memory safety: Rust's type system + RocksDB bindings prevent buffer overflows

⚡ Performance Considerations

Expected improvements:

  1. ✅ Reduced CPU time in SkipList operations (shallower trees)
  2. ✅ Fewer compaction events (batched L0→L1 merges)
  3. ✅ Lower memory footprint (83% reduction in memtable memory)

Potential risks:

  1. ⚠️ More frequent flushes could increase disk I/O (small writes)
  2. ⚠️ Higher L0 file counts before compaction might slow reads temporarily
  3. ⚠️ Write stalls possible if 4×128MB fills faster than flush rate

Mitigation: These are balanced by the complementary changes (higher L0 triggers accommodate more flushes)

📊 Testing Recommendations

Since benchmarks are still running (per PR description), monitor:

  1. Write throughput: Ensure smaller memtables don't bottleneck under sustained writes
  2. Read latency: Check if higher L0 file counts affect lookup performance
  3. Compaction stats: Verify that compaction CPU usage actually decreases
  4. Memory usage: Confirm the expected 83% memtable reduction
  5. Write stall frequency: Watch rocksdb.level0.slowdown and rocksdb.level0.num_files

🎯 Recommendations

High Priority

None - the changes are sound as-is.

Low Priority (Nice to have)

  1. Remove redundant setting at line 91 (duplicates global setting at line 41)

  2. Add inline comments explaining the rationale:

     ```rust
     // Reduced from 512MB to limit SkipList depth and insertion overhead
     cf_opts.set_write_buffer_size(128 * 1024 * 1024); // 128MB
     // Reduced from 6 to balance memory usage with flush rate
     cf_opts.set_max_write_buffer_number(4);
     ```
  3. Consider making these tunable via config file for easier A/B testing in production

✅ Final Verdict

LGTM with minor suggestions. The changes are:

  • Well-motivated by profiling data
  • Correctly implemented
  • Appropriately scoped to affected column families
  • Free from security/correctness concerns
  • Worth merging pending benchmark validation

The complementary nature of the two optimizations (smaller memtables + higher L0 triggers) shows good systems thinking. The 83% memory reduction is substantial and should help with resource-constrained deployments.

Approval status: ✅ Approved (pending successful benchmark results)


Automated review by Claude (Anthropic) · custom prompt

@ilitteri ilitteri changed the title perf(l1): Reduce trie/FKV memtable size and raise L0 compaction trigger perf(l1): reduce trie/FKV memtable size and raise L0 compaction trigger Feb 11, 2026

Copilot AI left a comment


Pull request overview

Updates RocksDB tuning parameters to reduce write-path CPU overhead in the trie/flat-kv column families and reduce compaction churn by batching more L0 files per compaction.

Changes:

  • Increase L0 compaction triggers (trigger/slowdown/stop) to reduce frequency of small L0 compactions.
  • Reduce memtable (write_buffer_size) and max_write_buffer_number for trie + flat-kv CFs to keep SkipList depth smaller.
  • Add a Perf changelog entry for the tuning change.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| crates/storage/backend/rocksdb.rs | Adjusts RocksDB L0 compaction thresholds and trie/FKV memtable sizing parameters. |
| CHANGELOG.md | Documents the performance tuning change. |


Comment on lines +41 to +43
```rust
opts.set_level_zero_file_num_compaction_trigger(8);
opts.set_level_zero_slowdown_writes_trigger(20);
opts.set_level_zero_stop_writes_trigger(36);
```

Copilot AI Feb 11, 2026


The L0 compaction triggers are set both on opts and again on every cf_opts in the CF descriptor loop. Since cf_opts already applies these values for all column families, this duplication can drift over time and makes it harder to tell which settings are intended to be authoritative. Consider defining shared constants and applying them only once (preferably on cf_opts, or remove the opts.set_level_zero_* calls if they’re not needed).

Suggested change
```rust
opts.set_level_zero_file_num_compaction_trigger(8);
opts.set_level_zero_slowdown_writes_trigger(20);
opts.set_level_zero_stop_writes_trigger(36);
```

@greptile-apps

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR tunes RocksDB configuration for the L1 storage backend by (1) reducing write buffer sizes / max memtables for trie and flat key-value column families to reduce memtable skiplist depth and insert overhead, and (2) increasing L0 compaction triggers (global and per-CF) to batch more L0 files per compaction and reduce compaction churn.

Changes are localized to the RocksDB backend’s open() path where global Options and per-column-family Options are constructed before opening the DB, plus a changelog entry documenting the perf motivation.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • Changes are limited to RocksDB tuning knobs (memtable sizes and compaction triggers) and a changelog entry; no API/logic changes were introduced, and the new values are consistent across global and per-CF options.
  • crates/storage/backend/rocksdb.rs (monitor for performance/regression under heavy write workloads)

Important Files Changed

| Filename | Overview |
|---|---|
| CHANGELOG.md | Adds a perf changelog entry for 2026-02-11 referencing #6183; no functional impact. |
| crates/storage/backend/rocksdb.rs | Adjusts RocksDB Options/CF options: reduces write buffers for trie/FKV and raises L0 compaction triggers. No correctness issues found, but the behavior change raises write stall thresholds and increases flush frequency. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Ethrex
    participant Backend as RocksDBBackend::open
    participant Rocks as RocksDB

    App->>Backend: open(path)
    Backend->>Backend: Build global Options
    Backend->>Rocks: list_cf(opts, path)
    Rocks-->>Backend: existing CF names
    Backend->>Backend: Union(existing CFs, TABLES)
    loop For each CF
        Backend->>Backend: Build CF Options
        Backend->>Backend: Apply per-CF memtable + L0 trigger tuning
    end
    Backend->>Rocks: open_cf_descriptors(opts, cf_descriptors)
    Rocks-->>Backend: DB handle
    Backend->>Rocks: drop_cf(obsolete CFs)
    Backend-->>App: RocksDBBackend
```

@github-actions

github-actions bot commented Feb 11, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 63.699 ± 0.269 | 63.264 | 64.182 | 1.01 ± 0.01 |
| head | 63.136 ± 0.175 | 62.829 | 63.355 | 1.00 |


Labels

L1 Ethereum client performance Block execution throughput and performance in general

Projects

Status: In Review
Status: Todo

Development

Successfully merging this pull request may close these issues.

5 participants