
Switch RocksDB block cache from LRU to HyperClockCache #4473

Open
AhmedSoliman wants to merge 10 commits into main from pr4473

Conversation


github-actions bot commented Mar 9, 2026

Test Results

 5 files  ±0   5 suites  ±0   1m 6s ⏱️ -11s
34 tests ±0  34 ✅ ±0  0 💤 ±0  0 ❌ ±0 
52 runs  ±0  52 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit c6556c1. ± Comparison against base commit b07a875.


github-actions bot commented Mar 9, 2026

Test Results

  7 files  ± 0    7 suites  ±0   4m 54s ⏱️ + 2m 22s
 49 tests + 2   49 ✅ + 2  0 💤 ±0  0 ❌ ±0 
210 runs  +10  210 ✅ +10  0 💤 ±0  0 ❌ ±0 

Results for commit 39c89b4. ± Comparison against base commit efdb162.

♻️ This comment has been updated with latest results.

Extract two generic, reusable utilities into restate-futures-util:

**monotonic_token**: A lightweight mechanism for a producer to signal
completion of a prefix of sequentially issued work items. Provides
Token<T>, TokenOwner<T>, Tokens<T>, and TokenListener<T> types with
a phantom type parameter to prevent mixing tokens from different domains.
Uses atomics (Relaxed/Release/Acquire) for lock-free operation — no
RwLock or watch overhead.
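The atomic-prefix idea can be sketched in a few lines of plain Rust. This is an illustrative sketch only, not the actual `restate-futures-util` API: the names `WriteDomain`, `issue`, `release_up_to`, and `is_released` are assumptions, and the real crate also provides `Tokens<T>` and `TokenListener<T>` for async waiting.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicU64, Ordering};

// Phantom domain tag: a Token<WriteDomain> cannot be passed where a token
// from another domain is expected.
struct WriteDomain;

struct Token<D> {
    seq: u64,
    _domain: PhantomData<D>,
}

struct TokenOwner<D> {
    next: AtomicU64,      // next sequence to hand out (Relaxed is enough)
    released: AtomicU64,  // highest fully-completed prefix
    _domain: PhantomData<D>,
}

impl<D> TokenOwner<D> {
    fn new() -> Self {
        Self {
            next: AtomicU64::new(1),
            released: AtomicU64::new(0),
            _domain: PhantomData,
        }
    }

    /// Hand out the next monotonically increasing token.
    fn issue(&self) -> Token<D> {
        Token { seq: self.next.fetch_add(1, Ordering::Relaxed), _domain: PhantomData }
    }

    /// Mark every token up to and including `token` as complete.
    /// Release ordering publishes the producer's writes to observers.
    fn release_up_to(&self, token: &Token<D>) {
        self.released.fetch_max(token.seq, Ordering::Release);
    }

    /// Acquire ordering pairs with the Release store above.
    fn is_released(&self, token: &Token<D>) -> bool {
        self.released.load(Ordering::Acquire) >= token.seq
    }
}
```

Because completion is a single monotonically increasing counter, checking a token is one atomic load with no lock acquisition on either side.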

**waiter_queue**: A priority-drainable queue (WaiterQueue<K, V>) designed
for the common case where entries arrive in key-order. Uses an adaptive
strategy: push_back for in-order inserts (O(1)), binary-search insert for
out-of-order (rare). Drain is always a simple front-pop. Includes a
Criterion benchmark comparing four strategies (naive, compact,
adaptive, sorted-insert).
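The adaptive strategy can be sketched with a `VecDeque` and `partition_point`. Again a hedged sketch: the method names (`insert`, `drain_up_to`) and drain semantics here are illustrative stand-ins for the actual `WaiterQueue<K, V>` API.

```rust
use std::collections::VecDeque;

// Sketch of the adaptive insert strategy: O(1) push_back for in-order
// keys, binary-search insert for the rare out-of-order arrival.
struct WaiterQueue<K: Ord, V> {
    entries: VecDeque<(K, V)>,
}

impl<K: Ord, V> WaiterQueue<K, V> {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        match self.entries.back() {
            // Common case: keys arrive in order -> O(1).
            Some((last, _)) if *last <= key => self.entries.push_back((key, value)),
            None => self.entries.push_back((key, value)),
            // Rare out-of-order arrival -> O(log n) search + shift.
            Some(_) => {
                let idx = self.entries.partition_point(|(k, _)| *k <= key);
                self.entries.insert(idx, (key, value));
            }
        }
    }

    /// Drain is always a simple front-pop: everything with key <= bound.
    fn drain_up_to(&mut self, bound: &K) -> Vec<(K, V)> {
        let mut out = Vec::new();
        while matches!(self.entries.front(), Some((k, _)) if k <= bound) {
            out.push(self.entries.pop_front().unwrap());
        }
        out
    }
}
```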

Both modules include comprehensive documentation and tests. Neither
references any specific use-case — they are general-purpose building
blocks.

This makes turning off loglet workers cleaner (next PR).
- Priority-queue based writer allowing seal messages to jump the queue
- Deduplication of seal messages and store messages
- Improved metrics for the write path (counting bytes, stores, and store status)
- Loglet workers shut down when quiescent and release their resources
- Writer task caps each batch based on the memtable size, which serves as reasonable guidance and removes the need for the `write-batch-commit-count` config.
- Removed the returned WriteBatch in the error case since write errors are terminal. This reduces the size of the returned Result.
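The "seal messages jump the queue" behavior in the list above amounts to a two-priority queue that stays FIFO within each priority. A minimal sketch, with hypothetical message names (the actual loglet message types differ):

```rust
use std::cmp::Ordering as CmpOrd;
use std::collections::BinaryHeap;

// Hypothetical message type: Seal must overtake queued Store messages.
#[derive(Debug)]
enum Msg {
    Store(u64),
    Seal(u64),
}

struct Queued {
    priority: u8, // higher pops first: Seal = 1, Store = 0
    seq: u64,     // issue order, for FIFO within a priority
    msg: Msg,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> CmpOrd {
        // Higher priority first; within a priority, lower seq (FIFO) first.
        // BinaryHeap is a max-heap, so reverse the seq comparison.
        self.priority.cmp(&other.priority).then(other.seq.cmp(&self.seq))
    }
}
impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<CmpOrd> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Queued {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == CmpOrd::Equal
    }
}
impl Eq for Queued {}
```

A `BinaryHeap<Queued>` then pops any pending seal before all stores, while stores among themselves keep their original order.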
Replace raw usize/NonZeroUsize types with type-safe NonZeroByteCount
for all RocksDB memory budget configurations across the codebase.

Key changes:
- CommonOptions: make rocksdb_total_memory_size private behind a getter
  that enforces a 256 MiB minimum; rename rocksdb_actual_total_memtables_size
  to rocksdb_total_memtables_size with a 32 MiB floor; remove the 5% safety
  margin (rocksdb_safe_total_memtables_size); clamp memtables ratio to
  [0.1, 1.0] instead of [0.0, 1.0]
- LogServerOptions: remove data_service_memory_limit config (memory pool
  capacity is now derived from rocksdb_data_memtables_budget); fix metadata
  memtables budget to a constant 8 MiB instead of a ratio; enforce 40 MiB
  (32 MiB data + 8 MiB metadata) minimum for log-server memory budget
- MetadataServerOptions/StorageOptions: change rocksdb_memory_budget return
  types from usize to NonZeroByteCount with per-component minimums
- ByteCount: add arithmetic ops (Add, Mul, saturating_add/mul), Default,
  and TryFrom<u64> for NonZeroByteCount
- Remove unnecessary runtime assertions that were checking for non-zero on
  already non-zero types
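The type-safety pattern described above can be sketched with a thin wrapper over `NonZeroUsize`. This is an illustrative reduction, not the codebase's actual `NonZeroByteCount` (which carries more impls such as `Add`, `Mul`, and serde support); the `total_memory_size` getter below just demonstrates the floor-enforcing-getter idea with the 256 MiB minimum mentioned above.

```rust
use std::num::NonZeroUsize;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct NonZeroByteCount(NonZeroUsize);

impl NonZeroByteCount {
    const MIB: usize = 1024 * 1024;

    fn get(self) -> usize {
        self.0.get()
    }

    // Saturating arithmetic on two non-zero inputs can never yield zero,
    // so the unwrap below is safe.
    fn saturating_add(self, other: Self) -> Self {
        Self(NonZeroUsize::new(self.0.get().saturating_add(other.0.get())).unwrap())
    }
}

impl TryFrom<u64> for NonZeroByteCount {
    type Error = &'static str;
    fn try_from(v: u64) -> Result<Self, Self::Error> {
        // Sketch assumes a 64-bit target (u64 -> usize).
        NonZeroUsize::new(v as usize)
            .map(Self)
            .ok_or("byte count must be non-zero")
    }
}

/// Getter-style clamp: enforce a floor regardless of the configured value,
/// e.g. the 256 MiB minimum on rocksdb_total_memory_size.
fn total_memory_size(configured: NonZeroByteCount) -> NonZeroByteCount {
    let floor = NonZeroByteCount(NonZeroUsize::new(256 * NonZeroByteCount::MIB).unwrap());
    configured.max(floor)
}
```

With zero ruled out at the type level, the runtime non-zero assertions the last bullet removes become unnecessary by construction.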
Move db-level properties (is-write-stopped, background-errors,
num-running-compactions, actual-delayed-write-rate) from the per-CF set
to the per-DB set since they are database-wide. Also fix the unit of
actual-delayed-write-rate to Bytes and add blob-db metrics
(live-blob-file-size, live-blob-file-garbage-size) and
obsolete-sst-files-size for log-server observability.

Remove the shared rocksdb_max_background_jobs config (which gave every database
CPU_COUNT background jobs) and replace it with role-aware per-database budgets
for flushes and compactions. Flushes (latency-critical) are split equally across
databases while compactions (throughput-heavy) are weighted ~65% toward the
partition-store. Metadata-server and local-loglet get a fixed budget of 1+1.

Also adds worker.snapshots.export-concurrency-limit (default 4) to replace the
snapshot export concurrency that was previously derived from max_background_jobs.
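The ~65% compaction weighting can be illustrated with a small helper. This is a hedged sketch: the rounding, the at-least-one-thread guarantee, and the two-database split (partition-store vs. log-server) are assumptions for illustration, not the PR's actual budgeting code.

```rust
/// Split a compaction-thread pool between the partition-store (~65%,
/// throughput-heavy) and the log-server, giving each at least one thread.
fn compaction_budgets(total_compactions: usize) -> (usize, usize) {
    let partition_store = ((total_compactions as f64) * 0.65).round() as usize;
    // Keep at least one thread on each side of the split.
    let partition_store =
        partition_store.clamp(1, total_compactions.saturating_sub(1).max(1));
    let log_server = (total_compactions - partition_store).max(1);
    (partition_store, log_server)
}
```

On a 10-thread compaction pool this split yields 7 threads for the partition-store and 3 for the log-server; the metadata-server and local-loglet sit outside the pool with their fixed 1+1 budgets.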
Add two new options to RocksDbOptions:
- rocksdb-disable-wal-compression: disables Zstd WAL compression (default: false)
- rocksdb-disable-l0-l1-compression: disables Zstd L0/L1 SST compression (default: false)

Both options cascade from common rocksdb config and default to compression
enabled, preserving existing behavior. A new build_compression_per_level()
helper in the rocksdb crate constructs per-level compression arrays.
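The shape of a per-level compression array builder can be sketched as follows. To keep the example self-contained it uses a stand-in `Compression` enum rather than `rocksdb::DBCompressionType`, and the signature is an assumption, not the actual `build_compression_per_level()` helper.

```rust
// Stand-in for rocksdb::DBCompressionType so the sketch runs without
// the rocksdb crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Compression {
    None,
    Zstd,
}

/// Build a compression choice per LSM level: optionally leave L0/L1
/// uncompressed (they are rewritten frequently), Zstd everywhere else.
fn build_compression_per_level(num_levels: usize, disable_l0_l1: bool) -> Vec<Compression> {
    (0..num_levels)
        .map(|level| {
            if disable_l0_l1 && level <= 1 {
                Compression::None
            } else {
                Compression::Zstd
            }
        })
        .collect()
}
```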
HyperClockCache (HCC) is the recommended default block cache for RocksDB, offering better scalability under concurrent access than the LRU cache.
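For reference, wiring an HCC block cache looks roughly like the fragment below. This is a hedged sketch assuming the rust-rocksdb bindings; exact constructor signatures vary by crate version.

```rust
use rocksdb::{BlockBasedOptions, Cache, Options};

// 1 GiB HyperClock block cache; an estimated entry charge of 0 lets
// RocksDB size entries dynamically.
let cache = Cache::new_hyper_clock_cache(1 << 30, 0);

let mut block_opts = BlockBasedOptions::default();
block_opts.set_block_cache(&cache);

let mut opts = Options::default();
opts.set_block_based_table_factory(&block_opts);
```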