Skip to content

Add storage command for triggering RocksDB compaction#4385

Merged
tillrohrmann merged 2 commits into
restatedev:mainfrom
david-wagih:feature/testatectl-command-compact-db
Mar 27, 2026
Merged

Add storage command for triggering RocksDB compaction#4385
tillrohrmann merged 2 commits into
restatedev:mainfrom
david-wagih:feature/testatectl-command-compact-db

Conversation

@david-wagih
Copy link
Copy Markdown
Contributor

What is Added

  • Introduced TriggerCompaction RPC in the NodeCtlSvc protocol.
  • Implemented trigger_compaction method in NodeCtlSvcHandler to handle compaction requests.
  • Added Compact command in the Restate CLI for initiating compaction on specified nodes.
  • Created compact.rs to define compaction options and logic.
  • Updated module structure to include storage commands.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@david-wagih
Copy link
Copy Markdown
Contributor Author

#4238

@david-wagih
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@david-wagih
Copy link
Copy Markdown
Contributor Author

recheck

@tillrohrmann tillrohrmann self-requested a review February 10, 2026 19:53
@david-wagih
Copy link
Copy Markdown
Contributor Author

@tillrohrmann
Hi Till, just asking for a feedback on this one

@tillrohrmann
Copy link
Copy Markdown
Contributor

Hi @david-wagih, thanks a lot for this contribution. I'll get to review your PR this week. Sorry for the slow response.

@david-wagih
Copy link
Copy Markdown
Contributor Author

@tillrohrmann Hi Till, I am just rechecking with you about the state of this PR

@tillrohrmann
Copy link
Copy Markdown
Contributor

Hi @david-wagih, sorry for the delay. I haven't forgotten it. Will get to it swiftly.

Copy link
Copy Markdown
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for creating this PR @david-wagih. I think it looks good. I left a few comments for how we could improve it. Let me know what you think.

} else if name == "db" || name.starts_with("db-") {
DatabaseKind::PartitionStore
} else {
DatabaseKind::Unknown
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this variant being used for? It looks as if all unknown dbs map to this variant. Since we are translating the existing db names to variants it could then happen that a request for "foobar" triggers a compaction on a newly introduced db for which we forgot to add the translation to DatabaseKind.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differently asked, why not ignoring requested db to compact which are not known?

}

let cf_count = db.cfs().len() as u32;
db.compact_all().await;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it was intentional to run compactions sequentially to avoid overloading the system with I/O operations. Maybe add a comment to make this explicit (why we are only running a single compaction at a time).

}

let cf_count = db.cfs().len() as u32;
db.compact_all().await;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can extend compact_all to return whether the compaction was successful. Right now, independent of whether the compaction succeeded, we are returning success: true.

Comment on lines +53 to +59
if name == "log-server" {
DatabaseKind::LogServer
} else if name == "replicated-metadata-server" {
DatabaseKind::MetadataServer
} else if name == "local-loglet" {
DatabaseKind::LocalLoglet
} else if name == "db" || name.starts_with("db-") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a chance that this will silently break if the db names should be changed where we open the RocksDB databases.

Comment on lines +287 to +288
let requested_kinds: Vec<DatabaseKind> = if request.databases.is_empty() {
vec![DatabaseKind::All]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an empty request.databases translates into all databases, do we need an explicit DatabaseKind::All variant? It seems that we are mixing the database kinds with more general filter conditions by having a DatabaseKind::All.

/// Database type(s) to compact: all, partition-store, log-server,
/// metadata-server, local-loglet. Defaults to all.
#[arg(long, short = 'd', value_delimiter = ',')]
database: Vec<String>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the Vec type be DatabaseKind? That way we might get automatic type safety and prevent user typos from going unnoticed.

Comment on lines +38 to +40
/// Target specific node addresses (default: all nodes from cluster)
#[arg(long, short = 'n', value_delimiter = ',')]
node: Vec<AdvertisedAddress<FabricPort>>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether we want to rather specify NodeId instead of addresses. We can obtain the addresses from the NodesConfiguration. A bit similar to restatectl nodes rm --nodes N1,N2.

let mut client = new_node_ctl_client(channel, &CliContext::get().network);

let request = TriggerCompactionRequest { databases };
let response = tokio::time::timeout(COMPACTION_TIMEOUT, client.trigger_compaction(request))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compaction timeout is hardcoded which can be a problem for very long running compaction calls. Could we use CliContext::get().request_timeout() instead?

} else {
opts.database
.iter()
.filter_map(|s| parse_database_kind(s))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are silently dropping dbs with typos.

@david-wagih
Copy link
Copy Markdown
Contributor Author

@tillrohrmann
thanks Till, for the review. I am checking the comments now and will push a new commit when done

@david-wagih
Copy link
Copy Markdown
Contributor Author

@tillrohrmann
Hello Till, I have implemented the fixes for your comments. Can you please recheck this PR

@tillrohrmann
Copy link
Copy Markdown
Contributor

Will do later today.

@david-wagih
Copy link
Copy Markdown
Contributor Author

@tillrohrmann
Hello Till, I am just rechecking with you

Copy link
Copy Markdown
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @david-wagih. The changes look good to me. I will try your changes out and if everything works, then I will merge your PR. Will also address the only comment I had left. Thanks a lot for your contribution ❤️

Comment thread server/src/signal.rs Outdated
@david-wagih
Copy link
Copy Markdown
Contributor Author

Great work @david-wagih. The changes look good to me. I will try your changes out and if everything works, then I will merge your PR. Will also address the only comment I had left. Thanks a lot for your contribution ❤️

thanks till for your feedback, excited to get my first contribution in restate merged ❤️

@tillrohrmann tillrohrmann force-pushed the feature/testatectl-command-compact-db branch from 2eb9c2d to 30d7bb1 Compare March 27, 2026 10:28
david-wagih and others added 2 commits March 27, 2026 11:56
Address review feedback on storage compact command

- Remove DatabaseKind::All from proto; empty databases list now means
  compact all, keeping DB kind semantics clean
- Rename DATABASE_KIND_UNKNOWN to DATABASE_KIND_UNSPECIFIED (proto3 convention)
- db_name_to_kind returns Option<DatabaseKind> so unrecognized DB names
  are skipped rather than risking accidental compaction of future DBs
- compact_all now returns Result<(), RocksError> so failures are
  propagated into the per-DB success/error fields in the response instead
  of always reporting success: true
- Add comment explaining sequential compaction is intentional to avoid
  I/O overload
- CLI --database field is now Vec<DatabaseKind> with a value_parser so
  clap rejects invalid values at parse time instead of silently dropping them
- CLI --node now accepts PlainNodeId (e.g. N1,N2) instead of raw addresses;
  addresses are resolved from NodesConfiguration, matching the nodes rm pattern
- Replace hardcoded 300s compaction timeout with CliContext::get().request_timeout()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update server/src/signal.rs

Co-authored-by: Till Rohrmann <till.rohrmann@gmail.com>
Move DatabaseKind from node_ctl_svc.proto to common.proto so it is
available in restate-types. Add DatabaseKind::db_name() as the canonical
mapping from kind to RocksDB database name, and store the kind on DbSpec
so that RocksDb::kind() can be queried directly without reverse-
engineering the kind from the name string.

This eliminates the scattered DB_NAME constants from individual crates
and removes the fragile db_name_to_kind() function that could silently
go out of sync if a database was renamed.
@tillrohrmann tillrohrmann force-pushed the feature/testatectl-command-compact-db branch from 30d7bb1 to aa1749d Compare March 27, 2026 10:57
@tillrohrmann tillrohrmann merged commit aa1749d into restatedev:main Mar 27, 2026
1 check passed
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 27, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants