@pcholakov pcholakov commented Nov 3, 2025

With this change, Restate can be configured to retain a set of recent snapshots and automatically delete older snapshots that are no longer needed. Previously, older snapshots were not managed by Restate, and users were expected to figure out how to safely clean them up. This change saves users from having to implement an external lifecycle policy that respects the latest snapshot necessary for bootstrap.

When explicit snapshot retention is specified, the reported Archived LSN will be that of the earliest retained snapshot. Together with the durability setting, this influences the automatic log trim behavior. When auto trim respects the Archived LSN, any retained snapshot can be used to bootstrap a partition. For now, falling back to an earlier snapshot (e.g. to deal with corruption) requires manually updating the partition's latest.json file to point to that snapshot's id.

This change builds on #3918.


Sample minimal configuration file:

[worker]
durability-mode = "snapshot-only"       # use archived LSN as the safe log trim position source for testing on single nodes

[worker.snapshots]
destination = "s3://restate/snapshots"
snapshot-interval-num-records = 1000    # min records per snapshot
snapshot-interval = "5 min"
experimental-retain-snapshots = 10

This configuration means:

  • create a new snapshot every 5 min, but only if at least 1000 new records have been applied since the previous snapshot
  • retain the latest 10 snapshots, and consider the earliest of these as the archived LSN
  • automatically delete earlier snapshots from the object store
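
The retention rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: `SnapshotRef`, `RetentionPlan`, and `plan_retention` are hypothetical names, LSNs are modeled as plain `u64` values, and clamping the retention count to at least 1 is an assumption made here so that the latest snapshot is never scheduled for deletion.

```rust
/// Hypothetical, simplified stand-in for a reference to a stored snapshot.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SnapshotRef {
    pub snapshot_id: String,
    pub min_applied_lsn: u64,
}

/// Outcome of applying a "retain the latest N snapshots" policy.
#[derive(Debug, PartialEq, Eq)]
pub struct RetentionPlan {
    /// Snapshots kept in the repository, ordered oldest to newest.
    pub retained: Vec<SnapshotRef>,
    /// Snapshots that fall outside the retention window.
    pub to_delete: Vec<SnapshotRef>,
    /// LSN of the earliest retained snapshot, if any snapshot exists.
    pub archived_lsn: Option<u64>,
}

/// Splits the known snapshots into a retained set and a deletion set.
pub fn plan_retention(mut snapshots: Vec<SnapshotRef>, retain: usize) -> RetentionPlan {
    // Assumption: always keep at least the latest snapshot.
    let retain = retain.max(1);

    // Order oldest to newest by LSN.
    snapshots.sort_by_key(|s| s.min_applied_lsn);

    // Everything before `split` is outside the retention window.
    let split = snapshots.len().saturating_sub(retain);
    let retained = snapshots.split_off(split);

    // The archived LSN is that of the earliest retained snapshot.
    let archived_lsn = retained.first().map(|s| s.min_applied_lsn);

    RetentionPlan {
        retained,
        to_delete: snapshots,
        archived_lsn,
    }
}
```

For example, with twelve snapshots taken at LSNs 1000, 2000, ..., 12000 and a retention count of 10, this sketch schedules the two oldest snapshots for deletion and reports 3000 as the archived LSN.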

github-actions bot commented Nov 3, 2025

Test Results

  7 files  ±0    7 suites  ±0   3m 25s ⏱️ +48s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit afba6c6. ± Comparison against base commit ba47e39.

♻️ This comment has been updated with latest results.

@pcholakov pcholakov force-pushed the pavel/qnkxqqrvzqyx branch 2 times, most recently from 7559a12 to 2c0667f Compare November 4, 2025 21:55
@pcholakov pcholakov changed the title [WIP] Add support for managing a fixed number of retained snapshots Add support for managing a fixed number of retained snapshots Nov 11, 2025
@pcholakov pcholakov marked this pull request as ready for review November 11, 2025 10:52

@tillrohrmann tillrohrmann left a comment

Thanks for creating this PR @pcholakov. I think it will be a great improvement for our users to no longer have to manage the snapshots themselves.

Before diving into the details, what was the motivation to explicitly keep track of deletions and retained snapshots from the perspective of the latest snapshot? The extra bookkeeping adds a bit of complexity which might not be necessary if we had a simple snapshot cleaner that periodically lists the snapshot repository and deletes everything except for the latest retained snapshots. Did you want to save S3 get calls? Would there be a problem with reporting the archived lsn (or the lsn that no snapshot refers to anymore)? Or is this a preparation for things that will become necessary once we add support for incremental snapshots? Or is the idea that in the future there can be different retention policies, which makes us want to track exactly which snapshots to retain and which ones to delete?

Comment on lines +57 to +58
snapshot_lsn = %metadata.min_applied_lsn,
archived_lsn = %archived_lsn.get_archived_lsn(),

What's the difference between snapshot lsn and archived lsn?

Ok, I assume that the latter is the snapshot lsn of the earliest retained snapshot.

/// A stream that tracks the last reported durable Lsn, replica-set durable points, and
/// last archived lsn and emits a [`PartitionDurability`] when the durable Lsn changes.
/// A stream that tracks the last reported durable Lsn, replica-set durable points, and archived LSN
/// (from snapshot repository), and emits a [`PartitionDurability`] when the durable lSN changes.

s/lSN/LSN/


pending_snapshots: HashMap<PartitionId, PendingSnapshotTask>,
latest_snapshots: HashMap<PartitionId, SnapshotCreated>,
latest_snapshots: HashMap<PartitionId, LatestSnapshot>, // NB: latest snapshot min LSN != archived LSN, necessarily

The comment only holds true if num retained snapshots > 1, right? And even then, snapshots might have the same lsn (admittedly this is a corner case).


pending_snapshots: HashMap<PartitionId, PendingSnapshotTask>,
latest_snapshots: HashMap<PartitionId, SnapshotCreated>,
latest_snapshots: HashMap<PartitionId, LatestSnapshot>, // NB: latest snapshot min LSN != archived LSN, necessarily

What's the difference between latest_snapshots and archived_lsns? The latter seems to contain a subset of the information of the former. Could this be unified?

#[default]
V1,
/// V2 adds support for a fixed number of retained snapshots
// todo(v1.7): make this the default

Let's create a GitHub issue with the release-blocker label to help us remember this.

.map(|filename| {
self.get_snapshot_file(&metadata, filename.name.trim_start_matches("/"))
})
.chain(vec![metadata_path].into_iter())

std::iter::once might be better than vec!.into_iter()
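
To illustrate the suggestion, here is a minimal, self-contained sketch comparing the two forms. The function names and the use of plain strings for paths are assumptions made for this example; the actual code in the PR works with its own snapshot file types. Both functions yield the same sequence, but `std::iter::once` avoids allocating a temporary `Vec` for a single element.

```rust
/// Builds the list of object paths using a one-element Vec, as in the
/// snippet under review (hypothetical, string-based stand-in).
pub fn paths_with_vec(data_files: &[&str], metadata_path: &str) -> Vec<String> {
    data_files
        .iter()
        .map(|name| name.trim_start_matches('/').to_string())
        // Allocates a Vec on the heap just to yield one item.
        .chain(vec![metadata_path.to_string()].into_iter())
        .collect()
}

/// Same result, but appends the single trailing item with std::iter::once.
pub fn paths_with_once(data_files: &[&str], metadata_path: &str) -> Vec<String> {
    data_files
        .iter()
        .map(|name| name.trim_start_matches('/').to_string())
        // Yields exactly one item without an intermediate collection.
        .chain(std::iter::once(metadata_path.to_string()))
        .collect()
}
```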

info!(
%partition_id,
snapshot_id = %snapshot_ref.snapshot_id,
"Failed to clean up old snapshot; repeated failures will impact the ability to create new snapshots"

Not sure whether failed cleanups should affect future snapshot creations.

use super::{PartitionSnapshotMetadata, SnapshotFormatVersion};

#[restate_core::test]
#[traced_test]

If it's just about enabling logging, then we also have a different pattern in the code base: #[test_log::test(restate_core::test)]. traced_test does a bit more by giving tests access to the logging, I believe. Would be great to unify those approaches at some point.

snapshots.push(snapshot);
}

tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;

Why do we need this sleep here?

Comment on lines +1630 to +1645
let archived_lsn = repository.get_latest_archived_lsn(PartitionId::MIN).await?;
assert_eq!(archived_lsn.get_archived_lsn(), Lsn::new(1000));

I have to admit that this is a tad confusing. When calling get_latest_archived_lsn, I would expect to get the archived lsn of the latest snapshot. I do understand that the archived lsn is now something else (namely the snapshot lsn of the earliest retained snapshot), so maybe the method names should reflect this?

With this change, Restate can be configured to retain a set of recent
snapshots, and automatically delete older snapshots that are no longer
needed. Previously, older snapshots were not managed by Restate.
This saves users from having to implement an external lifecycle policy.

When explicit snapshot retention is specified, the reported Archived LSN
will be that of the earliest retained snapshot. Together with the
durability setting, this influences the automatic log trim behavior.
When auto trim respects the Archived LSN, the latest partition snapshot
reference can be (manually) updated to any snapshot within the retention
window, e.g. to deal with snapshot corruption.
@tillrohrmann tillrohrmann linked an issue Nov 17, 2025 that may be closed by this pull request
Base automatically changed from pavel/vowpsqrwurmp to main November 17, 2025 20:23
Linked issue (may be closed by merging this pull request): Add support for managing a fixed number of retained snapshots