Handle retention of failed and partial snapshots in SLM #46988

Closed
@gwbrown

Problem

Right now, SLM treats PARTIAL and FAILED snapshots the same way: both are kept forever. Users are unlikely to expect this, so SLM retention should handle partial and failed snapshots as well.

Proposed solution

FAILED snapshots will be kept until the configured expire_after period has passed, if present, and then deleted. If the retention policy has no expire_after configured, they will be deleted once there is at least one more recent successful snapshot from the same policy (until then, they may be useful for troubleshooting). Failed snapshots do not count toward either min_count or max_count. (This has been implemented in #47617.)
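A minimal sketch of the FAILED rule as described here, not the actual SLM code. The `Snapshot` class and `failed_snapshots_to_delete` function are hypothetical names made up for illustration, and "has passed" is assumed to mean strictly older than expire_after.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for one snapshot taken by a single SLM policy.
    name: str
    state: str  # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: datetime


def failed_snapshots_to_delete(
    snapshots: List[Snapshot],
    now: datetime,
    expire_after: Optional[timedelta] = None,
) -> List[str]:
    """Return the names of FAILED snapshots that the rule would delete."""
    newest_success = max(
        (s.start_time for s in snapshots if s.state == "SUCCESS"),
        default=None,
    )
    doomed = []
    for s in snapshots:
        if s.state != "FAILED":
            continue
        if expire_after is not None:
            # With expire_after set, age alone decides (assumed strict ">").
            if now - s.start_time > expire_after:
                doomed.append(s.name)
        elif newest_success is not None and newest_success > s.start_time:
            # Without expire_after, a more recent successful snapshot is required.
            doomed.append(s.name)
    return doomed
```

Neither min_count nor max_count appears in the signature because failed snapshots are not counted toward either.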

PARTIAL snapshots are more likely to be useful, so they need to be handled a bit differently. There are two potential routes: one that is simple, and one that attempts to be intuitive.

Simple

Partial snapshots are retained until there is at least one more recent successful snapshot from the same policy. Once there is, they are deleted after the expire_after period has passed, if one is configured; if expire_after is not configured, they are deleted in the next retention run. Partial snapshots do not count toward either min_count or max_count, which count successful snapshots only. (This has been implemented in #47833.)
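The simple route could be sketched roughly as follows. This is an illustration of the rule as worded here, not the code from the referenced PR. `Snapshot` and `partial_snapshots_to_delete` are hypothetical names, and the expiry comparison is assumed to be strict.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for one snapshot taken by a single SLM policy.
    name: str
    state: str  # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: datetime


def partial_snapshots_to_delete(
    snapshots: List[Snapshot],
    now: datetime,
    expire_after: Optional[timedelta] = None,
) -> List[str]:
    """Return the names of PARTIAL snapshots deleted under the simple route."""
    newest_success = max(
        (s.start_time for s in snapshots if s.state == "SUCCESS"),
        default=None,
    )
    doomed = []
    for s in snapshots:
        if s.state != "PARTIAL":
            continue
        # A partial snapshot is always kept until a newer success exists.
        if newest_success is None or newest_success <= s.start_time:
            continue
        # If expire_after is set, the partial must also be older than it.
        if expire_after is None or now - s.start_time > expire_after:
            doomed.append(s.name)
    return doomed
```

Unlike the FAILED case, a newer successful snapshot is required here even when expire_after is configured, so the most recent partial snapshot survives as long as nothing better has replaced it.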

Complex

  • If min_count is the only condition: No snapshots for this policy are ever deleted, so partial snapshots have no special handling.
  • If expire_after is the only condition: At least one successful snapshot will be kept, regardless of expire_after. Partial snapshots are deleted after the expire_after period has passed, regardless of whether or not there is a more recent successful snapshot.
  • If max_count is the only condition: At least one successful snapshot will be kept. Partial snapshots are deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count.
  • If min_count and expire_after are configured: At least min_count successful snapshots will be retained. Partial snapshots are deleted after the expire_after period has passed, regardless of whether or not there is a more recent successful snapshot.
  • If min_count and max_count are configured: At least min_count successful snapshots will be retained. All other snapshots, whether successful or partial, will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count.
  • If expire_after and max_count are configured: At least one successful snapshot will be kept, regardless of expire_after. Partial snapshots will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count. They will also be deleted once the expire_after period has passed, regardless of whether there is a more recent successful snapshot.
  • If all three conditions are configured: At least min_count successful snapshots will be retained. All other snapshots, whether successful or partial, will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count. They will also be deleted once the expire_after period has passed, regardless of whether there is a more recent successful snapshot.
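One way to read the seven cases above is as a single rule: protect the newest min_count successful snapshots (at least one), then apply expire_after and max_count to everything else. The sketch below follows that reading. It is an interpretation, not an agreed design: `Snapshot` and `complex_retention_deletions` are hypothetical names, FAILED snapshots are ignored, the expiry comparison is assumed to be strict, and successful snapshots outside the protected set are assumed to be eligible for deletion in every case (the list says so explicitly only where min_count and max_count are both configured).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for one snapshot taken by a single SLM policy.
    name: str
    state: str  # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: datetime


def complex_retention_deletions(
    snapshots: List[Snapshot],
    now: datetime,
    expire_after: Optional[timedelta] = None,
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
) -> List[str]:
    """Return names to delete, oldest first, under the complex route."""
    if expire_after is None and max_count is None:
        return []  # min_count alone never deletes anything

    # Only SUCCESS and PARTIAL take part; oldest first.
    live = sorted(
        (s for s in snapshots if s.state in ("SUCCESS", "PARTIAL")),
        key=lambda s: s.start_time,
    )
    successes = [s for s in live if s.state == "SUCCESS"]
    keep = max(min_count or 1, 1)
    protected = {s.name for s in successes[-keep:]}  # newest successes

    doomed = set()
    if expire_after is not None:
        for s in live:
            if s.name not in protected and now - s.start_time > expire_after:
                doomed.add(s.name)

    if max_count is not None:
        remaining = [s for s in live if s.name not in doomed]
        excess = len(remaining) - max_count
        for s in remaining:  # oldest first
            if excess <= 0:
                break
            if s.name in protected:
                continue
            doomed.add(s.name)
            excess -= 1

    return [s.name for s in live if s.name in doomed]
```

Note that when min_count is larger than max_count minus the number of partial snapshots, the protected set wins and the total can stay above max_count, which matches "at least min_count successful snapshots will be retained".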

Relates to #43663
