Description
Problem
Right now, SLM treats PARTIAL and FAILED snapshots the same, and both are kept around forever. This is unlikely to be the behavior users will expect from SLM, so SLM should handle retention of partial and failed snapshots as well.
Proposed solution
FAILED snapshots will be kept until the configured expire_after period has passed, if present, and then be deleted. If there is no configured expire_after in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either min_count or max_count. (This has been implemented in #47617)
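The rule for failed snapshots can be sketched as a small predicate. This is only an illustration of the rule as described above, not the code from #47617: the Snapshot record, the should_delete_failed function, and the use of plain seconds for timestamps and for expire_after are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot taken by one SLM policy."""
    name: str
    state: str          # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: float   # seconds since some epoch (assumed unit)


def should_delete_failed(
    snapshot: Snapshot,
    policy_snapshots: Iterable[Snapshot],
    now: float,
    expire_after: Optional[float] = None,
) -> bool:
    """Decide whether a FAILED snapshot is eligible for deletion."""
    if snapshot.state != "FAILED":
        raise ValueError("this rule only applies to FAILED snapshots")

    # With expire_after configured, age is the only criterion.
    if expire_after is not None:
        return now - snapshot.start_time > expire_after

    # Without expire_after, delete only once a newer successful snapshot
    # exists; until then the failure may help with troubleshooting.
    return any(
        s.state == "SUCCESS" and s.start_time > snapshot.start_time
        for s in policy_snapshots
    )
```

Note that min_count and max_count do not appear here at all, which matches the statement that failed snapshots are not counted towards either.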
PARTIAL snapshots are more likely to be useful, so they need to be handled a bit differently. For this case, there are two potential routes: one that is simple, and one that attempts to be intuitive.
Simple
Partial snapshots are retained unless there is at least one more recent successful snapshot from the same policy, at which point they are deleted after the expire_after period has passed, if present. If expire_after is not present and there is a more recent successful snapshot, they are deleted in the next retention run. In this case, partial snapshots are not counted toward either min_count or max_count, which count successful snapshots only. (This has been implemented in #47833)
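A minimal sketch of the simple route follows, under the same caveats: it illustrates the rule as worded above and is not the code from #47833. The Snapshot record, the should_delete_partial function, and timestamps in seconds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot taken by one SLM policy."""
    name: str
    state: str          # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: float   # seconds since some epoch (assumed unit)


def should_delete_partial(
    snapshot: Snapshot,
    policy_snapshots: Iterable[Snapshot],
    now: float,
    expire_after: Optional[float] = None,
) -> bool:
    """Decide whether a PARTIAL snapshot is eligible for deletion."""
    if snapshot.state != "PARTIAL":
        raise ValueError("this rule only applies to PARTIAL snapshots")

    has_newer_success = any(
        s.state == "SUCCESS" and s.start_time > snapshot.start_time
        for s in policy_snapshots
    )
    # A partial snapshot is never deleted while it is the best we have.
    if not has_newer_success:
        return False

    # No expire_after: eligible at the next retention run.
    if expire_after is None:
        return True

    # Otherwise it must also have outlived expire_after.
    return now - snapshot.start_time > expire_after
```

The difference from the failed-snapshot rule is that a newer successful snapshot is required in every case, so an expired partial snapshot still survives if nothing better has been taken since.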
Complex
- If min_count is the only condition: No snapshots for this policy are ever deleted, so partial snapshots have no special handling.
- If expire_after is the only condition: At least one successful snapshot will be kept, regardless of expire_after. Partial snapshots are deleted after the expire_after period has passed, regardless of whether or not there is a more recent successful snapshot.
- If max_count is the only condition: At least one successful snapshot will be kept. Partial snapshots are deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count.
- If min_count and expire_after are configured: At least min_count successful snapshots will be retained. Partial snapshots are deleted after the expire_after period has passed, regardless of whether or not there is a more recent successful snapshot.
- If min_count and max_count are configured: At least min_count successful snapshots will be retained. All other snapshots, whether successful or partial, will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count.
- If expire_after and max_count are configured: At least one successful snapshot will be kept, regardless of expire_after. Partial snapshots will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count, as well as after the expire_after period, regardless of whether there is a more recent successful snapshot.
- If all three conditions are configured: At least min_count successful snapshots will be retained. All other snapshots, whether successful or partial, will be deleted, oldest first, to keep successful_snaps + partial_snaps equal to or less than max_count, as well as after the expire_after period has passed, regardless of whether there is a more recent successful snapshot.
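One possible way to read the seven cases above is as a single procedure: protect the newest min_count successful snapshots (or one, if min_count is unset), expire everything else by age, then trim oldest first down to max_count. The sketch below encodes that reading. It is an interpretation rather than a proposed implementation: the Snapshot record, the snapshots_to_delete function, timestamps in seconds, and the assumption that unprotected successful snapshots are subject to expire_after and max_count in every case are all choices made for the example. FAILED snapshots are ignored, since this list does not cover them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot taken by one SLM policy."""
    name: str           # assumed unique within the policy
    state: str          # "SUCCESS", "PARTIAL", or "FAILED"
    start_time: float   # seconds since some epoch (assumed unit)


def snapshots_to_delete(
    snapshots: Iterable[Snapshot],
    now: float,
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
    expire_after: Optional[float] = None,
) -> Set[str]:
    """Return names of snapshots to delete under the 'complex' reading."""
    # min_count alone (or no conditions at all) never deletes anything.
    if expire_after is None and max_count is None:
        return set()

    # Only successful and partial snapshots take part; newest first.
    candidates = sorted(
        (s for s in snapshots if s.state in ("SUCCESS", "PARTIAL")),
        key=lambda s: s.start_time,
        reverse=True,
    )

    # Protect the newest successful snapshots: min_count if set, else one.
    floor = min_count if min_count is not None else 1
    successes = [s for s in candidates if s.state == "SUCCESS"]
    protected = {s.name for s in successes[:floor]}

    doomed: Set[str] = set()

    # Age-based expiry applies to everything that is not protected.
    if expire_after is not None:
        for s in candidates:
            if s.name not in protected and now - s.start_time > expire_after:
                doomed.add(s.name)

    # Count-based trimming: delete oldest first until
    # successful + partial <= max_count, skipping protected snapshots.
    if max_count is not None:
        remaining = [s for s in candidates if s.name not in doomed]
        excess = len(remaining) - max_count
        for s in reversed(remaining):
            if excess <= 0:
                break
            if s.name in protected:
                continue
            doomed.add(s.name)
            excess -= 1

    return doomed
```

Because protected snapshots are skipped during trimming, the total can stay above max_count when min_count is larger than max_count; how that conflict should be resolved is not specified above.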
Relates to #43663