-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Manage retention of failed snapshots in SLM #47617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful. With this change, failed snapshots are deleted using the following strategy: `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted. If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either `min_count` or `max_count`.
Pinging @elastic/es-core-features (:Core/Features/ILM) |
I tried to write an integration test for this, but had a heck of a time getting a failed snapshot that actually ended up in the repository. I'll look a bit more to see if there's a way I'm missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Gordon, I left some comments. In terms of an integration test, I think you might be able to use MockRepository
to "block" or cause errors during the snapshot the way that SLMSnapshotBlockingIntegTests
does?
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
Outdated
Show resolved
Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
Show resolved
Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
Outdated
Show resolved
Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
Outdated
Show resolved
Hide resolved
...ugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotRetentionConfiguration.java
Outdated
Show resolved
Hide resolved
I've pushed most of the changes but would still like to get an integration test in - I believe I've found a way - so you can hold off on re-reviewing until I get that in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for splitting some of the logic, it was easier to follow this time around.
|
||
logger.info("--> start snapshot"); | ||
ActionFuture<ExecuteSnapshotLifecycleAction.Response> snapshotFuture = client() | ||
.execute(ExecuteSnapshotLifecycleAction.INSTANCE, new ExecuteSnapshotLifecycleAction.Request(policyId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an executePolicy
helper that returns the snapshot name as a String (for future tests)
@elasticmachine run elasticsearch-ci/1 |
Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful. With this change, failed snapshots are deleted using the following strategy: `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted. If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either `min_count` or `max_count`.
Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful. With this change, failed snapshots are deleted using the following strategy: `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted. If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either `min_count` or `max_count`.
Failed snapshots will eventually build up unless they are deleted. While
failures may not take up much space, they add noise to the list of
snapshots and it's desirable to remove them when they are no longer
useful.
With this change, failed snapshots are deleted using the following
strategy:
FAILED
snapshots will be kept until the configuredexpire_after
period has passed, if present, and then be deleted. Ifthere is no configured
expire_after
in the retention policy, then theywill be deleted if there is at least one more recent successful snapshot
from this policy (as they may otherwise be useful for troubleshooting
purposes). Failed snapshots are not counted towards either
min_count
or
max_count
.Implements part of #46988
Labelled
non-issue
because this feature hasn't yet shipped.