Closed
Description
Elasticsearch Version
7.x and 8.x
Installed Plugins
No response
Java Version
bundled
OS Version
All
Problem Description
We retrieve snapshots for all repositories in a single request. If that request fails, we treat the entire SLM retention task as failed, even though we might have been able to retrieve snapshots from each repository individually. We should probably pre-check that each repository exists in the cluster state and filter out any missing repositories before we retrieve snapshots for SLM retention execution.
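The proposed pre-check could look roughly like the sketch below. It is not the actual SLM code: the class and method names are hypothetical, and plain string sets stand in for the repositories referenced by retention-enabled policies and for the repositories currently registered in the cluster state.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Elasticsearch: splits the repositories
// referenced by retention-enabled policies into those that are still
// registered and those that are missing.
final class RetentionRepositoryFilter {

    // Key true  -> repositories that exist and can be queried for snapshots.
    // Key false -> repositories that are missing and should be skipped.
    static Map<Boolean, List<String>> partitionByPresence(Set<String> requested, Set<String> registered) {
        return requested.stream().collect(Collectors.partitioningBy(registered::contains));
    }

    public static void main(String[] args) {
        // Repositories referenced by the two policies in the reproduction steps.
        Set<String> requested = new LinkedHashSet<>(List.of("repo", "missing"));
        // Assumed view of the cluster state after DELETE /_snapshot/missing.
        Set<String> registered = Set.of("repo");

        Map<Boolean, List<String>> split = partitionByPresence(requested, registered);
        System.out.println("fetching snapshots from repositories: " + split.get(true));
        System.out.println("skipping missing repositories: " + split.get(false));
    }
}
```

With a filter like this, retention would still run for daily-snapshots, and the missing repository behind daily-snapshots2 could be logged as a warning instead of failing the whole task.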
Steps to Reproduce
Run ES with ./gradlew run -Dtests.es.path.repo=/tmp, then:
PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.slm": "TRACE"
  }
}

PUT /_snapshot/repo
{
  "type": "fs",
  "settings": {
    "location": "/tmp/foo"
  }
}

PUT /_snapshot/missing
{
  "type": "fs",
  "settings": {
    "location": "/tmp/foo2"
  }
}

PUT /_slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "repo",
  "config": {
    "ignore_unavailable": false,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "1s"
  }
}

PUT /_slm/policy/daily-snapshots2
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "missing",
  "config": {
    "ignore_unavailable": false,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "1s"
  }
}

DELETE /_snapshot/missing

GET /_slm/policy

PUT /_slm/policy/daily-snapshots/_execute

POST /_slm/_execute_retention
Logs (if relevant)
The failure in the logs will look like:
[2023-01-11T13:47:27,594][INFO ][o.e.x.s.a.TransportExecuteSnapshotRetentionAction] [runTask-0] manually triggering SLM snapshot retention
[2023-01-11T13:47:27,595][INFO ][o.e.x.s.SnapshotRetentionTask] [runTask-0] starting SLM retention snapshot cleanup task
[2023-01-11T13:47:27,596][TRACE][o.e.x.s.SnapshotRetentionTask] [runTask-0] policies with retention enabled: [daily-snapshots, daily-snapshots2]
[2023-01-11T13:47:27,596][TRACE][o.e.x.s.SnapshotRetentionTask] [runTask-0] fetching snapshots from repositories: [repo, missing]
[2023-01-11T13:47:27,599][DEBUG][o.e.x.s.SnapshotRetentionTask] [runTask-0] unable to retrieve snapshots for [[repo, missing]] repositories org.elasticsearch.repositories.RepositoryMissingException: [missing] missing
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.repositories.get.TransportGetRepositoriesAction.getRepositories(TransportGetRepositoriesAction.java:105)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:116)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:67)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.support.master.TransportMasterNodeAction.executeMasterOperation(TransportMasterNodeAction.java:124)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:235)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
[2023-01-11T13:47:27,603][ERROR][o.e.x.s.SnapshotRetentionTask] [runTask-0] error during snapshot retention task org.elasticsearch.repositories.RepositoryMissingException: [missing] missing
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.repositories.get.TransportGetRepositoriesAction.getRepositories(TransportGetRepositoriesAction.java:105)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:116)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:67)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.support.master.TransportMasterNodeAction.executeMasterOperation(TransportMasterNodeAction.java:124)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:235)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958)
at org.elasticsearch.server@8.7.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)