Closed
Description
From https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/583/console and https://gradle-enterprise.elastic.co/s/6x67ha6426acy/console-log
2> REPRODUCE WITH: ./gradlew ':x-pack:plugin:ilm:test' --tests "org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress" -Dtests.seed=7BA427BA999CD99D -Dtests.security.manager=true -Dtests.locale=fr-GP -Dtests.timezone=America/Edmonton -Dcompiler.java=12 -Druntime.java=11
2> java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([7BA427BA999CD99D:67E39A043E8F736]:0)
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:712)
at org.junit.Assert.assertNotNull(Assert.java:722)
at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.lambda$testRetentionWhileSnapshotInProgress$2(SLMSnapshotBlockingIntegTests.java:153)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:866)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:840)
at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress(SLMSnapshotBlockingIntegTests.java:146)
This is likely caused by the following exception, thrown when the test tries to kick off the second snapshot:
1> [2019-09-09T13:42:03,563][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: STARTED
1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: SUCCESS
1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> blocking nodes from completing snapshot
1> [2019-09-09T13:42:03,822][INFO ][o.e.x.s.SnapshotLifecycleTask] [node_s0] snapshot lifecycle policy [slm-policy] issuing create snapshot [snap-frash4insd-kptw8sm1rew]
1> [2019-09-09T13:42:03,824][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
1> [2019-09-09T13:42:03,826][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
1> [2019-09-09T13:42:03,828][WARN ][o.e.s.SnapshotsService ] [node_s0] [slm-repo][snap-frash4insd-kptw8sm1rew] failed to create snapshot
1> org.elasticsearch.snapshots.ConcurrentSnapshotExecutionException: [slm-repo:snap-frash4insd-kptw8sm1rew] a snapshot is already running
1> at org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:301) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:697) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:319) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:214) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
1> at java.lang.Thread.run(Thread.java:834) [?:?]
My hunch is that the first snapshot already reports a "SUCCESS" status but is still present in the cluster state, so the second snapshot is rejected with "a snapshot is already running". We should ensure the first snapshot is no longer present in the cluster state before issuing the second execute policy request.
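A toy Java model of the suspected race, assuming (as the hunch above suggests) that a finished snapshot entry lingers in the cluster state until a separate cleanup step removes it. None of these classes are Elasticsearch APIs: `InProgressSnapshots`, `start`, `remove`, and `waitUntil` are hypothetical stand-ins, with `waitUntil` playing the role of `ESTestCase.assertBusy`. It only illustrates why waiting for `SUCCESS` is not enough and what waiting for the entry to disappear looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical toy model of the race; these are NOT Elasticsearch classes.
public class SnapshotRaceSketch {

    enum State { STARTED, SUCCESS }

    static final class Entry {
        final String name;
        State state;

        Entry(String name, State state) {
            this.name = name;
            this.state = state;
        }
    }

    /** Stand-in for the in-progress snapshots section of the cluster state. */
    static final class InProgressSnapshots {
        private final List<Entry> entries = new ArrayList<>();

        synchronized Entry start(String name) {
            // Assumed guard: ANY entry blocks a new snapshot, whatever its state,
            // which would produce "a snapshot is already running".
            if (!entries.isEmpty()) {
                throw new IllegalStateException("[" + name + "] a snapshot is already running");
            }
            Entry e = new Entry(name, State.STARTED);
            entries.add(e);
            return e;
        }

        synchronized void markSuccess(Entry e) {
            e.state = State.SUCCESS;
        }

        /** Assumed to happen in a later, separate step after SUCCESS. */
        synchronized void remove(Entry e) {
            entries.remove(e);
        }

        synchronized boolean isEmpty() {
            return entries.isEmpty();
        }
    }

    /** Minimal assertBusy-style poll: retry until the condition holds or the timeout expires. */
    static void waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new AssertionError("condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        InProgressSnapshots snapshots = new InProgressSnapshots();
        Entry first = snapshots.start("snap-1");
        snapshots.markSuccess(first);

        // Seeing SUCCESS is not enough: the entry is still present, so this is rejected.
        boolean rejected = false;
        try {
            snapshots.start("snap-2");
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println("rejected while SUCCESS entry present: " + rejected);

        // Simulate asynchronous cleanup, then wait for the entry to disappear first.
        new Thread(() -> snapshots.remove(first)).start();
        waitUntil(snapshots::isEmpty, 5_000);
        Entry second = snapshots.start("snap-2");
        System.out.println("second snapshot started: " + second.name);
    }
}
```

Running `main` prints `rejected while SUCCESS entry present: true` followed by `second snapshot started: snap-2`. In the real test, the polled condition would presumably check the cluster state's in-progress snapshots rather than the snapshot status reported by the get-snapshots response.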