
[CI] SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress failure on master #46508

Closed
@dakrone


From https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/583/console & https://gradle-enterprise.elastic.co/s/6x67ha6426acy/console-log

  2> REPRODUCE WITH: ./gradlew ':x-pack:plugin:ilm:test' --tests "org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress" -Dtests.seed=7BA427BA999CD99D -Dtests.security.manager=true -Dtests.locale=fr-GP -Dtests.timezone=America/Edmonton -Dcompiler.java=12 -Druntime.java=11
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([7BA427BA999CD99D:67E39A043E8F736]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertNotNull(Assert.java:712)
        at org.junit.Assert.assertNotNull(Assert.java:722)
        at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.lambda$testRetentionWhileSnapshotInProgress$2(SLMSnapshotBlockingIntegTests.java:153)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:866)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:840)
        at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress(SLMSnapshotBlockingIntegTests.java:146)

This is likely caused by the following exception, thrown when the test tries to kick off the second snapshot:

  1> [2019-09-09T13:42:03,563][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: STARTED
  1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: SUCCESS
  1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> blocking nodes from completing snapshot
  1> [2019-09-09T13:42:03,822][INFO ][o.e.x.s.SnapshotLifecycleTask] [node_s0] snapshot lifecycle policy [slm-policy] issuing create snapshot [snap-frash4insd-kptw8sm1rew]
  1> [2019-09-09T13:42:03,824][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
  1> [2019-09-09T13:42:03,826][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
  1> [2019-09-09T13:42:03,828][WARN ][o.e.s.SnapshotsService   ] [node_s0] [slm-repo][snap-frash4insd-kptw8sm1rew] failed to create snapshot
  1> org.elasticsearch.snapshots.ConcurrentSnapshotExecutionException: [slm-repo:snap-frash4insd-kptw8sm1rew]  a snapshot is already running
  1> 	at org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:301) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:697) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:319) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:214) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:834) [?:?]

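My hunch is that the first snapshot already reports a "SUCCESS" status but is still present in the cluster state, so the second create-snapshot request issued by the policy is rejected with "a snapshot is already running". We should ensure the first snapshot is no longer present in the cluster state before issuing the second execute policy request.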
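A minimal sketch of the proposed fix: after the first snapshot reports SUCCESS, keep polling until no snapshot entry remains in progress, and only then issue the second execute policy request. It assumes a polling helper in the spirit of `ESTestCase.assertBusy`; the `WaitForSnapshotRemoval` class, the `inProgressSnapshots` set and the `waitUntil` helper are hypothetical stand-ins, and the real test would read the in-progress snapshots from the cluster state instead of a local set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

class WaitForSnapshotRemoval {

    // Hypothetical stand-in for the in-progress snapshot entries held in the cluster state.
    static final Set<String> inProgressSnapshots = ConcurrentHashMap.newKeySet();

    // Polls the condition until it holds or the timeout elapses
    // (similar in spirit to ESTestCase.assertBusy, but with a fixed 10ms poll interval).
    static void waitUntil(BooleanSupplier condition, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError("condition not met within " + timeoutMillis + "ms");
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws Exception {
        // The first snapshot has reported SUCCESS but its entry is still present.
        inProgressSnapshots.add("snap-1");

        // Simulate the master removing the finished entry slightly later.
        Thread remover = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inProgressSnapshots.remove("snap-1");
        });
        remover.start();

        // Wait until nothing is in progress before triggering the second snapshot.
        waitUntil(inProgressSnapshots::isEmpty, 5_000);
        remover.join();
        System.out.println("no snapshot in progress; safe to issue the second execute policy request");
    }
}
```

Checking for an empty in-progress list, rather than only for a SUCCESS status, is what closes the window in which the second request can still hit `ConcurrentSnapshotExecutionException`.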
