Fix Concurrent Snapshot Create+Delete + Delete Index #61770

original-brownbear · 2020-09-01T08:41:14Z

We had a bug here, new in 7.9, where we put a null value into the shard
assignment mapping when reassigning work after a snapshot delete
had gone through. This only affects partial snapshots but essentially
deadlocks the snapshot process.

Closes #61762

elasticmachine · 2020-09-01T08:41:16Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear · 2020-09-01T08:57:31Z

server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java

@@ -1715,8 +1723,7 @@ public static ClusterState updateWithSnapshots(ClusterState state,
            IndexMetadata indexMetadata = metadata.index(indexName);
            if (indexMetadata == null) {
                // The index was deleted before we managed to start the snapshot - mark it as missing.
-                builder.put(new ShardId(indexName, IndexMetadata.INDEX_UUID_NA_VALUE, 0),
-                    new SnapshotsInProgress.ShardSnapshotStatus(null, ShardState.MISSING, "missing index", null));
+                builder.put(new ShardId(indexName, IndexMetadata.INDEX_UUID_NA_VALUE, 0), ShardSnapshotStatus.MISSING);


Before concurrent snapshots, this spot covered all possible scenarios, because beyond this point we would only ever deal with shard IDs for indices that still exist in the repo. If an index was deleted after assignment, the shard snapshot would simply fail in the `SnapshotShardsService` and things would work out that way.
With concurrent snapshots, however, an index can be deleted from under a queued-up shard snapshot, so we have to deal with this situation explicitly.
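
A minimal sketch of the pattern described above, not the actual `SnapshotsService` code: when queued shards are reassigned, an index that no longer exists gets an explicit `MISSING` status rather than a `null` entry in the assignment map. All names here (`MissingShardSketch`, `ShardStatus`, `assignQueuedShards`, the index and node names) are hypothetical stand-ins, and indices are keyed by plain strings for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MissingShardSketch {

    enum ShardState { INIT, MISSING }

    // Hypothetical, simplified stand-in for a shard snapshot status.
    static final class ShardStatus {
        // Shared constant for shards whose index was deleted before work started.
        static final ShardStatus MISSING = new ShardStatus(null, ShardState.MISSING, "missing index");

        final String nodeId;
        final ShardState state;
        final String reason;

        ShardStatus(String nodeId, ShardState state, String reason) {
            this.nodeId = nodeId;
            this.state = state;
            this.reason = reason;
        }
    }

    // Reassigns queued work. Every queued index receives a non-null status:
    // live indices are assigned to a node, deleted ones are marked MISSING.
    static Map<String, ShardStatus> assignQueuedShards(List<String> queuedIndices,
                                                       Set<String> liveIndices,
                                                       String nodeId) {
        Map<String, ShardStatus> assignments = new LinkedHashMap<>();
        for (String index : queuedIndices) {
            if (liveIndices.contains(index)) {
                assignments.put(index, new ShardStatus(nodeId, ShardState.INIT, null));
            } else {
                // The index disappeared while the shard snapshot was queued.
                assignments.put(index, ShardStatus.MISSING);
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, ShardStatus> result =
            assignQueuedShards(List.of("logs", "metrics"), Set.of("logs"), "node-1");
        result.forEach((index, status) -> System.out.println(index + " -> " + status.state));
    }
}
```

Because every entry in this sketch carries a terminal or assignable state, code that later walks the map never has to special-case a `null` value.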

ywelsch

LGTM

original-brownbear · 2020-09-01T10:28:04Z

Thanks Yannick!

DaveCTurner · 2021-03-02T08:01:11Z

Linking this to #56911 so that I can find it again in future.

original-brownbear added the >bug, :Distributed Coordination/Snapshot/Restore, v8.0.0, v7.10.0, and v7.9.1 labels Sep 1, 2020

elasticmachine added the Team:Distributed (Obsolete) label Sep 1, 2020

original-brownbear requested review from ywelsch, tlrx and fcofdez September 1, 2020 09:22

ywelsch approved these changes Sep 1, 2020

original-brownbear merged commit ca00a6f into elastic:master Sep 1, 2020

original-brownbear deleted the 61762 branch September 1, 2020 10:28

original-brownbear mentioned this pull request Sep 1, 2020

Fix Concurrent Snapshot Create+Delete + Delete Index (#61770) #61773

Merged

original-brownbear mentioned this pull request Sep 1, 2020

Fix Concurrent Snapshot Create+Delete + Delete Index (#61770) #61774

Merged

original-brownbear restored the 61762 branch December 6, 2020 18:53

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
