Omit writing index metadata for non-replicated closed indices on data-only node #47285


Merged

Conversation


@ywelsch ywelsch commented Sep 30, 2019

Fixes a bug in how replicated closed indices (introduced in 7.2) interact with the index metadata storage mechanism, which has special handling for closed indices but handles replicated closed indices incorrectly. On non-master-eligible data nodes, the node's manifest file (which tracks the metadata state that the node should persist) can become out of sync with what is actually stored on disk. This inconsistency is then detected at startup and prevents the node from starting up.

The solution used here is to remove the code that treats closed indices specially. This code has not aged well, and its use is dubious at best.

Closes #47276
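The failure mode can be illustrated with a minimal, self-contained sketch (hypothetical stand-in types and method names, not the real Manifest/MetaStateService code): the manifest records which index metadata generation the node expects to find on disk, and a stale entry makes the startup consistency check fail with the error seen in #47276.

```java
import java.util.Map;

// Hypothetical sketch of the manifest consistency check, not Elasticsearch code.
// The manifest maps index name -> expected metadata generation on disk; if the
// manifest references a generation that was never written, startup must fail.
class ManifestCheck {
    static void verify(Map<String, Long> manifestGenerations,
                       Map<String, Long> onDiskGenerations) {
        for (Map.Entry<String, Long> e : manifestGenerations.entrySet()) {
            Long onDisk = onDiskGenerations.get(e.getKey());
            if (onDisk == null || !onDisk.equals(e.getValue())) {
                throw new IllegalStateException(
                    "Failed to find metadata for index " + e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        // Consistent manifest and disk state: the node starts normally.
        verify(Map.of("test-index", 3L), Map.of("test-index", 3L));
        System.out.println("consistent: ok");

        // Stale manifest entry whose metadata was never written to disk
        // (the bug's symptom): the node refuses to start.
        try {
            verify(Map.of("closed-index", 5L), Map.of());
        } catch (IllegalStateException ex) {
            System.out.println("startup failed: " + ex.getMessage());
        }
    }
}
```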

@ywelsch ywelsch added >bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.5.0 v7.4.1 labels Sep 30, 2019
@ywelsch ywelsch requested review from tlrx and DaveCTurner September 30, 2019 09:52
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@ywelsch ywelsch changed the title Omit writing index metadata for closed indices on data-only node Omit writing index metadata for non-replicated closed indices on data-only node Sep 30, 2019
@DaveCTurner DaveCTurner left a comment (Contributor)

LGTM - I left an optional suggestion.

@@ -244,10 +229,10 @@ private long writeGlobalState(AtomicClusterStateWriter writer, MetaData newMetaD
     }

     // exposed for tests
-    static Set<Index> getRelevantIndices(ClusterState state, ClusterState previousState, Set<Index> previouslyWrittenIndices) {
+    static Set<Index> getRelevantIndices(ClusterState state) {
         Set<Index> relevantIndices;
         if (isDataOnlyNode(state)) {
Contributor:

Seeing as we're touching this, if we checked state.nodes().getLocalNode().isMasterNode() first then we could say merely else if (state.nodes().getLocalNode().isDataNode()) and ditch the isDataOnlyNode method.

And there's no need for the intermediate relevantIndices, just return the thing already :)
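The suggested restructuring could look roughly like this self-contained sketch (LocalNode and the sets are hypothetical stand-ins, not the real DiscoveryNode/ClusterState API): check master-eligibility first, then data-node status, and return directly instead of going through an intermediate variable.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-ins to illustrate the suggested control flow; the real
// code works with ClusterState, DiscoveryNode, and Index.
class Sketch {
    static class LocalNode {
        final boolean master, data;
        LocalNode(boolean master, boolean data) { this.master = master; this.data = data; }
        boolean isMasterNode() { return master; }
        boolean isDataNode() { return data; }
    }

    // Check master-eligibility first, then data-node status; return directly.
    static Set<String> getRelevantIndices(LocalNode node, Set<String> allIndices,
                                          Set<String> assignedIndices) {
        if (node.isMasterNode()) {
            return allIndices;             // master-eligible: persist all index metadata
        } else if (node.isDataNode()) {
            return assignedIndices;        // data-only: persist locally assigned indices
        } else {
            return Collections.emptySet(); // coordinating-only: persist none
        }
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("a", "b", "c");
        Set<String> assigned = Set.of("a");
        System.out.println(getRelevantIndices(new LocalNode(true, true), all, assigned).size());   // 3
        System.out.println(getRelevantIndices(new LocalNode(false, true), all, assigned).size());  // 1
        System.out.println(getRelevantIndices(new LocalNode(false, false), all, assigned).size()); // 0
    }
}
```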

Contributor (Author):

++

@@ -207,8 +207,7 @@ private long writeGlobalState(AtomicClusterStateWriter writer, MetaData newMetaD
         return actions;
     }

-    private static Set<Index> getRelevantIndicesOnDataOnlyNode(ClusterState state, ClusterState previousState, Set<Index> previouslyWrittenIndices) {
+    private static Set<Index> getRelevantIndicesOnDataOnlyNode(ClusterState state) {
Contributor:
💥

@tlrx tlrx left a comment (Member)

LGTM - I left some comments, but feel free to address them or not.

-            clusterStateWithJustOpenedIndex(indexMetaData, false),
-            clusterStateWithClosedIndex(indexMetaData, false),
-            Collections.emptySet());
+            clusterStateWithJustOpenedIndex(indexMetaData, false));
         assertThat(indices.size(), equalTo(0));
     }

public void testGetRelevantIndicesForWasClosedPrevWrittenIndexOnDataOnlyNode() {
IndexMetaData indexMetaData = createIndexMetaData("test");
Member:
This test is now a duplicate of the previous one, and I think that PrevNotWritten/PrevWritten becomes misleading now that getRelevantIndicesOnDataOnlyNode has changed. I'd prefer to have testGetRelevantIndicesForClosedIndexOnDataOnlyNode test a closed index that is replicated/not yet replicated.

Contributor (Author):
makes sense, updated

.build());
indexRandom(randomBoolean(), randomBoolean(), randomBoolean(), IntStream.range(0, randomIntBetween(0, 50))
.mapToObj(n -> client().prepareIndex(indexName, "_doc").setSource("num", n)).collect(toList()));
assertAcked(client().admin().indices().prepareClose(indexName));
Member:
Don't forget the waitForActiveShards when backporting :) (I always forget it)

Contributor (Author):
haha, good one :)

@DaveCTurner (Contributor)

For posterity, we discussed the various implications of this change. It means that we stop updating the on-disk index metadata for shards of closed non-replicated indices on master-ineligible data nodes. (Master-eligible nodes persist the metadata of all indices in the cluster state, and replicated closed indices appear in the routing table).

The index metadata stored on the disk of a master-ineligible data node is read when the node starts up and also when importing a dangling index. The metadata is read at startup so we can fail the node immediately if it has some invalid/unreadable index metadata but otherwise this has no effect since it is superseded by fresher metadata from the master at join time. It's pretty weak as a corruption check (it doesn't detect non-metadata corruption) and doesn't help much in terms of ensuring the node is compatible with all the indices currently in the cluster, so we're ok with losing that.

When importing a dangling index we already have no freshness guarantees. Furthermore, the shard data of a closed index cannot change while that index is closed or unassigned, so we do not need to keep the metadata up to date in order to be sure we will be able to import it as a dangling index in the future. With this change there will be a greater chance that a dangling index import sees older mappings or settings, but they will never be so old that the shard is unreadable.

@ywelsch ywelsch merged commit 38f0221 into elastic:master Sep 30, 2019
ywelsch added a commit that referenced this pull request Sep 30, 2019
…-only node (#47285)

Fixes a bug related to how "closed replicated indices" (introduced in 7.2) interact with the index
metadata storage mechanism, which has special handling for closed indices (but incorrectly
handles replicated closed indices). On non-master-eligible data nodes, it's possible for the
node's manifest file (which tracks the relevant metadata state that the node should persist) to
become out of sync with what's actually stored on disk, leading to an inconsistency that is then
detected at startup, refusing for the node to start up.

Closes #47276
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v7.4.1 v7.5.0 v8.0.0-alpha1

Successfully merging this pull request may close these issues.

Elasticsearch fails to start with error: "Failed to find metadata for index" on every restart
5 participants