Better handling of ancient indices #44230

Closed
@DaveCTurner

Description

If you create an index in a one-node 5.6 cluster, close it, then upgrade this node to 6.8 and then again to 7.2 without removing the 5.6 index, the node does not fail cleanly. Instead, it goes into a loop: it repeatedly wins the election, fails the first publication, and tries again:

[2019-07-11T15:47:19,716][INFO ][o.e.c.s.MasterService    ] [node-0] elected-as-master ([1] nodes joined)[{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, reason: master node changed {previous [], current [{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-07-11T15:47:19,728][WARN ][o.e.c.s.MasterService    ] [node-0] failing [elected-as-master ([1] nodes joined)[{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_]]: failed to commit cluster state version [1]
org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publication failed
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$3.onFailure(Coordinator.java:1353) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.ListenableFuture$1.run(ListenableFuture.java:101) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:54) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication.onCompletion(Coordinator.java:1293) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Publication.onPossibleCompletion(Publication.java:124) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Publication.onPossibleCommitFailure(Publication.java:172) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Publication.access$600(Publication.java:41) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Publication$PublicationTarget$PublishResponseHandler.onFailure(Publication.java:348) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Coordinator$6.onFailure(Coordinator.java:1080) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.PublicationTransportHandler$2$1.onFailure(PublicationTransportHandler.java:194) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:743) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-7.2.0.jar:7.2.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.lang.IllegalStateException: index [i/EW17gwGGT5KefG_28xQrcQ] version not supported: 5.6.16 minimum compatible index version is: 6.0.0-beta1
        at org.elasticsearch.cluster.coordination.JoinTaskExecutor.ensureIndexCompatibility(JoinTaskExecutor.java:238) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.JoinTaskExecutor.lambda$addBuiltInJoinValidators$0(JoinTaskExecutor.java:281) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.Coordinator.lambda$handlePublishRequest$2(Coordinator.java:313) ~[elasticsearch-7.2.0.jar:7.2.0]
        at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
        at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1083) ~[?:?]
        at org.elasticsearch.cluster.coordination.Coordinator.handlePublishRequest(Coordinator.java:313) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.cluster.coordination.PublicationTransportHandler$2$1.doRun(PublicationTransportHandler.java:199) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
        ... 3 more

The node should fail earlier, and harder, than it does today. But it's a bit trickier than that: what does the user do once their upgraded node refuses to start? By the time we can notice a bad index version we have already constructed the NodeEnvironment, and therefore written to disk, which makes a downgrade unsafe. (#41731 would positively block a subsequent downgrade in a similar situation with a 6.x -> 7.x -> 8.x double-upgrade.)
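
To make "fail earlier" concrete, here is a minimal sketch of the kind of check that could run over on-disk index metadata before any election is attempted. It is not Elasticsearch code: `IndexCompatibilityCheck`, `parseVersion`, `compareVersions` and `ensureIndicesCompatible` are hypothetical names, versions are plain strings, and qualifiers such as `-beta1` are ignored for simplicity (so `6.0.0-beta1` is treated as `6.0.0`). The exception message only imitates the one in the stack trace above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Elasticsearch APIs.
class IndexCompatibilityCheck {

    // Parse "major.minor.revision", dropping any qualifier such as "-beta1".
    static int[] parseVersion(String version) {
        String[] parts = version.split("-", 2)[0].split("\\.");
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }

    // Negative if a is older than b, zero if equal, positive if newer.
    static int compareVersions(String a, String b) {
        int[] left = parseVersion(a);
        int[] right = parseVersion(b);
        for (int i = 0; i < 3; i++) {
            if (left[i] != right[i]) {
                return Integer.compare(left[i], right[i]);
            }
        }
        return 0;
    }

    // Throw on the first index created before the minimum compatible version.
    // The map is index name -> version the index was created with.
    static void ensureIndicesCompatible(Map<String, String> createdVersions,
                                        String minimumVersion) {
        for (Map.Entry<String, String> entry : createdVersions.entrySet()) {
            if (compareVersions(entry.getValue(), minimumVersion) < 0) {
                throw new IllegalStateException(
                    "index [" + entry.getKey() + "] version not supported: "
                        + entry.getValue()
                        + " minimum compatible index version is: " + minimumVersion);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> createdVersions = new LinkedHashMap<>();
        createdVersions.put("i", "5.6.16"); // the closed 5.6 index from this report

        try {
            ensureIndicesCompatible(createdVersions, "6.0.0-beta1");
            System.out.println("all indices compatible");
        } catch (IllegalStateException e) {
            // Fail once, up front, rather than looping through elections.
            System.err.println("refusing to start: " + e.getMessage());
        }
    }
}
```

A check like this only addresses the "fail earlier" half. It would still need to run before anything is written to disk for a downgrade to remain safe, which is the harder part described above.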
