Description
If you create an index in a one-node 5.6 cluster, close it, then upgrade the node to 6.8 and then to 7.2 without removing the 5.6 index, the node does not fail properly. Instead, it enters a loop: it repeatedly wins the election, fails the first publication, and then tries again:
[2019-07-11T15:47:19,716][INFO ][o.e.c.s.MasterService ] [node-0] elected-as-master ([1] nodes joined)[{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, reason: master node changed {previous [], current [{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-07-11T15:47:19,728][WARN ][o.e.c.s.MasterService ] [node-0] failing [elected-as-master ([1] nodes joined)[{node-0}{5B4rSbAnRTG5lhS9xWl8pw}{rBQ-DzKKRfCO77PmO_v2TQ}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_]]: failed to commit cluster state version [1]
org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publication failed
at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$3.onFailure(Coordinator.java:1353) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ListenableFuture$1.run(ListenableFuture.java:101) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:54) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication.onCompletion(Coordinator.java:1293) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Publication.onPossibleCompletion(Publication.java:124) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Publication.onPossibleCommitFailure(Publication.java:172) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Publication.access$600(Publication.java:41) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Publication$PublicationTarget$PublishResponseHandler.onFailure(Publication.java:348) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Coordinator$6.onFailure(Coordinator.java:1080) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.PublicationTransportHandler$2$1.onFailure(PublicationTransportHandler.java:194) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:743) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-7.2.0.jar:7.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.lang.IllegalStateException: index [i/EW17gwGGT5KefG_28xQrcQ] version not supported: 5.6.16 minimum compatible index version is: 6.0.0-beta1
at org.elasticsearch.cluster.coordination.JoinTaskExecutor.ensureIndexCompatibility(JoinTaskExecutor.java:238) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.JoinTaskExecutor.lambda$addBuiltInJoinValidators$0(JoinTaskExecutor.java:281) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handlePublishRequest$2(Coordinator.java:313) ~[elasticsearch-7.2.0.jar:7.2.0]
at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1083) ~[?:?]
at org.elasticsearch.cluster.coordination.Coordinator.handlePublishRequest(Coordinator.java:313) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.PublicationTransportHandler$2$1.doRun(PublicationTransportHandler.java:199) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
... 3 more
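The `Caused by` line reduces to a simple ordering rule: an index created before the node's minimum compatible index version is rejected. The Java sketch below illustrates that rule under stated assumptions. The integer version-id encoding and the names `versionId`, `isCompatible` and this `ensureIndexCompatibility` are hypothetical stand-ins, not Elasticsearch's `Version` class or the real `JoinTaskExecutor.ensureIndexCompatibility`.

```java
/**
 * Hypothetical sketch, not Elasticsearch's implementation: an index whose
 * creation version is older than the node's minimum compatible index
 * version is rejected, as in the "version not supported" error above.
 */
public final class IndexCompatibilitySketch {

    // Assumed encoding: pre-release builds get a build number below GA, so
    // 6.0.0-beta1 sorts before 6.0.0 GA but after every 5.x release.
    static final int GA = 99;
    static final int BETA1 = 26;

    static int versionId(int major, int minor, int patch, int build) {
        return major * 1_000_000 + minor * 10_000 + patch * 100 + build;
    }

    // From the log: a 7.2 node accepts only indices created on 6.0.0-beta1 or later.
    static final int MIN_COMPATIBLE_INDEX_VERSION = versionId(6, 0, 0, BETA1);

    static boolean isCompatible(int indexCreatedVersion) {
        return indexCreatedVersion >= MIN_COMPATIBLE_INDEX_VERSION;
    }

    static void ensureIndexCompatibility(String index, int indexCreatedVersion) {
        if (!isCompatible(indexCreatedVersion)) {
            throw new IllegalStateException("index [" + index + "] version id " + indexCreatedVersion
                    + " not supported; minimum compatible index version id is " + MIN_COMPATIBLE_INDEX_VERSION);
        }
    }

    public static void main(String[] args) {
        int createdOn5616 = versionId(5, 6, 16, GA);
        System.out.println(isCompatible(createdOn5616));          // false
        System.out.println(isCompatible(versionId(6, 8, 0, GA))); // true
        try {
            ensureIndexCompatibility("i", createdOn5616);
        } catch (IllegalStateException e) {
            // The closed 5.6 index is rejected, mirroring the log above.
            System.out.println(e.getMessage());
        }
    }
}
```

Per the stack trace, a 7.2 node runs this kind of check inside a built-in join validator called from `Coordinator.handlePublishRequest`, so the rejection surfaces as a failed publication rather than a startup error, which is what produces the election/publication retry loop.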
The node should fail earlier, and harder, than it does today. But it's a bit trickier than that: what does the user do once their upgraded node refuses to start? By the time we notice a bad index version, we will already have constructed the NodeEnvironment and therefore written data to disk, which means a downgrade is no longer safe. (#41731 would explicitly block a subsequent downgrade in a similar situation with a 6.x -> 7.x -> 8.x double upgrade.)
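One way to read "fail earlier, and harder" is to validate index versions from metadata that is read without modifying the data path, and only then write anything to disk. The sketch below assumes such a read-only inspection is possible, which is not how startup is ordered today (the NodeEnvironment is constructed first). `start`, `validateIndexVersions`, `diskWrites` and the integer version ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical startup ordering, not Elasticsearch's NodeEnvironment API:
 * validate index versions before any write, so a refused start leaves the
 * data path untouched and a downgrade remains possible.
 */
public final class FailFastStartupSketch {

    // Records simulated side effects so we can see whether anything was written.
    static final List<String> diskWrites = new ArrayList<>();

    static void validateIndexVersions(Map<String, Integer> indexCreatedVersions, int minCompatibleVersion) {
        for (Map.Entry<String, Integer> e : indexCreatedVersions.entrySet()) {
            if (e.getValue() < minCompatibleVersion) {
                throw new IllegalStateException("index [" + e.getKey() + "] is too old for this version;"
                        + " refusing to start before modifying the data path");
            }
        }
    }

    static void start(Map<String, Integer> indexCreatedVersions, int minCompatibleVersion) {
        // 1. Read-only inspection of on-disk metadata (simulated by the map argument).
        validateIndexVersions(indexCreatedVersions, minCompatibleVersion);
        // 2. Only after validation succeeds, touch the data path.
        diskWrites.add("acquire lock");
        diskWrites.add("write node metadata");
    }

    public static void main(String[] args) {
        int minCompatible = 6_000_026; // assumed id for 6.0.0-beta1
        try {
            start(Map.of("i", 5_061_699), minCompatible); // assumed id for 5.6.16
        } catch (IllegalStateException ex) {
            System.out.println("startup refused: " + ex.getMessage());
        }
        System.out.println("writes after refusal: " + diskWrites); // writes after refusal: []
    }
}
```

The design point is only the ordering: because validation throws before the first simulated write, the refused node leaves nothing behind that would make a downgrade unsafe.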