BWC Rolling Upgrade tests failing #1691

Closed
@ryanbogan

Description

The backward-compatibility (BWC) rolling upgrade tests are currently failing. The log for test cluster node knnBwcCluster-rolling-2 shows the following warnings and errors:

↓ errors and warnings from /home/runner/work/k-NN/k-NN/qa/rolling-upgrade/build/testclusters/knnBwcCluster-rolling-2/logs/opensearch.stdout.log ↓
» WARN ][o.o.g.DanglingIndicesState] [knnBwcCluster-rolling-2] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
» WARN ][o.o.d.FileBasedSeedHostsProvider] [knnBwcCluster-rolling-2] expected, but did not find, a dynamic hosts list at [/home/runner/work/k-NN/k-NN/qa/rolling-upgrade/build/testclusters/knnBwcCluster-rolling-2/config/unicast_hosts.txt]
» WARN ][o.o.c.s.MasterService    ] [knnBwcCluster-rolling-2] failing [elected-as-cluster-manager ([3] nodes joined)[{knnBwcCluster-rolling-2}{o4vnPogpQqO6HsmBUtfNGw}{V_42OJ-oT8WWfb2Np4vUGA}{127.0.0.1}{127.0.0.1:45009}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, {knnBwcCluster-rolling-0}{Md9MTY4rTZiuFe6_CfeW1w}{9L3fYVNKQvaFHBtpd9LZpQ}{127.0.0.1}{127.0.0.1:42881}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, {knnBwcCluster-rolling-1}{fWwXwx_dSdqpKeu1eiCkZg}{cBQqyuvkQTyKg6ugNlabuw}{127.0.0.1}{127.0.0.1:32997}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_]]: failed to commit cluster state version [1]
»  org.opensearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer cluster-manager for term 1 while handling publication
»  	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1294) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:355) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:337) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
»  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
» WARN ][o.o.c.s.MasterService    ] [knnBwcCluster-rolling-2] failing [elected-as-cluster-manager ([2] nodes joined)[{knnBwcCluster-rolling-2}{o4vnPogpQqO6HsmBUtfNGw}{V_42OJ-oT8WWfb2Np4vUGA}{127.0.0.1}{127.0.0.1:45009}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, {knnBwcCluster-rolling-0}{Md9MTY4rTZiuFe6_CfeW1w}{9L3fYVNKQvaFHBtpd9LZpQ}{127.0.0.1}{127.0.0.1:42881}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_], node-join[{knnBwcCluster-rolling-1}{fWwXwx_dSdqpKeu1eiCkZg}{cBQqyuvkQTyKg6ugNlabuw}{127.0.0.1}{127.0.0.1:32997}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} join existing leader]]: failed to commit cluster state version [1]
»  org.opensearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer cluster-manager for term 4 while handling publication
»  	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1294) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:355) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:337) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
»  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
» WARN ][o.o.k.i.c.K.KNN80DocValuesConsumer] [knnBwcCluster-rolling-2] Refresh operation complete in 7 ms
» WARN ][o.o.k.i.c.K.KNN80DocValuesConsumer] [knnBwcCluster-rolling-2] Refresh operation complete in 0 ms
»   ↑ repeated 9 times ↑
» WARN ][o.o.c.NodeConnectionsService] [knnBwcCluster-rolling-2] failed to connect to {knnBwcCluster-rolling-0}{Md9MTY4rTZiuFe6_CfeW1w}{9L3fYVNKQvaFHBtpd9LZpQ}{127.0.0.1}{127.0.0.1:42881}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} (tried [1] times)
»  org.opensearch.transport.ConnectTransportException: [knnBwcCluster-rolling-0][127.0.0.1:42881] connect_exception
»  	at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1094) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.core.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:217) ~[opensearch-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:57) ~[opensearch-common-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
»  	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) ~[?:?]
»  	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2357) ~[?:?]
»  	at org.opensearch.common.concurrent.CompletableContext.addListener(CompletableContext.java:60) ~[opensearch-common-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.netty4.Netty4TcpChannel.addConnectListener(Netty4TcpChannel.java:136) ~[?:?]
»  	at org.opensearch.transport.TcpTransport.initiateConnection(TcpTransport.java:383) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.TcpTransport.openConnection(TcpTransport.java:343) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.ClusterConnectionManager.internalOpenConnection(ClusterConnectionManager.java:274) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.ClusterConnectionManager.connectToNode(ClusterConnectionManager.java:157) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.TransportService.connectToNode(TransportService.java:505) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.transport.TransportService.connectToNode(TransportService.java:485) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.NodeConnectionsService$ConnectionTarget$1.doRun(NodeConnectionsService.java:346) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
»  	at org.opensearch.cluster.NodeConnectionsService.connectToNodes(NodeConnectionsService.java:159) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.ClusterApplierService.connectToNodesAndWait(ClusterApplierService.java:585) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:550) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:486) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:188) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
»  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
»  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
»  Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:42881
»  Caused by: java.net.ConnectException: Connection refused
»  	at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
»  	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682) ~[?:?]
»  	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973) ~[?:?]
»  	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[?:?]
»  	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
»  	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
»  	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
»  	... 1 more

Labels: bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions