FollowerFailOverIT testFailOverOnFollower & testAddNewReplicasOnFollower failure on CI  #47137

Closed
@gwbrown

Description

Build scan

This has only happened once as far as I can tell from build-stats.

There are two test failures that might be related. Looks like something in testAddNewReplicasOnFollower tripped an AssertionError in production code:

[2019-09-25T11:05:06,464][INFO ][o.e.x.c.FollowerFailOverIT] [testAddNewReplicasOnFollower] waiting for the global checkpoint on [[follower-index][0]] at least [50]
Sep 25, 2019 11:05:07 HENGIHENGI com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[elasticsearch[followerd3][write][T#3],5,TGRP-FollowerFailOverIT]
java.lang.AssertionError: 68 < 54
	at __randomizedtesting.SeedInfo.seed([44251EB6B9FF91BB]:0)
	at org.elasticsearch.xpack.ccr.index.engine.FollowingEngine.advanceMaxSeqNoOfUpdatesOrDeletesOnPrimary(FollowingEngine.java:120)
	at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:894)
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:796)
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768)
	at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1481)
	at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1468)
	at org.elasticsearch.xpack.ccr.action.bulk.TransportBulkShardOperationsAction.shardOperationOnPrimary(TransportBulkShardOperationsAction.java:129)
	at org.elasticsearch.xpack.ccr.action.bulk.TransportBulkShardOperationsAction.lambda$shardOperationOnPrimary$0(TransportBulkShardOperationsAction.java:70)
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285)
	at org.elasticsearch.xpack.ccr.action.bulk.TransportBulkShardOperationsAction.shardOperationOnPrimary(TransportBulkShardOperationsAction.java:70)
	at org.elasticsearch.xpack.ccr.action.bulk.TransportBulkShardOperationsAction.shardOperationOnPrimary(TransportBulkShardOperationsAction.java:36)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:916)
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:393)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:315)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
	at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2741)
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113)
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285)
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237)
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2715)
	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:857)
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:311)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:274)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63)
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:724)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
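
For anyone reading the `68 < 54` message cold: going by the method name in the trace, my guess is that the assertion compares the sequence number of the incoming operation with the follower engine's max-seq-no-of-updates-or-deletes marker, and expects the marker to already be at least as large. The sketch below is only a stand-in to show how that pair of numbers would read under that assumption. The class, the method names and the message layout are hypothetical and not copied from `FollowingEngine`.

```java
// Hypothetical stand-in for the check behind "java.lang.AssertionError: 68 < 54".
// Not the real FollowingEngine code: the class, the method names and the message
// layout (operation seq_no on the left, engine marker on the right) are assumptions.
public class MaxSeqNoInvariantSketch {

    /** True when the follower's marker already covers the incoming operation. */
    public static boolean holds(long opSeqNo, long maxSeqNoOfUpdatesOrDeletes) {
        return maxSeqNoOfUpdatesOrDeletes >= opSeqNo;
    }

    /** Builds a message shaped like the one in the log. */
    public static String message(long opSeqNo, long maxSeqNoOfUpdatesOrDeletes) {
        return opSeqNo + " < " + maxSeqNoOfUpdatesOrDeletes;
    }

    /** Throws explicitly so the sketch behaves the same with or without -ea. */
    public static void check(long opSeqNo, long maxSeqNoOfUpdatesOrDeletes) {
        if (!holds(opSeqNo, maxSeqNoOfUpdatesOrDeletes)) {
            throw new AssertionError(message(opSeqNo, maxSeqNoOfUpdatesOrDeletes));
        }
    }

    public static void main(String[] args) {
        check(50, 54); // passes: the marker is ahead of the operation
        try {
            check(68, 54); // the pair of values from the log
        } catch (AssertionError e) {
            System.out.println("java.lang.AssertionError: " + e.getMessage());
        }
    }
}
```

If that reading is right, the follower primary was asked to index an operation with seq_no 68 while its marker was still at 54.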

The other test that failed, testFailOverOnFollower, looks like it failed because a node that was expected to exist did not (as could happen if, say, that node had thrown an AssertionError):

Caused by: java.lang.AssertionError: org.elasticsearch.node.NodeClosedException: node closed {leader1}{vymgvoIzSZa-u8V0f0y_mw}{PwWb0SZ9Quen-yj_D24GGQ}{127.0.0.1}{127.0.0.1:35421}{dim}{xpack.installed=true}
at __randomizedtesting.SeedInfo.seed([44251EB6B9FF91BB]:0)
at org.elasticsearch.xpack.ccr.FollowerFailOverIT.lambda$testAddNewReplicasOnFollower$2(FollowerFailOverIT.java:203)
at java.lang.Thread.run(Thread.java:834)
Caused by: org.elasticsearch.node.NodeClosedException: node closed {leader1}{vymgvoIzSZa-u8V0f0y_mw}{PwWb0SZ9Quen-yj_D24GGQ}{127.0.0.1}{127.0.0.1:35421}{dim}{xpack.installed=true}
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:800)
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:318)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:237)
at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:186)
at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:79)
at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:96)
at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:79)
at org.elasticsearch.node.Node.stop(Node.java:789)
at org.elasticsearch.node.Node.close(Node.java:813)
at org.elasticsearch.test.InternalTestCluster$NodeAndClient.close(InternalTestCluster.java:952)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:104)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:86)
at org.elasticsearch.test.InternalTestCluster.close(InternalTestCluster.java:803)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:104)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:62)
at org.elasticsearch.xpack.CcrIntegTestCase$ClusterGroup.close(CcrIntegTestCase.java:748)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:104)
at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:62)
at org.elasticsearch.xpack.CcrIntegTestCase.stopClusters(CcrIntegTestCase.java:263)
at org.elasticsearch.xpack.CcrIntegTestCase.startClusters(CcrIntegTestCase.java:147)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:972)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:834)

Reproduce lines (neither reproduces locally for me on latest master):

./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testAddNewReplicasOnFollower" -Dtests.seed=44251EB6B9FF91BB -Dtests.security.manager=true -Dtests.locale=to -Dtests.timezone=America/Dawson_Creek -Dcompiler.java=12 -Druntime.java=11
./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower" -Dtests.seed=44251EB6B9FF91BB -Dtests.security.manager=true -Dtests.locale=to -Dtests.timezone=America/Dawson_Creek -Dcompiler.java=12 -Druntime.java=11

Metadata

Labels

:Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features)
>test-failure (Triaged test failures from CI)
