[BUG] org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock flaky #10006

Closed
@sohami

Description

Describe the bug
The test org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock is flaky.

To Reproduce

سبت 11, 2023 1:44:23 م com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#339,opensearch[node_t2][clusterApplierService#updateTask][T#1],5,TGRP-MinimumClusterManagerNodesIT]
java.lang.AssertionError: a started primary with non-pending operation term must be in primary mode [test][2], node[IADuWGkCTpuWEnWUFcbkSQ], [P], s[STARTED], a[id=oar4Dv6STMWSzO-FDH4bMA]
	at __randomizedtesting.SeedInfo.seed([7E7C985F304948B0]:0)
	at org.opensearch.index.shard.IndexShard.updateShardState(IndexShard.java:752)
	at org.opensearch.indices.cluster.IndicesClusterStateService.updateShard(IndicesClusterStateService.java:710)
	at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:650)
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:293)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561)
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484)
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock" -Dtests.seed=7E7C985F304948B0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-SD -Dtests.timezone=Europe/Lisbon -Druntime.java=20
NOTE: leaving temporary files on disk at: /var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.cluster.MinimumClusterManagerNodesIT_7E7C985F304948B0-001
NOTE: test params are: codec=Asserting(Lucene95), sim=Asserting(RandomSimilarity(queryNorm=false): {}), locale=ar-SD, timezone=Europe/Lisbon
NOTE: Linux 5.15.0-1039-aws amd64/Eclipse Adoptium 20.0.2 (64-bit)/cpus=32,threads=1,free=204825744,total=536870912
NOTE: All tests run in this JVM: [PendingTasksBlocksIT, GetIndexIT, ActiveShardsObserverIT, MinimumClusterManagerNodesIT]

Expected behavior
The test should always pass.

Plugins
Standard

Host/Environment
https://build.ci.opensearch.org/job/gradle-check/25287/testReport/junit/org.opensearch.cluster/MinimumClusterManagerNodesIT/testThreeNodesNoClusterManagerBlock/

Additional context
https://build.ci.opensearch.org/job/gradle-check/25287/


I (@andrross) am adding the content from this comment to the description here because it has now been buried in the comment stream:

I believe I have traced this back to the commit that introduced the flakiness: 9119b6d (#9105)

The following command reliably reproduces the failure for me:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock" -Dtests.iters=100

If I check out the commit immediately preceding 9119b6d, the failure does not reproduce.
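The comparison described above (suspect commit versus its parent) could be scripted roughly as follows. This is only a sketch: `iters_flag` and `run_at` are hypothetical helper names, not part of the OpenSearch repository, and it assumes the functions are sourced from the root of an OpenSearch checkout with a clean working tree.

```shell
# Sketch only: helper names are hypothetical, not part of the OpenSearch repo.
TEST_NAME="org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock"

# Build the iteration flag used in this issue; defaults to 100 iterations
# when no count (or an empty one) is given.
iters_flag() {
  printf '%s' "-Dtests.iters=${1:-100}"
}

# Check out the given revision, then run the repro command from this issue there.
# Returns 2 if the checkout fails, otherwise the exit status of the Gradle run.
run_at() {
  git checkout --quiet "$1" || return 2
  ./gradlew ':server:internalClusterTest' --tests "$TEST_NAME" "$(iters_flag "${2:-}")"
}
```

With these defined, `run_at 9119b6d^` would be expected to pass and `run_at 9119b6d` to fail, if the observation above holds. A smaller second argument (for example `run_at 9119b6d 20`) shortens each run, at the cost of a higher chance of missing an intermittent failure.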

This is a bit concerning: the commit in question is related to the remote store feature, but MinimumClusterManagerNodesIT does not do anything related to remote store, so there may be a significant regression here.

Metadata

Assignees

No one assigned

Labels

Cluster Manager; bug (Something isn't working); flaky-test (Random test failure that succeeds on second run)
