Several tests fail with "failed to process cluster event" due to timeout #62853

Closed
@danielmitterdorfer

Description

I noticed several tests failing with related error messages:

  • failed to process cluster event (delete_repository [*]) within 30s
  • failed to process cluster event (put-mapping [idx/iv7FVCFrT7K3iGZ0c-6X2w]) within 13s
  • failed to process cluster event (delete-index [[test-idx/TdYM8OZkRkOHagTQzNiCFQ]]) within 30s

From initial analysis it's hard to tell whether they have different root causes or whether there is one underlying cause. However, it appears to me that the issue is not due to the individual test cases, hence I have raised only one issue. Please feel free to split this into individual issues if further analysis shows that this makes sense.
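
To help judge whether the failures share a cause, the messages can be grouped by task name and timeout. The following is a minimal sketch, not part of the Elasticsearch tooling: the message pattern is inferred only from the three examples above, and the names `EVENT_TIMEOUT`, `parse_timeout` and `count_by_task` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the three messages quoted above (an assumption,
# not an official format specification).
EVENT_TIMEOUT = re.compile(
    r"failed to process cluster event \((?P<source>.+)\) within (?P<timeout>\d+)s"
)


def parse_timeout(line):
    """Return (task, source, timeout_seconds) for a matching line, else None."""
    match = EVENT_TIMEOUT.search(line)
    if match is None:
        return None
    source = match.group("source")
    # Treat the text before the first space as the task name, e.g. "put-mapping".
    task = source.split(" ", 1)[0]
    return task, source, int(match.group("timeout"))


def count_by_task(lines):
    """Count matching timeout messages per task name, skipping other lines."""
    parsed = (parse_timeout(line) for line in lines)
    return Counter(p[0] for p in parsed if p is not None)


if __name__ == "__main__":
    messages = [
        "failed to process cluster event (delete_repository [*]) within 30s",
        "failed to process cluster event (put-mapping [idx/iv7FVCFrT7K3iGZ0c-6X2w]) within 13s",
        "failed to process cluster event (delete-index [[test-idx/TdYM8OZkRkOHagTQzNiCFQ]]) within 30s",
    ]
    for task, count in count_by_task(messages).items():
        print(task, count)
```

Run over the CI logs of the affected builds, a count like this would show whether the timeouts cluster around one task type or are spread across unrelated ones.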

Build scan:

Repro line:

# taken from the most recent failure in https://gradle-enterprise.elastic.co/s/hig7ipu7fpa4m
./gradlew ':x-pack:plugin:security:internalClusterTest' --tests "org.elasticsearch.integration.FieldLevelSecurityTests.testParentChild" -Dtests.seed=3E1FD9F52E1FDC19 -Dtests.security.manager=true -Dtests.locale=en-GB -Dtests.timezone=Pacific/Gambier -Druntime.java=11

Reproduces locally?:

No

Applicable branches:

  • master
  • 7.9

Failure history:

According to build stats, this failure has occurred four times within the last 30 days and 12 times in the last 6 months (excluding pull request builds).

Failure excerpt:

05:03:42   2> org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete_repository [*]) within 30s
05:03:42         at __randomizedtesting.SeedInfo.seed([3E1FD9F52E1FDC19:27DEB90468F17E06]:0)
05:03:42         at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:143)
05:03:42         at java.util.ArrayList.forEach(ArrayList.java:1540)
05:03:42         at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:142)
05:03:42         at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
05:03:42         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
05:03:42         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
05:03:42         at java.lang.Thread.run(Thread.java:834)

Metadata

Labels

  • :Distributed Coordination/Cluster Coordination: Cluster formation and cluster state publication, including cluster membership and fault detection.
  • >test-failure: Triaged test failures from CI
  • Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.