
[CI] AssertionError in ShardSearchStats #37185

Closed
@droberts195


A 6.6 ML integration test timed out in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=zulu11,nodes=virtual&&linux/38/consoleText because the search thread that was initializing a scroll threw an AssertionError:

ERROR   0.00s J3 | TooManyJobsIT (suite) <<< FAILURES!
   > Throwable #1: java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1882, name=elasticsearch[node_t1][search][T#3], state=RUNNABLE, group=TGRP-TooManyJobsIT]
   > Caused by: java.lang.AssertionError
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.lambda$onQueryPhase$2(ShardSearchStats.java:101)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.computeStats(ShardSearchStats.java:142)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.onQueryPhase(ShardSearchStats.java:93)
   >    at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onQueryPhase(SearchOperationListener.java:155)
   >    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:407)
   >    at org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:360)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:356)
   >    at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   >    at java.base/java.lang.Thread.run(Thread.java:834)
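The AssertionError has no message and is thrown from a lambda that computeStats applies inside onQueryPhase. That is consistent with a bare Java assert on a per-shard stats counter, and such asserts only fire because test JVMs run with assertions enabled. One plausible invariant, assumed here and not confirmed against the 6.6 source, is that the number of in-flight query phases never goes negative. The sketch below is a hypothetical, simplified model of that pairing (the class QueryStatsSketch and its methods are illustrative names, not Elasticsearch code). It uses an explicit throw so it behaves the same with or without -ea.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical model of a per-shard query stats holder. NOT the Elasticsearch
 * implementation: it only shows how a start/finish counter pair can trip an
 * "in-flight count must not go negative" check.
 */
class QueryStatsSketch {
    // Query phases currently executing on this shard.
    private final AtomicLong queryCurrent = new AtomicLong();
    // Completed query phases.
    private final AtomicLong queryTotal = new AtomicLong();
    // Accumulated query time in nanoseconds.
    private final AtomicLong queryTimeNanos = new AtomicLong();

    /** Called when a query phase starts. */
    void onPreQueryPhase() {
        queryCurrent.incrementAndGet();
    }

    /** Called when a query phase finishes. */
    void onQueryPhase(long tookInNanos) {
        queryTotal.incrementAndGet();
        queryTimeNanos.addAndGet(tookInNanos);
        long current = queryCurrent.decrementAndGet();
        // A finish reported without a matching start (or reported twice for
        // one start) drives the in-flight count below zero.
        if (current < 0) {
            throw new AssertionError("queryCurrent went negative: " + current);
        }
    }

    long current() { return queryCurrent.get(); }

    long total() { return queryTotal.get(); }

    public static void main(String[] args) {
        QueryStatsSketch stats = new QueryStatsSketch();
        stats.onPreQueryPhase();
        stats.onQueryPhase(1_000);           // balanced start/finish
        System.out.println(stats.current()); // prints "0"
        try {
            stats.onQueryPhase(1_000);       // finish with no matching start
        } catch (AssertionError e) {
            // prints "tripped: queryCurrent went negative: -1"
            System.out.println("tripped: " + e.getMessage());
        }
    }
}
```

Under this assumption, the question for the CI failure becomes where the start and finish callbacks for the scroll's initial query phase could get out of step. Nothing in this issue establishes that this is what happened.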

The reproduction command is:

./gradlew :x-pack:plugin:ml:internalClusterTest \
  -Dtests.seed=ACEE7CCD596311C5 \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.TooManyJobsIT \
  -Dtests.method="testSingleNode" \
  -Dtests.security.manager=true \
  -Dtests.locale=yue-Hans-CN \
  -Dtests.timezone=UCT \
  -Dcompiler.java=11 \
  -Druntime.java=11

However, it doesn't reproduce locally for me.

The scroll that was being initialized when this happened is the one set up by this code:

SearchRequest searchRequest = new SearchRequest(index);
// Ignore missing or closed indices instead of failing the search
searchRequest.indicesOptions(MlIndicesUtils.addIgnoreUnavailable(SearchRequest.DEFAULT_INDICES_OPTIONS));
// Keep the scroll context alive between batches
searchRequest.scroll(CONTEXT_ALIVE_DURATION);
searchRequest.source(new SearchSourceBuilder()
        .size(BATCH_SIZE)
        .query(getQuery())
        .fetchSource(shouldFetchSource())
        .sort(SortBuilders.fieldSort(ElasticsearchMappings.ES_DOC)));
// Initial search that opens the scroll; its query phase is where the assertion fired
SearchResponse searchResponse = client.search(searchRequest).actionGet();
