Skip to content

Assertion failures cause yaml REST tests to timeout instead of displaying error #31834

Closed
@polyfractal

Description

@polyfractal

It appears that the yaml REST tests run with assertions enabled, but if an assertion is tripped it isn't reported to the user. This causes the REST test to stall for 30s before failing with a generic timeout, and errors about lingering tasks.

I'm afraid I don't know enough about the test infra to know how to fix this, or if it's fixable. But it was very time-consuming to hunt down the cause. Timeouts make you think faulty async logic or something, not an assertion being tripped. If someone points me in the right direction I can take a crack at fixing this, if it is indeed fixable.

Longer version
The timeout error looks like this. You receive a null body back, the test framework complains about the request task still running and the listener times out after 30s:

HEARTBEAT J0 PID(10689@desktop): 2018-07-03T17:44:26, stalled for 31.7s at: XPackRestIT.test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index}
  1> [2018-07-04T05:44:27,179][INFO ][o.e.x.t.r.XPackRestIT    ] Stash dump on test failure [{
  1>   "stash" : {
  1>     "body" : null
  1>   }
  1> }]
HEARTBEAT J0 PID(10689@desktop): 2018-07-03T17:44:36, stalled for 41.7s at: XPackRestIT.test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index}
  1> [2018-07-04T05:44:37,858][INFO ][o.e.x.t.r.XPackRestIT    ] There are still tasks running after this test that might break subsequent tests [indices:data/read/xpack/rollup/get/index/caps].
  1> [2018-07-04T05:44:37,893][INFO ][o.e.x.t.r.XPackRestIT    ] [test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index}]: after test
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:integTestRunner -Dtests.seed=90C0204471BA5F00 -Dtests.class=org.elasticsearch.xpack.test.rest.XPackRestIT -Dtests.method="test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index}" -Dtests.security.manager=true -Dtests.locale=de-CH -Dtests.timezone=Asia/Irkutsk -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:integTestRunner -Dtests.seed=90C0204471BA5F00 -Dtests.class=org.elasticsearch.xpack.test.rest.XPackRestIT -Dtests.method="test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index}" -Dtests.security.manager=true -Dtests.locale=de-CH -Dtests.timezone=Asia/Irkutsk -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*
ERROR   43.3s | XPackRestIT.test {p0=rollup/get_rollup_index_caps/Verify one job caps by rollup index} <<< FAILURES!
   > Throwable #1: java.lang.RuntimeException: Failure at [rollup/get_rollup_index_caps:65]: listener timeout after waiting for [30000] ms
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:396)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:371)
   >    at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
   >    at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:899)
   >    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:227)
   >    at org.elasticsearch.test.rest.yaml.ClientYamlTestClient.callApi(ClientYamlTestClient.java:192)
   >    at org.elasticsearch.test.rest.yaml.ClientYamlTestExecutionContext.callApiInternal(ClientYamlTestExecutionContext.java:168)
   >    at org.elasticsearch.test.rest.yaml.ClientYamlTestExecutionContext.callApi(ClientYamlTestExecutionContext.java:100)
   >    at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:243)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:387)
   >    ... 38 moreThrowable #2: java.lang.AssertionError: 1 active tasks found:
   > indices:data/read/xpack/rollup/get/index/caps GvLOpehATvWmUC_M7s1A0A:46 -                         transport 1530654237179 17:43:57 40.2s       127.0.0.1 node-0 
   >  expected:<0> but was:<1>
   >    at org.elasticsearch.xpack.test.rest.XPackRestTestHelper.lambda$waitForPendingTasks$2(XPackRestTestHelper.java:110)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:767)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:741)
   >    at org.elasticsearch.xpack.test.rest.XPackRestTestHelper.waitForPendingTasks(XPackRestTestHelper.java:90)
   >    at org.elasticsearch.xpack.test.rest.XPackRestIT.cleanup(XPackRestIT.java:246)
   >    at java.lang.Thread.run(Thread.java:748)
   >    Suppressed: java.lang.AssertionError: 1 active tasks found:
   > indices:data/read/xpack/rollup/get/index/caps GvLOpehATvWmUC_M7s1A0A:46 -                         transport 1530654237179 17:43:57 30s         127.0.0.1 node-0 
   >  expected:<0> but was:<1>
   >            at org.elasticsearch.xpack.test.rest.XPackRestTestHelper.lambda$waitForPendingTasks$2(XPackRestTestHelper.java:110)
   >            at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:755)
   >            ... 38 more

After some debugging, I realized I was tripping this assertion. If I comment out that assertion and rerun the test, I get the auth denial exception thrown by Security:

 1> [2018-07-04T06:25:22,987][INFO ][o.e.x.t.r.XPackRestIT    ] Stash dump on test failure [{
  1>   "stash" : {
  1>     "body" : {
  1>       "error" : {
  1>         "root_cause" : [
  1>           {
  1>             "type" : "security_exception",
  1>             "reason" : "action [indices:data/read/xpack/rollup/get/index/caps] is unauthorized for user [x_pack_rest_user]",
  1>             "stack_trace" : "ElasticsearchSecurityException[action [indices:data/read/xpack/rollup/get/index/caps] is unauthorized for user [x_pack_rest_user]]
  1>    at org.elasticsearch.xpack.core.security.support.Exceptions.authorizationError(Exceptions.java:30)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationService.denialException(AuthorizationService.java:574)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationService.denial(AuthorizationService.java:552)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationService.authorize(AuthorizationService.java:260)
  1>    at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$authorizeRequest$4(SecurityActionFilter.java:171)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:173)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:167)
  1>    at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.authorize(AuthorizationUtils.java:155)
  1>    at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.authorizeRequest(SecurityActionFilter.java:183)
  1>    at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:161)
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60)
...
...

So it's definitely the assertion that's causing the timeout problem (aside from the actual issue that caused the assertion to be tripped).

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions