increase watcher stop test max to 60s #45679

Merged
merged 1 commit into elastic:master from jakelandis:watcher_test_timeout_increase on Aug 19, 2019

Conversation

@jakelandis (Contributor) commented Aug 16, 2019

As of #43939, Watcher tests now correctly block until all watch executions
kicked off by that test have finished. Previously, we allowed tests to finish with
outstanding watch executions. It was known that this would increase the
time needed to finish a test; however, the tests can run slowly on CI,
and on at least one occasion it took 60s to actually finish.

This PR simply increases the max allowable timeout for Watcher tests
to clean up after themselves.
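For context, here is a minimal sketch of the kind of change involved, assuming (as the failure message suggests) that the test cleanup polls the Watcher stats API with an assertBusy-style wait until every node reports the "stopped" state. The class name, method name, and the exact shape of the stats response below are illustrative assumptions, not the exact code in the repository:

```java
// Sketch only: illustrative names, not the exact diff in this PR.
// Assumes the REST test cleanup polls the Watcher stats API until every
// node reports "stopped", and that the change simply raises the upper bound.
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.test.rest.ESRestTestCase;

import static org.hamcrest.Matchers.equalTo;

public class WatcherStopTimeoutSketch extends ESRestTestCase {

    @SuppressWarnings("unchecked")
    void ensureWatcherStopped() throws Exception {
        assertBusy(() -> {
            Response response = client().performRequest(new Request("GET", "/_watcher/stats"));
            Map<String, Object> body = entityAsMap(response);
            List<Map<String, Object>> nodes = (List<Map<String, Object>>) body.get("stats");
            for (Map<String, Object> node : nodes) {
                // Each node must have fully reached "stopped", not just "stopping".
                assertThat(node.get("watcher_state"), equalTo("stopped"));
            }
        }, 60, TimeUnit.SECONDS); // previously a 30s maximum; slow CI workers can need close to 60s
    }
}
```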


Example of logs that illustrate why the timeout needs to be increased:
Tests (with 30s max):

13:25:12   1> [2019-08-17T03:23:53,392][INFO ][o.e.s.WatcherRestIT      ] [test] [yaml=painless/30_inline_watch/Test execute watch api with an inline watch] before test
13:25:12   1> [2019-08-17T03:24:26,457][INFO ][o.e.s.WatcherRestIT      ] [test] There are still tasks running after this test that might break subsequent tests [indices:data/write/bulk, indices:data/write/bulk[s]].
13:25:12   1> [2019-08-17T03:24:26,457][INFO ][o.e.s.WatcherRestIT      ] [test] [yaml=painless/30_inline_watch/Test execute watch api with an inline watch] after test
13:25:12   2> REPRODUCE WITH: ./gradlew :x-pack:qa:smoke-test-watcher:integTestRunner --tests "org.elasticsearch.smoketest.WatcherRestIT.test {yaml=painless/30_inline_watch/Test execute watch api with an inline watch}" -Dtests.seed=CDFB722B1E040D96 -Dtests.security.manager=true -Dtests.locale=so-DJ -Dtests.timezone=Japan -Dcompiler.java=12 -Druntime.java=11
13:25:12   2> java.lang.AssertionError: waiting until stopping state reached stopped state
13:25:12         at __randomizedtesting.SeedInfo.seed([CDFB722B1E040D96:45AF4DF1B0F8606E]:0)

Server

[2019-08-16T18:23:53,489][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] executing watch [_inlined_] 
[2019-08-16T18:23:53,493][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] finished [_inlined_]/[_inlined__a3a9c868-cac7-45e1-8916-f212e792c754-2019-08-16T18:23:53.488691Z]
[2019-08-16T18:23:53,524][INFO ][o.e.x.w.WatcherService   ] [integTest-0] stopping watch service, reason [watcher manually marked to shutdown by cluster state update].
[2019-08-16T18:24:26,395][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] removing template [.logstash-management]
[2019-08-16T18:24:26,414][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Starting template upgrade to version 8.0.0, 1 templates will be updated and 0 will be removed
[2019-08-16T18:24:26,424][INFO ][o.e.c.m.MetaDataIndexTemplateService] [integTest-0] adding template [.logstash-management] for index patterns [.logstash]
[2019-08-16T18:24:26,447][INFO ][o.e.c.m.TemplateUpgradeService] [integTest-0] Templates were upgraded successfully to version 8.0.0
[2019-08-16T18:24:53,016][ERROR][o.e.x.w.Watcher          ] [integTest-0] watch history could not be written [test_watch_eeb836eb-0cd1-427c-9c98-0d7398cfae08-2019-08-16T18:23:52.926447Z], failure [[.watcher-history-10-2019.08.16] IndexNotFoundException[no such index [.watcher-history-10-2019.08.16]]]   
[2019-08-16T18:24:53,017][ERROR][o.e.x.w.Watcher          ] [integTest-0] triggered watches could not be deleted [test_watch_eeb836eb-0cd1-427c-9c98-0d7398cfae08-2019-08-16T18:23:52.926447Z], failure [[.triggered_watches] IndexNotFoundException[no such index [.triggered_watches]]]
[2019-08-16T18:24:53,018][DEBUG][o.e.x.w.e.ExecutionService] [integTest-0] finished [test_watch]/[test_watch_eeb836eb-0cd1-427c-9c98-0d7398cfae08-2019-08-16T18:23:52.926447Z] 
[2019-08-16T18:24:53,018][INFO ][o.e.x.w.WatcherLifeCycleService] [integTest-0] watcher has stopped  

Since these are run on the same machine, it is OK to compare timestamps with precision.

The test logs above say:
The test started at 23:53,392 and finished at 24:26,457.
Since the timeout in this example is 30s and the test took just over 30s (about 33s), we can reasonably assume a timeout.
"waiting until stopping state reached stopped state" says that we told Watcher to shut down, but it never fully reached the stopped state within the allowed time (30s in this case).
"There are still tasks running after this test" is another indication that the test timed out with outstanding work still to be done.

Looking at the server logs:
We start looking just after 23:53,392 and see that the watch itself executed very quickly: 23:53,489 -> 23:53,493.
So the test framework executed the watch and started the shutdown process at 23:53,524 (which, as of #43939, ensures all outstanding work is done before the shutdown finishes).
We can also see that Watcher did not fully stop until 24:53,018, so this test, on this machine, at this moment in time, took just shy of 60s from the start of shutdown until it actually shut down (on my local machine it takes milliseconds).
"watch history could not be written" and "triggered watches could not be deleted" are artifacts of the test framework cleaning up while the lingering task continues to run in the background. With #43939 this should only happen if the stop timeout is exceeded.
"finished [test_watch]" is an indication that the next test started running before the prior test had actually finished. Again, with #43939 this should only happen if the stop timeout is exceeded.

@jakelandis added the >test (Issues or PRs that are addressing/adding tests) and :Data Management/Watcher labels on Aug 16, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features

@dakrone (Member) left a comment

LGTM

@jakelandis jakelandis merged commit 0e3e8c6 into elastic:master Aug 19, 2019
@jakelandis jakelandis deleted the watcher_test_timeout_increase branch August 19, 2019 19:24
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Aug 23, 2019
jakelandis added a commit that referenced this pull request Aug 26, 2019
Labels
:Data Management/Watcher, >test (Issues or PRs that are addressing/adding tests), v7.4.0, v8.0.0-alpha1