[Dashboard] Add dashboard multi-node churn test #11768

mfitton · 2020-11-02T22:25:55Z

Why are these changes needed?

Adds a dashboard test that verifies the dashboard continues to function correctly while a thread adds and removes nodes from the cluster at random.
The test originally had another line that exercised the node summary endpoint, but that line currently causes a failure. Fixing the underlying issue will be the subject of another PR; for the time being, the line is omitted.
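
For readers unfamiliar with the pattern, here is a minimal standard-library sketch of a churn test: a daemon thread mutates a cluster at random while the main thread keeps polling it. `FakeCluster`, `run_churn_check`, and the `stop_event`/`pause` parameters are hypothetical stand-ins for illustration; they are not the code in this PR and do not use Ray's test cluster API.

```python
import random
import threading
import time


class FakeCluster:
    """Hypothetical stand-in for a test cluster (not Ray's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = ["head"]
        self._counter = 0

    def add_node(self):
        with self._lock:
            self._counter += 1
            self._nodes.append(f"worker-{self._counter}")

    def remove_random_worker(self):
        # Never remove the head node; only workers churn.
        with self._lock:
            workers = self._nodes[1:]
            if workers:
                self._nodes.remove(random.choice(workers))

    def snapshot(self):
        with self._lock:
            return list(self._nodes)


def cluster_chaos_monkey(cluster, stop_event, pause=0.001):
    """Add or remove a worker at random until asked to stop."""
    while not stop_event.is_set():
        if random.random() < 0.5:
            cluster.add_node()
        else:
            cluster.remove_random_worker()
        time.sleep(pause)


def run_churn_check(duration=0.2):
    """Poll the cluster while it churns; return the number of successful polls."""
    cluster = FakeCluster()
    stop_event = threading.Event()
    t = threading.Thread(
        target=cluster_chaos_monkey, args=(cluster, stop_event), daemon=True
    )
    t.start()
    polls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # A real test would query the dashboard endpoints here instead.
            assert "head" in cluster.snapshot()
            polls += 1
            time.sleep(0.001)
    finally:
        stop_event.set()
        t.join()
    return polls
```

Using a stop event and joining the thread, rather than relying only on `daemon=True`, is one way to keep the churn from leaking into later tests.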

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

edoakes

LGTM

edoakes · 2020-11-03T17:53:32Z

dashboard/modules/stats_collector/tests/test_stats_collector.py

+    t = threading.Thread(target=cluster_chaos_monkey, daemon=True)
+    t.start()


Could consider using a Ray actor for this as well.

@edoakes I like that idea. What's the best way to force my actor to live on the head node? Otherwise I'm concerned that it'll get destroyed by a node getting taken down (although I guess it would then be restarted on a node that's still up).

@mfitton you can create a custom resource on the head node and target that resource with Actor.options(resources=xxx).
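
A minimal sketch of that suggestion, assuming the head node is started with a custom resource that no worker node advertises. The resource name `head_node`, the helper `pin_to_head`, the `ChaosMonkey` actor, and the fractional amount are all hypothetical choices for illustration, not taken from this PR.

```python
HEAD_RESOURCE = "head_node"  # hypothetical resource name, not from the PR


def pin_to_head(actor_cls, amount=0.01):
    """Start `actor_cls` on a node that advertises HEAD_RESOURCE.

    `actor_cls` is expected to be a class decorated with @ray.remote.
    """
    return actor_cls.options(resources={HEAD_RESOURCE: amount}).remote()


def demo():
    # Not called on import; requires Ray to be installed.
    import ray

    # ray.init starts a local head node; it is the only node that
    # advertises the custom resource, so only it can host the actor.
    ray.init(resources={HEAD_RESOURCE: 1})

    @ray.remote
    class ChaosMonkey:
        def ping(self):
            return "alive"

    monkey = pin_to_head(ChaosMonkey)
    try:
        return ray.get(monkey.ping.remote())
    finally:
        ray.shutdown()
```

Requesting only a small fraction of the resource leaves room for other head-pinned actors; whether that matters depends on the test.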

edoakes

LGTM.

edoakes · 2020-11-06T18:03:05Z

@mfitton you can create a custom resource on the head node and target that resource with Actor.options(resources=xxx).

edoakes · 2020-11-09T20:05:23Z

@mfitton lint is failing:
https://travis-ci.com/github/ray-project/ray/jobs/425047934

add new test

9b6bbbf

mfitton requested review from rkooo567 and edoakes and removed request for rkooo567 November 2, 2020 22:26

mfitton assigned edoakes Nov 2, 2020

edoakes reviewed Nov 3, 2020

edoakes approved these changes Nov 6, 2020

mfitton added 5 commits November 9, 2020 14:14

fix lint

9c08438

attempt to stop new test from timing out in ci

a1c1430

Address flaky tests

7390184

Merge branch 'master' into new-multi-node-test

b588a31

increase timeout for test that has been flaky

32c72ea

edoakes added the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) Dec 2, 2020

mfitton added 4 commits December 2, 2020 16:31

make test even shorter

699c87c

Make dashboard bazel timeout longer and make the stats collector chur…

f64130d

…n test longer again now that I realized that was not the issue.

try to fix BUILD file

5ee1843

Also make test_dashboard a medium-size test

170cfc2

mfitton added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) and removed the @author-action-required label Dec 14, 2020

edoakes merged commit d0813c1 into ray-project:master Dec 14, 2020

mfitton deleted the new-multi-node-test branch December 14, 2020 23:19
