Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dashboard] Add dashboard multi-node churn test #11768

Merged
merged 10 commits into from
Dec 14, 2020

Conversation

mfitton
Copy link
Contributor

@mfitton mfitton commented Nov 2, 2020

Why are these changes needed?

Adds a test to the dashboard that makes sure it continues to correctly function while a thread adds and removes nodes from the cluster at random.
This test had another line to test the node summary endpoint, but that currently causes a failure. Actually fixing this issue will be the subject of another PR. For the time being, this line is omitted.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mfitton mfitton requested review from rkooo567 and edoakes and removed request for rkooo567 November 2, 2020 22:26
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +246 to +247
t = threading.Thread(target=cluster_chaos_monkey, daemon=True)
t.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could consider using a ray actor for this as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edoakes I like that idea. What's the best way to force my actor to live on the head node? Otherwise I'm concerned that it'll get destroyed by a node getting taken down (although I guess it would then be restarted on a node that's still up).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfitton you can create a custom resource on the head node and target that resource with Actor.options(resources=xxx).

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines +246 to +247
t = threading.Thread(target=cluster_chaos_monkey, daemon=True)
t.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfitton you can create a custom resource on the head node and target that resource with Actor.options(resources=xxx).

@edoakes
Copy link
Contributor

edoakes commented Nov 9, 2020

@edoakes edoakes added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 2, 2020
@mfitton mfitton added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Dec 14, 2020
@edoakes edoakes merged commit d0813c1 into ray-project:master Dec 14, 2020
@mfitton mfitton deleted the new-multi-node-test branch December 14, 2020 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants