Add multinode tests by simulating multiple nodes using Docker. #378

Merged
robertnishihara merged 14 commits into ray-project:master from the dockercluster branch
Mar 19, 2017

@jssmith
Contributor

jssmith commented Mar 17, 2017

For testing, we want to simulate a Ray cluster using several Docker containers running on a single host. This change adds scripts that boot a cluster matching a specific configuration and run a test script on that cluster.

Other requirements:

  • Test scripts should run from the command line as well as from within the cluster environment
  • A single continuous integration host should be able to run tests for different Ray revisions simultaneously
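
A minimal sketch of how such a cluster could be laid out as `docker run` invocations, one container per simulated node. The function name, the container naming scheme, the image name, and the per-revision prefix are all assumptions for illustration and are not the scripts added in this PR. The command that each container runs to start Ray is deliberately left out.

```python
def docker_run_commands(image, num_nodes, shm_size, mem_size, prefix):
    """Build one `docker run` argument list per simulated node.

    `prefix` is a hypothetical per-revision tag, so that clusters started
    from different Ray revisions on the same CI host do not collide on
    container names.
    """
    commands = []
    for i in range(num_nodes):
        # Treat the first container as the head node, the rest as workers.
        role = "head" if i == 0 else "worker"
        name = "{}-{}-{}".format(prefix, role, i)
        commands.append([
            "docker", "run", "-d",        # run detached
            "--name", name,               # unique name per node and revision
            "--shm-size=" + shm_size,     # shared memory available to the node
            "--memory=" + mem_size,       # memory limit for the node
            image,
        ])
    return commands


if __name__ == "__main__":
    # "example/ray-image" and "rev-abc123" are placeholders.
    for cmd in docker_run_commands("example/ray-image", 3, "1G", "4G", "rev-abc123"):
        print(" ".join(cmd))
```

Prefixing container names with a revision identifier is one way to meet the second requirement, since Docker rejects a second container with a name that is already in use.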

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/320/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/333/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/334/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/335/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/336/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/337/

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/338/

@robertnishihara
Collaborator

robertnishihara commented Mar 19, 2017

After running build-docker.sh, the tests can be run from the command line with something like:

python test/jenkins_tests/multi_node_docker_test.py \
    --num-nodes=5 \
    --test-script=/ray/test/jenkins_tests/multi_node_tests/test_0.py \
    --development-mode

The --development-mode flag copies the local test files into the head node's Docker container, so you can edit the tests and rerun them without rebuilding the Docker image.
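
As a rough sketch, the flags in the command above could be parsed with `argparse` as follows. The `--num-nodes`, `--test-script`, and `--development-mode` flags come from the command itself; the `--docker-image`, `--mem-size`, and `--shm-size` flag names, and every default value, are assumptions made for illustration and may not match the actual script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run a test script on a simulated multi-node cluster.")
    # Flags that appear in the example command.
    parser.add_argument("--num-nodes", type=int, default=1,
                        help="number of Docker containers to start")
    parser.add_argument("--test-script", required=True,
                        help="path of the test script inside the container")
    parser.add_argument("--development-mode", action="store_true",
                        help="copy local test files into the head node")
    # Assumed flag names and defaults (hypothetical, for illustration only).
    parser.add_argument("--docker-image", default="example/ray-image")
    parser.add_argument("--mem-size", default=None)
    parser.add_argument("--shm-size", default="1G")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

`argparse` maps `--num-nodes` to `args.num_nodes`, so both the `--num-nodes=5` and `--num-nodes 5` spellings work.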

@robertnishihara
Collaborator

After this PR is merged, all PRs should be required to pass the Travis tests as well as the Jenkins tests (the multi-node Docker tests in this PR run only on Jenkins).

@robertnishihara
Collaborator

The script ./test/jenkins_tests/run_multi_node_tests.sh is what runs on Jenkins.

@robertnishihara changed the title from "Docker cluster" to "Add multinode tests by simulating multiple nodes using Docker." Mar 19, 2017
@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/339/

d.start_ray(mem_size=args.mem_size, shm_size=args.shm_size,
            num_nodes=args.num_nodes, docker_image=args.docker_image,
            development_mode=args.development_mode)
time.sleep(2)
Contributor Author

Any ideas on a better way to know that Ray has started?

Collaborator

Good point. Now that I think about it, it should be possible to run the test script even if Ray hasn't started yet; in that case, the call to ray.init should just retry as needed.

Collaborator

I just tried it out. It works as I described, with the caveat that ray.init only retries for a few seconds and then raises an exception.
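
One possible alternative to a fixed `time.sleep(2)` is to wrap the connection attempt in a retry loop with a longer deadline. The sketch below is a generic helper, not part of Ray or of this PR; `retry_until_ready`, its parameters, and the `connect` callable are hypothetical names, and the caller would pass in whatever function actually connects to the cluster.

```python
import time


def retry_until_ready(connect, timeout_s=30.0, interval_s=0.5):
    """Call `connect` until it succeeds or `timeout_s` seconds elapse.

    Returns whatever `connect` returns. If the deadline passes, the last
    exception raised by `connect` is re-raised to the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return connect()
        except Exception:
            # Give up once the deadline has passed; otherwise wait and retry.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval_s)


if __name__ == "__main__":
    attempts = {"count": 0}

    def fake_connect():
        # Stand-in for a real connection call: fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("cluster not ready")
        return "connected"

    print(retry_until_ready(fake_connect, timeout_s=5.0, interval_s=0.1))
```

Compared with a fixed sleep, this returns as soon as the connection succeeds and still fails loudly if the cluster never comes up.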

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/340/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/342/

@robertnishihara robertnishihara merged commit 29c8471 into ray-project:master Mar 19, 2017
@robertnishihara robertnishihara deleted the dockercluster branch March 19, 2017 06:44