
Switch EC2 example config to use AWS deep learning AMI + latest Ray wheel #1331

Merged (2 commits) on Dec 17, 2017

Conversation

@ericl (Contributor) commented Dec 16, 2017

What do these changes do?

This is the fix-up PR for #1311

raise Exception(
"No subnets found, try manually creating an instance in "
"your specified region to populate the list of subnets "
"and trying this again.")
@ericl (Contributor, Author):

I'm not sure why the subnets list is empty until you do this; I ran into this problem when trying to use the script in a region I hadn't used before (us-west-2).
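For context, here is a minimal sketch of the guard the snippet above implements, assuming `subnets` is a list of dicts shaped like the output of an EC2 DescribeSubnets call. The function name and the `MapPublicIpOnLaunch` preference are illustrative, not the actual autoscaler code:

```python
def pick_subnet(subnets):
    """Return a usable subnet, or raise if the region has none.

    `subnets` is assumed to be a list of dicts in the shape of EC2
    DescribeSubnets output (illustrative sketch, not the autoscaler code).
    """
    if not subnets:
        # Same error as in the PR: a fresh region may report no subnets
        # until an instance has been launched there once.
        raise Exception(
            "No subnets found, try manually creating an instance in "
            "your specified region to populate the list of subnets "
            "and trying this again.")
    # Prefer subnets that assign public IPs, so SSH works out of the box.
    usable = [s for s in subnets if s.get("MapPublicIpOnLaunch")]
    return (usable or subnets)[0]
```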

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2821/

@@ -58,15 +58,13 @@ file_mounts: {

# List of shell commands to run to initialize the head node.
head_init_commands:
- cd ~/ray; git remote add eric https://github.com/ericl/ray.git || true
- cd ~/ray; git fetch eric && git reset --hard e1e97b3
- sudo pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/f5ea44338eca392df3a868035df3901829cc2ca1/ray-0.3.0-cp35-cp35m-manylinux1_x86_64.whl
A Collaborator commented:
Would it work if we get rid of sudo and add in --user?

@robertnishihara (Collaborator):

Thanks @ericl this looks good.

Separately, what is the best way for developers to use this? E.g., should developers make a new AMI with Ray installed via python setup.py develop and then change the config to use that AMI?

Also, for development, would it make sense to pre-populate a workers.txt file on the head node? And the start_worker.sh/stop_worker.sh/upgrade.sh scripts?

@robertnishihara (Collaborator) left a review:

I left one comment. Feel free to address it or merge anyway.

@ericl (Contributor, Author) commented Dec 17, 2017

@robertnishihara made that change; it seems to work.

> Separately, what is the best way for developers to use this? E.g., should developers make a new AMI with Ray installed via python setup.py develop and then change the config to use that AMI?

Yeah, I snapshotted a new AMI that had my repo pre-cloned.

> Also, for development, would it make sense to pre-populate a workers.txt file on the head node? And the start_worker.sh/stop_worker.sh/upgrade.sh scripts?

What I've been doing is adding a git checkout <my_sha> to the init commands and then running ray create_or_update, which updates the nodes. When I need to force-restart the nodes, I add a dummy init command (e.g. "echo foo") that triggers a Ray restart on all nodes.
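That workflow can be sketched as a config fragment in the style of the diff above. The SHA placeholder and the remote name are illustrative, not values from this PR:

```yaml
# Hypothetical development config sketch (SHA and remote are placeholders),
# following the workflow described above.
head_init_commands:
    - cd ~/ray; git fetch origin && git checkout <my_sha>
    # Changing any init command, even a no-op like this one, makes
    # ray create_or_update restart Ray on all nodes.
    - echo foo
```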

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2823/

@robertnishihara (Collaborator):

It's got to be unrelated to this PR, but this is the first time I've seen this error:

======================================================================
FAIL: testCleanupOnDriverExitManyRedisShards (__main__.MonitorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/monitor_test.py", line 88, in testCleanupOnDriverExitManyRedisShards
    self._testCleanupOnDriverExit(num_redis_shards=5)
  File "test/monitor_test.py", line 79, in _testCleanupOnDriverExit
    self.assertEqual((0, 1), StateSummary()[:2])
AssertionError: Tuples differ: (0, 1) != (4, 3)

First differing element 0:
0
4

- (0, 1)
+ (4, 3)

cc @concretevitamin

@robertnishihara (Collaborator):

Filed an issue at #1332.
