
Switch EC2 example config to use AWS deep learning AMI + latest Ray wheel#1331

Merged
robertnishihara merged 2 commits into ray-project:master from ericl:fix-up-pip
Dec 17, 2017

Conversation

@ericl
Contributor

@ericl ericl commented Dec 16, 2017

What do these changes do?

This is the fix-up PR for #1311

raise Exception(
"No subnets found, try manually creating an instance in "
"your specified region to populate the list of subnets "
"and trying this again.")
Contributor Author


I'm not sure why the subnets list is empty until you do this; I ran into this problem when trying to use the script in a region I hadn't used before (us-west-2).
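The behavior can be reproduced in isolation. Below is a hypothetical helper (the function name and dict shape are assumptions, not autoscaler code) that mirrors the check quoted above, assuming subnets arrive as boto3-style dicts from `ec2.describe_subnets()`:

```python
# Hypothetical sketch of the subnet check quoted above; the function
# name and the dict shape are assumptions, not autoscaler code.

def pick_default_subnet(subnets):
    """Return a subnet ID, preferring subnets that map public IPs.

    Mimics the fail-fast behavior hit when a fresh region (e.g. one
    that has never launched an instance) reports no default subnets.
    """
    if not subnets:
        raise Exception(
            "No subnets found, try manually creating an instance in "
            "your specified region to populate the list of subnets "
            "and trying this again.")
    usable = [s for s in subnets if s.get("MapPublicIpOnLaunch")]
    return (usable or subnets)[0]["SubnetId"]

print(pick_default_subnet([
    {"SubnetId": "subnet-aaa", "MapPublicIpOnLaunch": True},
    {"SubnetId": "subnet-bbb", "MapPublicIpOnLaunch": False},
]))  # → subnet-aaa
```

AWS only creates the default VPC's subnets for a region lazily, which is consistent with the empty list going away after an instance is launched there manually.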

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2821/

Comment thread on python/ray/autoscaler/aws/example.yaml (outdated)
head_init_commands:
- cd ~/ray; git remote add eric https://github.com/ericl/ray.git || true
- cd ~/ray; git fetch eric && git reset --hard e1e97b3
- sudo pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/f5ea44338eca392df3a868035df3901829cc2ca1/ray-0.3.0-cp35-cp35m-manylinux1_x86_64.whl
Collaborator


Would it work if we get rid of sudo and add in --user?
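For reference, a sketch of the suggested change (a hypothetical config fragment, assuming the AMI's pip3 is on PATH for the login user; the wheel URL is the one from the snippet above):

```yaml
# Hypothetical --user variant of the init command above; installs into
# ~/.local instead of the system site-packages, so no sudo is needed.
head_init_commands:
  - pip3 install --user -U https://s3-us-west-2.amazonaws.com/ray-wheels/f5ea44338eca392df3a868035df3901829cc2ca1/ray-0.3.0-cp35-cp35m-manylinux1_x86_64.whl
```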

@robertnishihara
Collaborator

Thanks @ericl, this looks good.

Separately, what is the best way for developers to use this? E.g., should developers make a new AMI with Ray installed via python setup.py develop and then change the config to use that AMI?

Also, for development, would it make sense to pre-populate a workers.txt file on the head node? And the start_worker.sh/stop_worker.sh/upgrade.sh scripts?

Collaborator

@robertnishihara robertnishihara left a comment


I left one comment. Feel free to address it or merge anyway.

@ericl
Contributor Author

ericl commented Dec 17, 2017

@robertnishihara made that change, it seems to work.

Separately, what is the best way for developers to use this? E.g., should developers make a new AMI with Ray installed via python setup.py develop and then change the config to use that AMI?

Yeah, I snapshotted a new AMI that had my repo pre-cloned.

Also, for development, would it make sense to pre-populate a workers.txt file on the head node? And the start_worker.sh/stop_worker.sh/upgrade.sh scripts?

What I've been doing is adding a git checkout <my_sha> to the init commands and then running ray create_or_update, which updates the nodes. When I need to force-restart the nodes, I add some dummy init command (e.g. "echo foo") that triggers a ray restart on all nodes.
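The workflow described above can be sketched as a config fragment (hypothetical; <my_sha> is a placeholder, and the dummy command works because changing any init command alters the config hash and forces a restart):

```yaml
# Hypothetical sketch of the development loop described above.
head_init_commands:
  - cd ~/ray; git fetch origin && git checkout <my_sha>  # pin the working tree
  - echo foo  # dummy command; editing it forces a restart on all nodes
```

Running ray create_or_update with the edited config then pushes the change to the cluster.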

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2823/

@robertnishihara
Collaborator

It's got to be unrelated to this PR, but this is the first time I've seen this error:

======================================================================
FAIL: testCleanupOnDriverExitManyRedisShards (__main__.MonitorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/monitor_test.py", line 88, in testCleanupOnDriverExitManyRedisShards
    self._testCleanupOnDriverExit(num_redis_shards=5)
  File "test/monitor_test.py", line 79, in _testCleanupOnDriverExit
    self.assertEqual((0, 1), StateSummary()[:2])
AssertionError: Tuples differ: (0, 1) != (4, 3)

First differing element 0:
0
4

- (0, 1)
+ (4, 3)

cc @concretevitamin

@robertnishihara
Collaborator

Filed an issue at #1332.
