[autoscaler] Experimental support for local / on-prem clusters #2678

ericl · 2018-08-18T00:18:04Z

What do these changes do?

This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head IP and the list of worker IPs.

A couple of additional utilities are also added for rsyncing files and port forwarding.
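
For illustration, the inputs described above (one head IP plus a list of worker IPs) could be checked along these lines. This is only a sketch: the key names `head_ip` and `worker_ips` and the helper `validate_local_provider` are assumptions made for the example, and it assumes plain IP addresses rather than hostnames.

```python
import ipaddress


def validate_local_provider(provider):
    """Check a hypothetical provider dict with one head IP and a list of worker IPs.

    Returns the head IP followed by the worker IPs.
    """
    head_ip = provider.get("head_ip")
    worker_ips = provider.get("worker_ips", [])
    if not head_ip:
        raise ValueError("head_ip is required")
    if not isinstance(worker_ips, list):
        raise ValueError("worker_ips must be a list")
    all_ips = [head_ip] + worker_ips
    for ip in all_ips:
        # ip_address raises ValueError on a malformed address.
        ipaddress.ip_address(ip)
    if len(set(all_ips)) != len(all_ips):
        raise ValueError("each node must appear only once")
    return all_ips
```

Calling `validate_local_provider({"head_ip": "10.0.0.1", "worker_ips": ["10.0.0.2"]})` would return both addresses with the head node first.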

ericl · 2018-08-18T00:40:58Z

cc @hartikainen

AmplabJenkins · 2018-08-18T01:46:51Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7575/

AmplabJenkins · 2018-08-18T01:53:35Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7576/

AmplabJenkins · 2018-08-18T05:14:36Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7582/

richardliaw

Overall looks good; it would be nice to have some documentation.

Also, are there going to be tests for this?

richardliaw · 2018-08-19T02:15:01Z

python/ray/scripts/scripts.py

+    required=False,
+    type=str,
+    help=("Override the configured cluster name."))
+def rsync_down(cluster_config_file, source, target, cluster_name):


Can you add some descriptors for these functions?

That's not currently done in this file. Do those show up in ray --help?

richardliaw · 2018-08-19T02:19:32Z

doc/source/autoscaling.rst

@@ -89,11 +89,21 @@ You can use ``ray attach`` to attach to an interactive console on the cluster.
 Port-forwarding applications
 ----------------------------

-To run connect to applications running on the cluster (e.g. Jupyter notebook) using a web browser, you can forward the port to your local machine using SSH:
+To run connect to applications running on the cluster (e.g. Jupyter notebook) using a web browser, you can use the port-forward option for ``ray exec``:


Can you leave a note that the port opened on the local machine is the same as the port forwarded on the remote machine?
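
To make the point about matching ports concrete, here is a rough sketch of the equivalent plain SSH invocation, built in Python. It is not taken from the `ray exec` implementation; the helper name `port_forward_command` and its parameters are hypothetical.

```python
def port_forward_command(user, host, port):
    """Build an ssh command that forwards local `port` to the same port on `host`."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward
        "-L", "{0}:localhost:{0}".format(port),  # local port == remote port
        "{}@{}".format(user, host),
    ]
```

With `port=8899`, the forwarding spec is `8899:localhost:8899`, so a notebook listening on port 8899 on the cluster would be reachable at `localhost:8899` on the local machine.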

richardliaw · 2018-08-19T02:21:28Z

python/ray/autoscaler/local/example-full.yaml

+head_setup_commands: []
+worker_setup_commands: []
+setup_commands:
+    - source activate ray && test -e ray || git clone https://github.com/YOUR_GITHUB/ray.git


Why not just a pip install?

These setup commands don't seem to actually install ray, do they?

This example is for dev only for now...

richardliaw · 2018-08-19T02:31:42Z

python/ray/autoscaler/local/node_provider.py

+from ray.autoscaler.tags import TAG_RAY_NODE_TYPE
+
+
+class ClusterState(object):


Can you provide some docstrings for this class?
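
As a rough idea of what such docstrings might look like, here is a sketch of a state class that persists worker info to a JSON file, in the spirit of the `json.loads(open(self.save_path).read())` line in this file. The method names `get` and `put` and the stored fields are assumptions for illustration, not the actual class in this PR.

```python
import json
import os


class ClusterState(object):
    """Persist a mapping of node id to node info in a JSON file.

    Hypothetical sketch: the method names below are illustrative only.
    """

    def __init__(self, save_path):
        self.save_path = save_path

    def get(self):
        """Return the saved workers dict, or an empty dict if nothing is saved."""
        if os.path.exists(self.save_path):
            with open(self.save_path) as f:
                return json.load(f)
        return {}

    def put(self, node_id, info):
        """Record `info` for `node_id` and write the whole state back to disk."""
        workers = self.get()
        workers[node_id] = info
        with open(self.save_path, "w") as f:
            json.dump(workers, f)
```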

AmplabJenkins · 2018-08-19T05:52:41Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7590/

robertnishihara · 2018-08-19T06:39:06Z

cc @devin-petersohn @pschafhalter

robertnishihara · 2018-08-19T06:42:09Z

python/ray/autoscaler/local/node_provider.py

+                workers = json.loads(open(self.save_path).read())
+            else:
+                workers = {}
+            print("Loaded cluster state", workers)


Should we be using the logging module for new print statements?

We can do that separately in a PR (#2628).
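
For reference, the suggested switch from `print` to the standard `logging` module might look roughly like this. The function name `report_loaded_state` is made up for the example; only the log message comes from the diff above.

```python
import logging

logger = logging.getLogger(__name__)


def report_loaded_state(workers):
    """Log the loaded cluster state instead of printing it."""
    # Lazy %-style formatting: the message is only built if INFO is enabled.
    logger.info("Loaded cluster state %s", workers)
```

Unlike `print`, this lets callers silence or redirect the message by configuring the logger's level and handlers.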

robertnishihara · 2018-08-19T06:43:14Z

doc/source/autoscaling.rst

+Manually synchronizing files
+----------------------------
+
+To download or upload files to the cluster head node, use ``ray rsync_down`` or ``ray rsync_up``:


Is syncing to/from the head node the primary use case, as opposed to syncing to/from all nodes?

Yeah, handling other nodes is out of scope here.
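
A minimal sketch of what syncing to or from the head node only amounts to, expressed as a plain `rsync` over SSH command built in Python. This is an assumption about the general shape, not the code in this PR; `rsync_command` and its parameters are hypothetical.

```python
def rsync_command(source, target, head_ip, user, down=False):
    """Build an rsync command that copies to (or, with down=True, from) the head node."""
    remote = "{}@{}:".format(user, head_ip)
    if down:
        # Download: the source path lives on the head node.
        src, dst = remote + source, target
    else:
        # Upload: the target path lives on the head node.
        src, dst = source, remote + target
    # -a archive mode, -v verbose, -z compress, -e ssh as the transport.
    return ["rsync", "-avz", "-e", "ssh", src, dst]
```

Worker nodes never appear in the command, which matches the scope stated above.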

robertnishihara · 2018-08-19T06:44:08Z

@ericl does this require any special instructions for what IP addresses to use when you have a public/private IP address distinction (like on EC2)?

ericl · 2018-08-19T19:41:53Z

@robertnishihara I'm assuming all nodes have just one IP for now.

ericl · 2018-08-19T19:43:07Z

Merging so we can start testing.

wip

3656129

ericl assigned richardliaw Aug 18, 2018

update

21e772d

richardliaw mentioned this pull request Aug 18, 2018

[autoscaler] Add sync command #2010

Closed

1 task

flake

96b6b57

richardliaw approved these changes Aug 19, 2018

View reviewed changes

fix

08f57e6

robertnishihara reviewed Aug 19, 2018

View reviewed changes

doc fix

0f3e462

ericl merged commit 9473da6 into ray-project:master Aug 19, 2018

pschafhalter mentioned this pull request Aug 28, 2018

Run Modin on cluster modin-project/modin#46

Closed

1 task

richardliaw mentioned this pull request Sep 25, 2018

[autoscaler] Support Sync-only Command #1917

Closed
