[autoscaler] Experimental support for local / on-prem clusters #2678
cc @hartikainen
Test PASSed.
Overall looks good, would be nice to have some documentation.
Also are there going to be tests for this?
required=False,
type=str,
help=("Override the configured cluster name."))
def rsync_down(cluster_config_file, source, target, cluster_name):
Can you add some descriptors for these functions?
That's not currently done in this file. Do those show up in ray --help?
doc/source/autoscaling.rst
@@ -89,11 +89,21 @@ You can use ``ray attach`` to attach to an interactive console on the cluster.
Port-forwarding applications
----------------------------

To run connect to applications running on the cluster (e.g. Jupyter notebook) using a web browser, you can forward the port to your local machine using SSH:
To run connect to applications running on the cluster (e.g. Jupyter notebook) using a web browser, you can use the port-forward option for ``ray exec``:
Can you leave a note that the port opened on the local machine is the same as the port forwarded on the remote machine?
Done.
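To illustrate the note requested above, here is a minimal sketch of a forwarding command in which the local port and the remote port are the same number. The helper name `build_port_forward_cmd`, the user, and the host address are hypothetical and are not taken from this PR; only the standard `ssh -N -L` flags are assumed.

```python
def build_port_forward_cmd(user, host, port):
    """Return an ssh argv that forwards local `port` to the same port on `host`."""
    if not 0 < port < 65536:
        raise ValueError("port out of range: {}".format(port))
    # -N: only forward, do not run a remote command.
    # -L local_port:localhost:remote_port, with the same number on both sides.
    forward = "{0}:localhost:{0}".format(port)
    return ["ssh", "-N", "-L", forward, "{}@{}".format(user, host)]


# Hypothetical user and address, for illustration only.
cmd = build_port_forward_cmd("ubuntu", "203.0.113.10", 8888)
print(" ".join(cmd))
```

With this shape, a Jupyter notebook listening on port 8888 on the cluster would be reachable at port 8888 on the local machine as well.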
head_setup_commands: []
worker_setup_commands: []
setup_commands:
- source activate ray && test -e ray || git clone https://github.com/YOUR_GITHUB/ray.git
why not just a pip install?
These setup commands don't seem to actually install ray, do they?
This example is for dev only for now...
from ray.autoscaler.tags import TAG_RAY_NODE_TYPE


class ClusterState(object):
Can you provide some docstrings for this class?
    workers = json.loads(open(self.save_path).read())
else:
    workers = {}
print("Loaded cluster state", workers)
Should we be using the logging module for new print statements?
we can do that separately in a PR (#2628)
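For reference, a minimal sketch of what the two review suggestions (a class docstring and `logging` in place of `print`) could look like together. The class name `ClusterStateSketch` is hypothetical and this is not the code in this PR; it only mirrors the load-or-default pattern visible in the excerpt above.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


class ClusterStateSketch(object):
    """Hypothetical stand-in for a class that persists worker state as JSON.

    Args:
        save_path: file the worker table is loaded from, if it exists.
    """

    def __init__(self, save_path):
        self.save_path = save_path
        if os.path.exists(self.save_path):
            with open(self.save_path) as f:
                workers = json.load(f)
        else:
            # No saved state yet: start with an empty worker table.
            workers = {}
        # Logged instead of printed, so callers control the verbosity.
        logger.info("Loaded cluster state: %s", workers)
        self.workers = workers


# A path that does not exist yet yields an empty worker table.
missing = ClusterStateSketch(os.path.join(tempfile.mkdtemp(), "state.json"))
```

Using `with open(...)` also closes the file handle deterministically, which the `open(...).read()` form leaves to the garbage collector.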
Manually synchronizing files
----------------------------

To download or upload files to the cluster head node, use ``ray rsync_down`` or ``ray rsync_up``:
Is syncing to/from the head node the primary use case, as opposed to syncing to/from all nodes?
Yeah, handling other nodes is out of scope here.
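As a rough illustration of head-node-only syncing, the sketch below assembles an `rsync` command for either direction. The helper name `build_rsync_cmd`, its parameters, and the example user, address, and key path are all hypothetical; the PR's actual implementation may differ. Only the standard `rsync -avz -e` options are assumed.

```python
import shlex


def build_rsync_cmd(user, head_ip, ssh_key, source, target, down=True):
    """Return an rsync argv that copies between this machine and the head node.

    down=True copies head:source to local target; down=False copies
    local source to head:target.
    """
    remote_prefix = "{}@{}:".format(user, head_ip)
    # rsync runs over ssh; the key path is quoted in case it contains spaces.
    ssh = "ssh -i {}".format(shlex.quote(ssh_key))
    if down:
        src, dst = remote_prefix + source, target
    else:
        src, dst = source, remote_prefix + target
    return ["rsync", "-avz", "-e", ssh, src, dst]


# Hypothetical values, for illustration only.
down_cmd = build_rsync_cmd(
    "ubuntu", "10.0.0.1", "/home/me/.ssh/key.pem", "/tmp/out", "./out")
up_cmd = build_rsync_cmd(
    "ubuntu", "10.0.0.1", "/home/me/.ssh/key.pem", "./data", "/tmp/data",
    down=False)
```

Syncing to worker nodes would need one such command per worker, which is the part described as out of scope here.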
@ericl does this require any special instructions for what IP addresses to use when you have a public/private IP address distinction (like on EC2)?
@robertnishihara I'm assuming all nodes have just one IP for now.
Merging so we can start testing.
What do these changes do?
This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head IP and the list of worker IPs.
It also adds a couple of utilities for rsyncing files and port forwarding.
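A minimal sketch of the kind of input described above: one head IP plus a list of worker IPs, with each node having a single address. The function name `validate_local_provider` and the keys `head_ip` and `worker_ips` are assumptions made for illustration, not confirmed names from this PR, and the check assumes literal IP addresses rather than hostnames.

```python
import ipaddress


def validate_local_provider(provider):
    """Check a hypothetical on-prem provider section and return (head, workers)."""
    head_ip = provider["head_ip"]
    worker_ips = list(provider.get("worker_ips", []))
    for ip in [head_ip] + worker_ips:
        # Raises ValueError if the string is not a valid IPv4/IPv6 address.
        ipaddress.ip_address(ip)
    if head_ip in worker_ips:
        raise ValueError("head IP also listed as a worker: {}".format(head_ip))
    if len(set(worker_ips)) != len(worker_ips):
        raise ValueError("duplicate worker IPs")
    return head_ip, worker_ips


# Hypothetical addresses, for illustration only.
head, workers = validate_local_provider(
    {"head_ip": "10.0.0.1", "worker_ips": ["10.0.0.2", "10.0.0.3"]})
```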