
Use RAY_ADDRESS to connect to an existing Ray cluster if present #7977

Merged: 8 commits merged into ray-project:master from ray-address-env on Apr 27, 2020

Conversation

@pcmoritz (Contributor) commented Apr 11, 2020

Why are these changes needed?

We would like to allow people to run existing Ray applications on a cluster without changing their code. This means that if a driver is initialized with ray.init(), there should be a way to make it connect to an existing cluster instead of starting a new one. This also makes executing Ray applications more robust: we can force connecting to an existing cluster and avoid the failure mode where somebody runs an application that starts a new local Ray instance instead of using the existing cluster.

This PR makes that possible by introducing an environment variable, RAY_ADDRESS. If it is set when the driver runs, its value is treated as the address parameter of ray.init(), making the driver connect to the existing cluster.
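
For example, a driver written like this needs no code changes (a sketch; script.py and the address below are illustrative, not taken from the PR):

# script.py -- unchanged application code
import ray
ray.init()  # connects to the existing cluster if RAY_ADDRESS is set

Run it against an existing cluster by setting the environment variable:

RAY_ADDRESS="127.0.0.1:6379" python script.py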

Q & A:

  • Should we print a warning if the RAY_ADDRESS variable is set and we connect to an existing cluster?

In general it is not common practice to print a warning when an environment variable is picked up, but there could be an argument for it here, especially since warnings can be turned off selectively. On the other hand, if RAY_ADDRESS is set, using the existing cluster should be the expected behaviour.

  • Should we have an option to force a local cluster even if RAY_ADDRESS is defined?

This seems like a rare use case and it is already supported by doing

import os, ray

del os.environ["RAY_ADDRESS"]  # ignore the env var and start a local instance
ray.init()

So I'd avoid complicating the API for this.
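
For reference, the pickup described above amounts to a simple fallback (a minimal sketch of the idea, not the actual ray.init implementation; resolve_address is a hypothetical helper):

import os

def resolve_address(address=None):
    # An explicitly passed address still takes precedence; otherwise
    # fall back to the RAY_ADDRESS environment variable from this PR.
    if address is None:
        address = os.environ.get("RAY_ADDRESS")
    return address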


@AmplabJenkins
Can one of the admins verify this patch?

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24548/

@pcmoritz changed the title from "[WIP] Use RAY_ADDRESS to connect to an existing Ray cluster if present" to "Use RAY_ADDRESS to connect to an existing Ray cluster if present" on Apr 26, 2020
@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25206/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25204/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25207/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25208/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25210/

@edoakes (Collaborator) left a comment

LGTM! I don't think we need to print a warning when connecting. It seems uncommon for a user to want to start a local cluster when one is already running on the machine. And if no cluster is running but RAY_ADDRESS is picked up, the connection will fail anyway.

@pcmoritz merged commit d7da25e into ray-project:master on Apr 27, 2020
@pcmoritz deleted the ray-address-env branch on April 27, 2020 at 16:59