Adding basic support for a user-interpretable resource label #761

Merged: 14 commits, Aug 8, 2017

@atumanov
Contributor

atumanov commented Jul 20, 2017

This PR adds the ability to configure an arbitrary, user-defined resource on each local scheduler and lets tasks request it. Infinite capacity is supported out of the box.

This is an experimental first pass at addressing #695. There will be API changes down the road.
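
To make the idea concrete, here is a minimal, Ray-independent sketch of the kind of bookkeeping a scheduler could do for one such resource. The `ResourcePool` class and its method names are hypothetical and are not taken from this PR; treating a negative capacity as infinite is an assumption borrowed from the `--num-custom-resource=-1` convention mentioned later in this thread.

```python
import math


class ResourcePool:
    """Toy model of one scheduler's stock of a single custom resource."""

    def __init__(self, capacity):
        # Assumption: a negative capacity stands for "infinite".
        self.available = math.inf if capacity < 0 else float(capacity)

    def try_acquire(self, amount):
        """Reserve `amount` units if possible; return True on success."""
        if amount > self.available:
            return False
        self.available -= amount
        return True

    def release(self, amount):
        """Return `amount` units to the pool once a task finishes."""
        self.available += amount


pool = ResourcePool(capacity=1)
first = pool.try_acquire(1)   # succeeds: one unit was free
second = pool.try_acquire(1)  # fails: the only unit is taken
pool.release(1)
third = pool.try_acquire(1)   # succeeds again after the release

unlimited = ResourcePool(capacity=-1)
always = all(unlimited.try_acquire(1) for _ in range(1000))
```

A task that cannot acquire its requested amount would simply wait (or be placed elsewhere) until enough units are released.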

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1422/

@AmplabJenkins

Merged build finished. Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1423/

@atumanov changed the title from "[WIP] Adding basic support for a user-interpretable resource label" to "Adding basic support for a user-interpretable resource label" on Jul 26, 2017
@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1424/

@robertnishihara
Collaborator

Want to add some tests showing how to use this?

@atumanov
Contributor Author

yeah, I'll add some unit tests to exercise this. I've been testing it like this:

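#!/usr/bin/env python
from __future__ import print_function

import time

import ray

# f holds the single unit of the resource for 10 seconds.
@ray.remote(num_uirs=1)
def f():
    time.sleep(10)
    return 1

# g requests the same resource, so the g tasks should queue behind f.
@ray.remote(num_uirs=1)
def g():
    return 2

# Configure the local scheduler with one unit of the resource.
ray.init(num_uirs=1)

oid = f.remote()
oids = [g.remote() for _ in range(10)]

# Measure how long it takes for all of the g tasks to complete.
t1 = time.time()
results = ray.get(oids)
t2 = time.time()
print(t2 - t1, results)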
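
If the limit is enforced, the elapsed time printed by a script like this should be roughly 10 seconds, since the `g` tasks cannot start until `f` gives back the only unit. The following Ray-independent sketch illustrates that reasoning; the `simulate` helper is hypothetical, and it assumes tasks are dispatched in submission order, which the real scheduler does not necessarily guarantee.

```python
def simulate(durations, capacity=1):
    """Return each task's finish time when every task needs one unit."""
    # free_at[i] is the time at which unit i of the resource becomes free.
    free_at = [0.0] * capacity
    finish_times = []
    for duration in durations:
        # Assumption: tasks are dispatched in submission order, each taking
        # the unit that frees up earliest.
        unit = min(range(capacity), key=lambda i: free_at[i])
        free_at[unit] += duration
        finish_times.append(free_at[unit])
    return finish_times


# One 10-second task (like f) followed by ten instantaneous tasks (like g).
finish = simulate([10.0] + [0.0] * 10, capacity=1)
elapsed_for_g = max(finish[1:])
```

With a second unit of the resource (`capacity=2`), the same model lets the short tasks finish without waiting for the long one.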

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1568/

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1571/

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1572/

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1573/

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1574/

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1575/

@@ -1296,6 +1305,9 @@ def init(redis_address=None, node_ip_address=None, object_id_seed=None,
      be configured with.
    num_gpus (int): Number of gpus the user wishes all local schedulers to
      be configured with.
    num_custom_resource (int): The quantity of a user-defined custom
      resource that the local scheduler should be configured with. This
      flag is highly unstable and should not be used.
Contributor

Instead of "unstable", let's say "support for this will be removed" or "experimental"; "unstable" has more of a connotation of not working reliably.

Member

"Will be removed" sounds like it has been deprecated. I hope it will not just be removed, but maybe matured into something else.

Maybe just say "experimental feature subject to changes in the future". Kubernetes has this for GPU support: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

Contributor

yeah, you are right, sounds good!
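
As a sketch of where this review thread lands, the docstring entry could read as below with the suggested "experimental" wording. The function `init_sketch` is a hypothetical stand-in used only to carry the docstring; whether the merged code uses exactly this wording is not shown in this conversation.

```python
def init_sketch(num_custom_resource=None):
    """Hypothetical stand-in for the function documented in the diff.

    Args:
      num_custom_resource (int): The quantity of a user-defined custom
        resource that the local scheduler should be configured with. This
        is an experimental feature subject to changes in the future.
    """
    # The body is a placeholder; only the docstring wording matters here.
    return num_custom_resource
```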

@mitar
Member

mitar commented Aug 8, 2017

BTW, your GitHub tagline is "An experimental distributed execution engine". So is this then an experimental experimental feature? ;-)

Maybe you should find a better tagline. Like "an awesome distributed execution engine". :-)

@AmplabJenkins

Merged build finished. Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1577/

@pcmoritz merged commit fc885bd into ray-project:master on Aug 8, 2017
@pcmoritz deleted the uirlabel branch on August 8, 2017, 09:54
@mitar
Member

mitar commented Aug 8, 2017

Thanks!

@robertnishihara
Collaborator

For example, this can be used as follows.

Start three machines like this:

ray start --head --redis-port=6379
ray start --redis-address 172.31.10.143:6379 --num-custom-resource=10
ray start --redis-address 172.31.10.143:6379

Define a remote function that uses some of the "custom resource":

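import ray

# Connect to the existing cluster through the head node's Redis server.
ray.init(redis_address="172.31.10.143:6379")

# Each call to f requires one unit of the custom resource.
@ray.remote(num_custom_resource=1)
def f():
    import time
    time.sleep(0.01)
    # Report the IP address of the machine that ran this task.
    return ray.services.get_node_ip_address()

# Collect the distinct IP addresses that executed the 1000 tasks.
print(set(ray.get([f.remote() for _ in range(1000)])))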

The printed set should contain only the IP address of the second machine (the one started with --num-custom-resource=10), showing that f is scheduled only there. Note that to start a machine with infinite "custom resource", you can use --num-custom-resource=-1.
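
The placement behavior described above can be sketched, independently of Ray, as a simple feasibility filter. The `feasible_nodes` helper and the node labels are hypothetical; modeling a machine started without the flag as having 0 units is an assumption consistent with the behavior described, not something confirmed by the diff shown here.

```python
def feasible_nodes(capacities, requested):
    """Return the nodes whose capacity can satisfy `requested` units."""
    # Assumption: -1 encodes infinite capacity, as with
    # --num-custom-resource=-1; a node started without the flag is
    # modeled here as having 0 units.
    return [
        node
        for node, capacity in capacities.items()
        if capacity == -1 or capacity >= requested
    ]


# Hypothetical labels for the three machines started above.
cluster = {"head": 0, "second": 10, "third": 0}
eligible = feasible_nodes(cluster, requested=1)  # only "second" qualifies
```

A node configured with -1 would pass this filter for any requested amount.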

5 participants