Adding basic support for a user-interpretable resource label #761
Conversation
Want to add some tests showing how to use this?
yeah, I'll add some unit tests to exercise this. I've been testing it like this:
python/ray/worker.py
@@ -1296,6 +1305,9 @@ def init(redis_address=None, node_ip_address=None, object_id_seed=None,
         be configured with.
     num_gpus (int): Number of gpus the user wishes all local schedulers to
         be configured with.
+    num_custom_resource (int): The quantity of a user-defined custom
+        resource that the local scheduler should be configured with. This
+        flag is highly unstable and should not be used.
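For orientation, here is a minimal sketch (my own, not part of the diff) of how a driver might pass the new parameter when starting Ray locally; the quantities are made up, and `num_cpus` is the pre-existing `ray.init` argument rather than something introduced here:

```python
# Illustrative sketch only: configure the local scheduler with 8 CPUs and
# 4 units of the user-defined custom resource.
import ray

ray.init(num_cpus=8, num_custom_resource=4)
```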
Instead of "unstable", let's say "support for this will be removed" or "experimental"; "unstable" has more of a connotation of not working reliably.
"will be removed" sounds like that it has been deprecated. I hope it will not just be removed, but maybe matured into something else.
Maybe just saying "experimental feature subject to changes in the future". Kubernetes has this for GPU support: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
yeah, you are right, sounds good!
BTW, your GitHub tagline is "An experimental distributed execution engine". So this is then an experimental experimental feature? ;-) Maybe you should find a better tagline, like "an awesome distributed execution engine". :-)
Thanks!
This can be used as follows (for example). Start three machines like this.
Define a remote function that uses some of the "custom resource":

```python
import ray

# Connect to the existing cluster.
ray.init(redis_address="172.31.10.143:6379")

# This task requires one unit of the custom resource, so it can only be
# scheduled on a machine that was configured with it.
@ray.remote(num_custom_resource=1)
def f():
    import time
    time.sleep(0.01)
    return ray.services.get_node_ip_address()

print(set(ray.get([f.remote() for _ in range(1000)])))
```

The print statement should show that it is only scheduled on the second machine. Note that to start a machine with infinite "custom resource", you can use
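For contrast, here is a small sketch (mine, not from the PR) of a task that does not request the custom resource and should therefore be spread across all of the machines; it reuses only the calls already shown above and assumes the same driver session:

```python
# No custom-resource requirement, so the scheduler may place this task on any
# machine in the cluster.
@ray.remote
def g():
    import time
    time.sleep(0.01)
    return ray.services.get_node_ip_address()

# Expect several distinct IP addresses here, not just the machine configured
# with the custom resource.
print(set(ray.get([g.remote() for _ in range(1000)])))
```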
This PR adds the ability to configure an arbitrary, user-defined resource on each local scheduler and lets tasks request it. Infinite capacity is supported out of the box.
This is an experimental first pass at addressing #695. There will be API changes down the road.
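As a rough illustration of "lets tasks request it" (a sketch under my own assumptions, not code from this PR), the custom resource could presumably be requested alongside the existing per-task resource arguments; combining them in one decorator is an assumption on my part:

```python
import ray

# Hypothetical: this task asks for one CPU and one unit of the custom
# resource, so it should only run on nodes configured with that resource.
@ray.remote(num_cpus=1, num_custom_resource=1)
def labeled_task():
    return "ran on a node that advertises the custom resource"
```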