[sgd] Replaced class Resources in sgd with use_gpu
#5252
Conversation
Test FAILed.
```diff
@@ -65,8 +66,7 @@ def __init__(self,
         self.optimizer_timer = utils.TimerStat(window_size=1)

         if resources_per_replica is None:
-            resources_per_replica = utils.Resources(
-                num_cpus=1, num_gpus=0, resources={})
+            resources_per_replica = Resources(cpu=1, gpu=0)

         if backend == "auto":
             backend = "nccl" if resources_per_replica.num_gpus > 0 else "gloo"
```
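The hunk above still branches on `resources_per_replica.num_gpus`. With the plain `use_gpu` flag proposed in this PR, the same backend choice could be sketched as follows (the helper name `choose_backend` is hypothetical, not the merged API):

```python
def choose_backend(backend, use_gpu):
    """Pick a torch.distributed backend: NCCL requires GPUs, Gloo runs on CPU."""
    if backend == "auto":
        return "nccl" if use_gpu else "gloo"
    return backend
```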
Rename `num_gpus` -> `gpu` everywhere.
int(use_gpu)
```diff
@@ -3,6 +3,6 @@
 from __future__ import print_function

 from ray.experimental.sgd.pytorch.pytorch_trainer import PyTorchTrainer
-from ray.experimental.sgd.pytorch.utils import Resources
+from ray.tune.trial import Resources
```
let's move `Resources` into a new file, `ray.tune.resources`
OK actually I got it; this change is fine so far. Let's make the SGD API simpler in this PR, and I think the work you've done so far is progress towards that goal. Can you implement the following:

```python
class PyTorchTrainer(object):
    ...
    def __init__(self,
                 model_creator,
                 data_creator,
                 optimizer_creator=utils.sgd_mse_optimizer,
                 config=None,
                 num_replicas=1,
                 use_gpu=False,  # Make this easier
                 batch_size=16,
                 backend="auto"):
        ...

    @classmethod
    def default_resource_request(cls, resource):
        # currently wrong syntax
        return Resources(
            cpu=0,
            gpu=0,
            extra_cpu=cf["num_workers"],
            extra_gpu=int(use_gpu) * cf["num_workers"])
```
In order to apply `use_gpu` and `default_resource_request`, I think I need to know what `extra_cpu` and `extra_gpu` mean in class `Resources`.
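For context on that question: in Tune's `Resources`, `cpu`/`gpu` are reserved for the trainable actor itself, while `extra_cpu`/`extra_gpu` are set aside for the additional worker actors the trainable launches. A self-contained sketch of the request the reviewer proposed, using a namedtuple stand-in rather than the real class:

```python
from collections import namedtuple

# Stand-in for ray.tune's Resources (sketch only): cpu/gpu go to the
# trainable process itself; extra_cpu/extra_gpu are reserved for the
# worker actors that the trainable spawns.
Resources = namedtuple("Resources", ["cpu", "gpu", "extra_cpu", "extra_gpu"])

def default_resource_request(config, use_gpu):
    # The trainer actor is lightweight (cpu=0, gpu=0); each of the
    # num_workers replicas needs one CPU, plus one GPU when use_gpu is set.
    return Resources(
        cpu=0,
        gpu=0,
        extra_cpu=config["num_workers"],
        extra_gpu=int(use_gpu) * config["num_workers"])
```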
Test FAILed.
Test FAILed.
Test FAILed.
python/ray/tune/trial.py (Outdated)

```diff
@@ -300,7 +162,7 @@ def __init__(self,
         if resources:
             raise ValueError(
                 "Resources for {} have been automatically set to {} "
-                "by its `default_resource_request()` method. Please "
+                "by its `default_resource_request()` method. Please "
```
(revert?)
This would correspond to
Yes, I think that's ok for now.
Can we try to merge this PR first with only the change to `Resources` in `experimental.sgd` and the move of `Resources` to the new file `ray.tune.resources`, and work on `default_resource_request` and `int(use_gpu)` with a PyTorch trainable class in a separate PR?
Test FAILed.
Test FAILed.
Test PASSed.
Test PASSed.
Test PASSed.
Test FAILed.
Test FAILed.
Test FAILed.
Test FAILed.
use_gpu
Test PASSed.
What do these changes do?
Removes the duplicate class `Resources` in `python/ray/experimental/sgd/pytorch/utils.py` and replaces it with the class `Resources` from `python/ray/tune/trial.py`.
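To illustrate the field rename this implies at call sites (field names taken from the first diff in this PR; the namedtuples here are stand-ins for the real classes, shown only for the shape of the API):

```python
from collections import namedtuple

# Approximate shape of the duplicate class being removed from
# sgd/pytorch/utils.py.
SgdResources = namedtuple("SgdResources", ["num_cpus", "num_gpus", "resources"])
# Approximate shape of the tune Resources fields used after this PR.
TuneResources = namedtuple("TuneResources", ["cpu", "gpu"])

old = SgdResources(num_cpus=1, num_gpus=0, resources={})  # before
new = TuneResources(cpu=1, gpu=0)                         # after
```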
Related issue number
Linter
I've run `scripts/format.sh` to lint the changes in this PR.