[rllib] Initial work on integrating hyperparameter search tool #1107
Conversation
@@ -11,7 +11,7 @@
import ray
from ray.rllib.a3c.runner import RunnerThread, process_rollout
from ray.rllib.a3c.envs import create_and_wrap
from ray.rllib.common import Agent, TrainingResult, get_tensorflow_log_dir
is this intended?
Oh, seems like Eric split local_log_dir and upload_dir, that's a better way to do it than what we had :)
return self.agent.train.remote()

def should_stop(self, result):
    """Whether the given result meets this trial's stopping criteria."""
In this PR, we only support "maximization" right now?
Yeah, the user can just invert their objective for now
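A minimal sketch of that workaround (names are illustrative, not the actual ray.tune API): if stopping criteria are treated as "stop once the metric reaches at least this value" (maximization), then minimizing a metric such as loss just means reporting its negation.

```python
# Illustrative only: assumes criteria of the form {attr: threshold},
# checked as "stop once the attribute reaches the threshold".

class TrainingResult(object):
    def __init__(self, neg_loss):
        # Report -loss so that "maximizing" this field minimizes loss.
        self.neg_loss = neg_loss

def should_stop(result, criteria):
    """Stop once every watched attribute reaches its threshold."""
    return all(getattr(result, attr) >= threshold
               for attr, threshold in criteria.items())

# Stop when loss drops below 0.1, i.e. when neg_loss >= -0.1:
print(should_stop(TrainingResult(neg_loss=-0.05), {"neg_loss": -0.1}))  # True
```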
python/ray/tune/trial_runner.py (Outdated)
self._trials = []
self._pending = {}
self._avail_resources = {'cpu': 0, 'gpu': 0}
This dict format appears very often; for readability (among other things) it could make sense to make this into a named tuple "Resources"...
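A quick sketch of the suggested named tuple (field names assumed from the dict above):

```python
from collections import namedtuple

# Sketch of the suggested refactor: a named tuple in place of the
# recurring {'cpu': ..., 'gpu': ...} dict.
Resources = namedtuple("Resources", ["cpu", "gpu"])

avail = Resources(cpu=4, gpu=0)
requested = Resources(cpu=8, gpu=2)

def has_capacity(avail, requested):
    # Attribute access reads better than dict lookups, and a mistyped
    # field name fails loudly instead of silently returning a default.
    return avail.cpu >= requested.cpu and avail.gpu >= requested.gpu

print(has_capacity(avail, requested))  # False
```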
Merged build finished. Test FAILed.
timesteps_this_iter=10, info={})


def get_agent_class(alg):
Oh yeah! Good call on adding this function 👍
Very cool! Can you add a little bit of documentation on how to use it?
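For reference, a hypothetical usage sketch: the import path is an assumption, and the constructor arguments mirror the call site in python/ray/tune/trial.py quoted later in this conversation (env_creator, config, local_dir).

```python
import gym
from ray.rllib.agent import get_agent_class  # import path assumed

# Maps an algorithm name (e.g. "PPO", "A3C") to its Agent class.
agent_cls = get_agent_class("PPO")
agent = agent_cls(
    lambda: gym.make("Pong-v0"),  # env_creator
    {"num_workers": 8},           # config
    "/tmp/ray")                   # local_dir
result = agent.train()  # one iteration; returns a TrainingResult
```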
def train_remote(self):
    """Returns Ray future for one iteration of training."""

    assert self.status == Trial.RUNNING, self.status
duplicate?
The second status is printed out if the first expression is false?
ah I see
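For reference, the two-argument assert form under discussion: the second expression is evaluated and attached to the AssertionError only when the first is false, so the offending status shows up in the traceback.

```python
status = "PENDING"
try:
    assert status == "RUNNING", status
except AssertionError as e:
    print("AssertionError:", e)  # -> AssertionError: PENDING
```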
python/ray/tune/config_parser.py (Outdated)
values, or `eval: <str>` for values to be sampled from the given Python
expression.

See ray/rllib/tuned_examples for more examples of configs in YAML form.
I feel we will easily lose track of this line (especially as we move the example code around); it might make sense to move this to a README
along with the example yaml
Added README
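For illustration, a minimal resolver for the `eval: <str>` grammar described above (a sketch, not the parser's actual code):

```python
import random

# Plain values pass through; {"eval": "<expr>"} entries are sampled by
# evaluating the expression with `random` in scope.
def resolve(value):
    if isinstance(value, dict) and "eval" in value:
        return eval(value["eval"], {"random": random})
    return value

config_spec = {
    "num_workers": 8,
    "sgd_stepsize": {"eval": "random.uniform(1e-5, 1e-3)"},
}
config = {k: resolve(v) for k, v in config_spec.items()}
print(config)  # e.g. {'num_workers': 8, 'sgd_stepsize': 0.00042...}
```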
python/ray/tune/trial.py (Outdated)
agent_cls = get_agent_class(self.alg)
cls = ray.remote(num_gpus=self.resources.get('gpu', 0))(agent_cls)
self.agent = cls.remote(
    self.env_creator, self.config, self.local_dir,
(don't know where to leave this) but just pointing out this does not support S3 directories.
Fixed
help="Number of trials to evaluate.") | ||
parser.add_argument("--local_dir", default="/tmp/ray", type=str, | ||
help="Local dir to save training results to.") | ||
parser.add_argument("--upload_dir", default=None, type=str, |
this is unused
Fixed
Updated
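For context, a self-contained sketch of the two directory flags (mirroring the argparse lines quoted above; the S3 URI is just an example): results always go to --local_dir, while --upload_dir, when set, names remote storage to sync results to.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_dir", default="/tmp/ray", type=str,
                    help="Local dir to save training results to.")
parser.add_argument("--upload_dir", default=None, type=str,
                    help="Optional URI to sync training results to.")

args = parser.parse_args(["--upload_dir", "s3://bucket/experiments"])
print(args.local_dir, args.upload_dir)  # /tmp/ray s3://bucket/experiments
```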
help="Number of trials to evaluate.") | ||
parser.add_argument("--local_dir", default="/tmp/ray", type=str, | ||
help="Local dir to save training results to.") | ||
parser.add_argument("--upload_dir", default=None, type=str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
python/ray/tune/config_parser.py
Outdated
values, or `eval: <str>` for values to be sampled from the given Python | ||
expression. | ||
|
||
See ray/rllib/tuned_examples for more examples of configs in YAML form. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added README
python/ray/tune/trial.py
Outdated
agent_cls = get_agent_class(self.alg) | ||
cls = ray.remote(num_gpus=self.resources.get('gpu', 0))(agent_cls) | ||
self.agent = cls.remote( | ||
self.env_creator, self.config, self.local_dir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
def train_remote(self): | ||
"""Returns Ray future for one iteration of training.""" | ||
|
||
assert self.status == Trial.RUNNING, self.status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second status is printed out if the first expression is false?
return self.agent.train.remote() | ||
|
||
def should_stop(self, result): | ||
"""Whether the given result meets this trial's stopping criteria.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the user can just invert their objective for now
Merged build finished. Test FAILed.
I'm seeing some errors in Jenkins, not sure if this is the "main" error, but I see, e.g.,
@@ -0,0 +1,200 @@
from __future__ import absolute_import
Looks like we aren't running this test anywhere. Should we run it in Travis (that would require pip installing gym in Travis) or in Jenkins?
We may want to go with Jenkins since we'll be able to run longer jobs (it currently lacks Python 3 testing, but we can build a Docker container with Python 3 and potentially do both in Jenkins).
This test only takes 45s since we use mocked agents (and it shouldn't depend on gym).
How do I enable it in Travis?
Basically you need to add it somewhere around here:
Line 122 in 3764f2f
- python -m pytest python/ray/rllib/test/test_catalog.py
Merged build finished. Test PASSed.
This PR refactors the rllib `train.py` script to depend on a new `ray.tune` module, which implements efficient hyper-parameter search. The overall usage of `train.py` remains roughly the same, though now it supports two modes:

Inline args:
`./train.py --env=Pong-v0 --alg=PPO --num_trials=8 --stop '{"time_total_s": 3200}' --resources '{"cpu": 8, "gpu": 2}' --config '{"num_workers": 8, "sgd_num_iter": 10}'`

File-based:
`./train.py -f tune-pong.yaml`

Both delegate scheduling of trials to the `ray.tune` TrialRunner class. Additionally, the file-based mode supports hyper-parameter tuning (currently just grid and random search). Note that though `ray.tune` should be written to support generic training, right now it has some RL-specific notions like agents and envs, which we should try to remove.

cc @richardliaw @pcmoritz
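As a rough illustration of what the file-based mode's grid search amounts to (a sketch, not the actual ray.tune implementation; the `grid_search` key is an assumption based on the YAML examples): every combination of the gridded values becomes one trial config.

```python
import itertools

def expand_grid(spec):
    # Keys whose values request a grid sweep vs. fixed values.
    grid_keys = [k for k, v in spec.items()
                 if isinstance(v, dict) and "grid_search" in v]
    fixed = {k: v for k, v in spec.items() if k not in grid_keys}
    # One trial config per element of the cross product of grid values.
    for combo in itertools.product(
            *[spec[k]["grid_search"] for k in grid_keys]):
        trial_config = dict(fixed)
        trial_config.update(zip(grid_keys, combo))
        yield trial_config

spec = {
    "num_workers": 8,
    "sgd_num_iter": {"grid_search": [10, 20]},
    "sgd_stepsize": {"grid_search": [1e-4, 1e-3]},
}
for config in expand_grid(spec):
    print(config)  # 4 configs, one per grid combination
```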