
[rllib] Initial work on integrating hyperparameter search tool #1107

Merged
17 commits merged on Oct 13, 2017

Conversation

@ericl (Contributor) commented on Oct 11, 2017

This PR refactors the rllib train.py script to depend on a new ray.tune module, which implements efficient hyper-parameter search.

The overall usage of train.py remains roughly the same, though now it supports two modes:

  • Inline args:
    ./train.py --env=Pong-v0 --alg=PPO --num_trials=8 --stop '{"time_total_s": 3200}' --resources '{"cpu": 8, "gpu": 2}' --config '{"num_workers": 8, "sgd_num_iter": 10}'

  • File-based:
    ./train.py -f tune-pong.yaml

Both delegate scheduling of trials to the ray.tune TrialRunner class. Additionally, the file-based mode supports hyper-parameter tuning (currently just grid and random search).

Note that while ray.tune should eventually support generic training, right now it has some RL-specific notions like agents and envs, which we should try to remove.
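For orientation, here is a rough, hypothetical sketch (not taken from the diff) of what both modes reduce to under the hood: constructing Trial objects and handing them to ray.tune's TrialRunner for scheduling. The module paths, the Trial constructor arguments, and the TrialRunner method names (add_trial, is_finished, step) are assumptions inferred from the CLI flags above.

    # Hypothetical sketch only: argument and method names are assumptions,
    # not confirmed by this PR's diff.
    import ray
    from ray.tune.trial import Trial
    from ray.tune.trial_runner import TrialRunner

    ray.init()
    runner = TrialRunner()

    # Roughly what --num_trials=8 with the inline args above would expand to.
    for _ in range(8):
        runner.add_trial(Trial(
            "Pong-v0", "PPO",
            config={"num_workers": 8, "sgd_num_iter": 10},
            stopping_criterion={"time_total_s": 3200},
            resources={"cpu": 8, "gpu": 2}))

    # Launch and poll trials until every trial meets its stopping criterion.
    while not runner.is_finished():
        runner.step()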

cc @richardliaw @pcmoritz

@@ -11,7 +11,7 @@
import ray
from ray.rllib.a3c.runner import RunnerThread, process_rollout
from ray.rllib.a3c.envs import create_and_wrap
from ray.rllib.common import Agent, TrainingResult, get_tensorflow_log_dir
Contributor:
is this intended?

Contributor:
Oh, seems like Eric split local_log_dir and upload_dir; that's a better way to do it than what we had :)

return self.agent.train.remote()

def should_stop(self, result):
"""Whether the given result meets this trial's stopping criteria."""
Contributor:
In this PR, we only support "maximization" right now?

Contributor (author):
Yeah, the user can just invert their objective for now.


self._trials = []
self._pending = {}
self._avail_resources = {'cpu': 0, 'gpu': 0}
Contributor:
This dict format appears very often; for readability and otherwise, it could make sense to turn it into a named tuple "Resources"...
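A minimal sketch of that suggestion, assuming only cpu and gpu fields are needed:

    # Minimal sketch of the reviewer's suggestion: replace the recurring
    # {'cpu': ..., 'gpu': ...} dicts with a small named tuple.
    from collections import namedtuple

    Resources = namedtuple("Resources", ["cpu", "gpu"])

    avail_resources = Resources(cpu=0, gpu=0)
    print(avail_resources.gpu)  # attribute access instead of dict lookups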

@AmplabJenkins
Merged build finished. Test FAILed.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2085/

timesteps_this_iter=10, info={})


def get_agent_class(alg):
Contributor:
Oh yeah! Good call on adding this function 👍
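For context, a hedged sketch of how this helper is used elsewhere in the PR; the import path is an assumption, and the constructor arguments follow the snippets quoted later in this conversation:

    import ray
    from ray.rllib import get_agent_class  # import path is an assumption

    ray.init()
    agent_cls = get_agent_class("PPO")  # look up the agent class by algorithm name
    # Either instantiate it directly (as in the Jenkins traceback below) ...
    agent = agent_cls("CartPole-v0", {"num_workers": 1})
    result = agent.train()
    # ... or wrap it as a Ray actor, as the trial code does when launching trials.
    remote_cls = ray.remote(num_gpus=0)(agent_cls)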

@pcmoritz (Contributor) left a comment:
Very cool! Can you add a little bit of documentation on how to use it?

def train_remote(self):
"""Returns Ray future for one iteration of training."""

assert self.status == Trial.RUNNING, self.status
Contributor:
duplicate?

Contributor (author):
The second status is printed out if the first expression is false.

Contributor:
ah I see
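For reference, the two-expression form of assert is standard Python: the second expression is only evaluated when the check fails and becomes the AssertionError message, so this is not a duplicated check.

    # The expression after the comma is the failure message, not a second check.
    status = "PENDING"
    assert status == "RUNNING", status
    # AssertionError: PENDING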

values, or `eval: <str>` for values to be sampled from the given Python
expression.

See ray/rllib/tuned_examples for more examples of configs in YAML form.
Contributor:
I feel we will easily lose track of this line (especially as we move the example code around); it might make sense to move this to a README

Contributor:
along with the example yaml

Contributor (author):
Added README

agent_cls = get_agent_class(self.alg)
cls = ray.remote(num_gpus=self.resources.get('gpu', 0))(agent_cls)
self.agent = cls.remote(
self.env_creator, self.config, self.local_dir,
Contributor:
(Don't know where to leave this, but) just pointing out this does not support S3 directories.

Contributor (author):
Fixed

help="Number of trials to evaluate.")
parser.add_argument("--local_dir", default="/tmp/ray", type=str,
help="Local dir to save training results to.")
parser.add_argument("--upload_dir", default=None, type=str,
Contributor:
this is unused

Contributor (author):
Fixed

@ericl (Contributor, author) left a comment:
Updated

help="Number of trials to evaluate.")
parser.add_argument("--local_dir", default="/tmp/ray", type=str,
help="Local dir to save training results to.")
parser.add_argument("--upload_dir", default=None, type=str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

values, or `eval: <str>` for values to be sampled from the given Python
expression.

See ray/rllib/tuned_examples for more examples of configs in YAML form.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added README

agent_cls = get_agent_class(self.alg)
cls = ray.remote(num_gpus=self.resources.get('gpu', 0))(agent_cls)
self.agent = cls.remote(
self.env_creator, self.config, self.local_dir,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

def train_remote(self):
"""Returns Ray future for one iteration of training."""

assert self.status == Trial.RUNNING, self.status
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second status is printed out if the first expression is false?

return self.agent.train.remote()

def should_stop(self, result):
"""Whether the given result meets this trial's stopping criteria."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the user can just invert their objective for now

@AmplabJenkins
Merged build finished. Test FAILed.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2101/

@AmplabJenkins
Merged build finished. Test FAILed.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2102/

@AmplabJenkins
Merged build finished. Test FAILed.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2103/

@robertnishihara (Collaborator)
I'm seeing some errors in Jenkins; not sure if this is the "main" error, but I see, e.g.,

    alg1 = cls("CartPole-v0", config)
  File "/opt/conda/lib/python2.7/site-packages/ray-0.2.1-py2.7-linux-x86_64.egg/ray/rllib/common.py", line 105, in __init__
    "all agent configs: {}".format(k, self.config.keys()))
Exception: Unknown agent config `episodes_per_batch`, all agent configs: ['exploration_fraction', 'hiddens', 'prioritized_replay_beta0', 'schedule_max_timesteps', 'num_workers', 'double_q', 'target_network_update_freq', 'prioritized_replay_eps', 'timesteps_per_iteration', 'sample_batch_size', 'buffer_size', 'model', 'prioritized_replay_alpha', 'dueling', 'grad_norm_clipping', 'num_cpu', 'print_freq', 'train_batch_size', 'lr', 'learning_starts', 'exploration_final_eps', 'gpu_offset', 'prioritized_replay_beta_iters', 'gamma', 'prioritized_replay']
Disconnecting client on fd 11

@@ -0,0 +1,200 @@
from __future__ import absolute_import
Collaborator:
Looks like we aren't running this test anywhere. Should we run it in Travis (that would require pip-installing gym in Travis) or Jenkins?

We may want to go with Jenkins since we'll be able to run longer jobs (it currently lacks Python 3 testing, but we can build a Docker container with Python 3 and potentially do both in Jenkins).

Contributor (author):
This test only takes 45s since we use mocked agents (and it shouldn't depend on gym).

How do I enable it in Travis?

Collaborator:
Basically you need to add it somewhere around here:

ray/.travis.yml (line 122 at 3764f2f):
    - python -m pytest python/ray/rllib/test/test_catalog.py

@AmplabJenkins
Merged build finished. Test PASSed.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2105/

@AmplabJenkins
Merged build finished. Test PASSed.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2112/

@AmplabJenkins
Merged build finished. Test PASSed.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2114/
