Commit 4b8b703

[rllib] Some API cleanups and documentation improvements (#4409)
1 parent 59079a7 commit 4b8b703

26 files changed (+94, -62 lines)

doc/source/rllib-examples.rst

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ Training Workflows
 Custom Envs and Models
 ----------------------
 
-- `Registering a custom env <https://github.com/ray-project/ray/blob/master/python/ray/rllib/examples/custom_env.py>`__:
-   Example of defining and registering a gym env for use with RLlib.
+- `Registering a custom env and model <https://github.com/ray-project/ray/blob/master/python/ray/rllib/examples/custom_env.py>`__:
+   Example of defining and registering a gym env and model for use with RLlib.
 - `Registering a custom model with supervised loss <https://github.com/ray-project/ray/blob/master/python/ray/rllib/examples/custom_loss.py>`__:
    Example of defining and registering a custom model with a supervised loss.
 - `Subprocess environment <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tests/test_env_with_subprocess.py>`__:

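For context on what the renamed example link now covers, below is a minimal sketch of registering a custom gym env with RLlib and training on it. The env class, its toy dynamics, and the "my_env" name are illustrative rather than taken from custom_env.py; the linked example additionally registers a custom model via ModelCatalog.register_custom_model.

import gym
import numpy as np
from gym.spaces import Box, Discrete

import ray
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOAgent  # agent classes as named at this commit


class MyEnv(gym.Env):
    """Toy corridor env used purely for illustration."""

    def __init__(self, env_config):
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, 1.0, shape=(1,), dtype=np.float32)
        self.pos = 0.0
        self.steps = 0

    def reset(self):
        self.pos = 0.0
        self.steps = 0
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        self.steps += 1
        self.pos = min(1.0, self.pos + 0.1 * action)
        done = self.pos >= 1.0 or self.steps >= 100
        reward = 1.0 if self.pos >= 1.0 else 0.0
        return np.array([self.pos], dtype=np.float32), reward, done, {}


# Make the env available to RLlib under a string name.
register_env("my_env", lambda env_config: MyEnv(env_config))

ray.init()
agent = PPOAgent(env="my_env")
print(agent.train()["episode_reward_mean"])
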
doc/source/rllib-offline.rst

Lines changed: 4 additions & 4 deletions
@@ -157,8 +157,8 @@ You can configure experience input for an agent using the following options:
 
 .. literalinclude:: ../../python/ray/rllib/agents/agent.py
    :language: python
-   :start-after: __sphinx_doc_input_begin__
-   :end-before: __sphinx_doc_input_end__
+   :start-after: === Offline Datasets ===
+   :end-before: Specify where experiences should be saved
 
 The interface for a custom input reader is as follows:

@@ -172,8 +172,8 @@ You can configure experience output for an agent using the following options:
 
 .. literalinclude:: ../../python/ray/rllib/agents/agent.py
    :language: python
-   :start-after: __sphinx_doc_output_begin__
-   :end-before: __sphinx_doc_output_end__
+   :start-after: shuffle_buffer_size
+   :end-before: === Multiagent ===
 
 The interface for a custom output writer is as follows:

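The two literalinclude blocks now pull the offline input/output option comments straight from COMMON_CONFIG in agent.py instead of from dedicated __sphinx_doc_* markers. As a rough sketch of what those documented options look like when set on an agent config; the key names follow the documented comments and the paths are placeholders, not defaults:

offline_config = {
    # Generate experiences by reading JSON batch files matching this glob
    # instead of sampling a live env ("sampler" is the online default).
    "input": "/tmp/demos/*.json",
    # Shuffle incoming batches through a buffer of this many batches
    # (0 disables shuffling).
    "shuffle_buffer_size": 100,
    # Where to save experiences: None, "logdir", or a path/URI.
    "output": "logdir",
    # Compress these sample batch columns on output.
    "output_compress_columns": ["obs", "new_obs"],
    # Roll over to a new output file after this many bytes.
    "output_max_file_size": 64 * 1024 * 1024,
}
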
doc/source/rllib-training.rst

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ The ``rllib train`` command (same as the ``train.py`` script in the repo) has a
 The most important options are for choosing the environment
 with ``--env`` (any OpenAI gym environment including ones registered by the user
 can be used) and for choosing the algorithm with ``--run``
-(available options are ``PPO``, ``PG``, ``A2C``, ``A3C``, ``IMPALA``, ``ES``, ``DDPG``, ``DQN``, ``APEX``, and ``APEX_DDPG``).
+(available options are ``PPO``, ``PG``, ``A2C``, ``A3C``, ``IMPALA``, ``ES``, ``DDPG``, ``DQN``, ``MARWIL``, ``APEX``, and ``APEX_DDPG``).
 
 Evaluating Trained Agents
 ~~~~~~~~~~~~~~~~~~~~~~~~~

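MARWIL trains from logged experiences, so the newly listed --run MARWIL option pairs naturally with the offline dataset settings above. A hedged Python-API sketch of the same run; the experiment name, input glob, and stopping criterion are placeholder choices:

import ray
from ray import tune

ray.init()
# Roughly equivalent to: rllib train --run MARWIL --env CartPole-v0
tune.run_experiments({
    "marwil_cartpole": {
        "run": "MARWIL",
        "env": "CartPole-v0",
        "stop": {"training_iteration": 10},
        "config": {
            # Placeholder glob of previously logged experiences.
            "input": "/tmp/cartpole-out/*.json",
        },
    },
})
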
doc/source/rllib.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 RLlib: Scalable Reinforcement Learning
 ======================================
 
-RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.
+RLlib is an open-source library for reinforcement learning that offers both a unified API for a variety of applications and high scalability via distributed eager execution.
 
 .. image:: rllib-stack.svg

python/ray/rllib/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 RLlib: Scalable Reinforcement Learning
 ======================================
 
-RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.
+RLlib is an open-source library for reinforcement learning that offers both a unified API for a variety of applications and high scalability via distributed eager execution.
 
 For an overview of RLlib, see the [documentation](http://ray.readthedocs.io/en/latest/rllib.html).

python/ray/rllib/agents/a3c/a2c.py

Lines changed: 3 additions & 3 deletions
@@ -25,6 +25,6 @@ class A2CAgent(A3CAgent):
 
     @override(A3CAgent)
     def _make_optimizer(self):
-        return SyncSamplesOptimizer(self.local_evaluator,
-                                    self.remote_evaluators,
-                                    self.config["optimizer"])
+        return SyncSamplesOptimizer(
+            self.local_evaluator, self.remote_evaluators,
+            {"train_batch_size": self.config["train_batch_size"]})

python/ray/rllib/agents/a3c/a3c.py

Lines changed: 1 addition & 2 deletions
@@ -69,8 +69,7 @@ def _train(self):
         start = time.time()
         while time.time() - start < self.config["min_iter_time_s"]:
             self.optimizer.step()
-            result = self.optimizer.collect_metrics(
-                self.config["collect_metrics_timeout"])
+            result = self.collect_metrics()
         result.update(timesteps_this_iter=self.optimizer.num_steps_sampled -
                       prev_steps)
         return result

python/ray/rllib/agents/a3c/a3c_tf_policy_graph.py

Lines changed: 2 additions & 2 deletions
@@ -146,8 +146,8 @@ def postprocess_trajectory(self,
                                   self.config["lambda"])
 
     @override(TFPolicyGraph)
-    def gradients(self, optimizer):
-        grads = tf.gradients(self._loss, self.var_list)
+    def gradients(self, optimizer, loss):
+        grads = tf.gradients(loss, self.var_list)
         self.grads, _ = tf.clip_by_global_norm(grads, self.config["grad_clip"])
         clipped_grads = list(zip(self.grads, self.var_list))
         return clipped_grads

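With this change, gradients() receives the loss tensor explicitly instead of each policy graph reading its own self._loss. A minimal sketch of a user override against the new signature; the class name, clip value, and import path are assumptions for illustration, not part of this commit:

import tensorflow as tf

from ray.rllib.evaluation.tf_policy_graph import TFPolicyGraph
from ray.rllib.utils.annotations import override


class MyPolicyGraph(TFPolicyGraph):
    # Previously this hook was gradients(self, optimizer) and the loss was
    # read internally; the base class now passes the loss tensor in.
    @override(TFPolicyGraph)
    def gradients(self, optimizer, loss):
        grads_and_vars = optimizer.compute_gradients(loss)
        # Clip each gradient by norm (40.0 is an arbitrary example value).
        return [(tf.clip_by_norm(g, 40.0), v)
                for g, v in grads_and_vars if g is not None]
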
python/ray/rllib/agents/agent.py

Lines changed: 16 additions & 5 deletions
@@ -99,7 +99,9 @@
     # === Execution ===
     # Number of environments to evaluate vectorwise per worker.
     "num_envs_per_worker": 1,
-    # Default sample batch size
+    # Default sample batch size (unroll length). Batches of this size are
+    # collected from workers until train_batch_size is met. When using
+    # multiple envs per worker, this is multiplied by num_envs_per_worker.
     "sample_batch_size": 200,
     # Training batch size, if applicable. Should be >= sample_batch_size.
     # Samples batches will be concatenated together to this size for training.

@@ -137,6 +139,8 @@
     "compress_observations": False,
     # Drop metric batches from unresponsive workers after this many seconds
     "collect_metrics_timeout": 180,
+    # Smooth metrics over this many episodes.
+    "metrics_smoothing_episodes": 100,
     # If using num_envs_per_worker > 1, whether to create those new envs in
     # remote processes instead of in the same worker. This adds overheads, but
     # can make sense if your envs are very CPU intensive (e.g., for StarCraft).

@@ -146,7 +150,6 @@
     "async_remote_worker_envs": False,
 
     # === Offline Datasets ===
-    # __sphinx_doc_input_begin__
     # Specify how to generate experiences:
     # - "sampler": generate experiences via online simulation (default)
     # - a local directory or file glob expression (e.g., "/tmp/*.json")

@@ -172,8 +175,6 @@
     # of this number of batches. Use this if the input data is not in random
     # enough order. Input is delayed until the shuffle buffer is filled.
     "shuffle_buffer_size": 0,
-    # __sphinx_doc_input_end__
-    # __sphinx_doc_output_begin__
     # Specify where experiences should be saved:
     # - None: don't save any experiences
     # - "logdir" to save to the agent log dir

@@ -184,7 +185,6 @@
     "output_compress_columns": ["obs", "new_obs"],
     # Max output file size before rolling over to a new file.
     "output_max_file_size": 64 * 1024 * 1024,
-    # __sphinx_doc_output_end__
 
     # === Multiagent ===
     "multiagent": {

@@ -559,6 +559,17 @@ def export_policy_checkpoint(self,
         self.local_evaluator.export_policy_checkpoint(
             export_dir, filename_prefix, policy_id)
 
+    @DeveloperAPI
+    def collect_metrics(self, selected_evaluators=None):
+        """Collects metrics from the remote evaluators of this agent.
+
+        This is the same data as returned by a call to train().
+        """
+        return self.optimizer.collect_metrics(
+            self.config["collect_metrics_timeout"],
+            min_history=self.config["metrics_smoothing_episodes"],
+            selected_evaluators=selected_evaluators)
+
     @classmethod
     def resource_help(cls, config):
         return ("\n\nYou can adjust the resource requests of RLlib agents by "

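The new Agent.collect_metrics() exposes, on demand, the same episode metrics that train() reports, smoothed over the new metrics_smoothing_episodes setting. A hedged usage sketch; PPOAgent and CartPole-v0 are arbitrary choices, using the *Agent class names of this era:

import ray
from ray.rllib.agents.ppo import PPOAgent

ray.init()
agent = PPOAgent(
    env="CartPole-v0",
    config={"metrics_smoothing_episodes": 100})
agent.train()
# Same episode stats as the train() result; pass selected_evaluators to
# restrict collection to a subset of the remote evaluators.
metrics = agent.collect_metrics()
print(metrics["episode_reward_mean"])
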
python/ray/rllib/agents/ddpg/apex.py

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@
     "learning_starts": 50000,
     "train_batch_size": 512,
     "sample_batch_size": 50,
-    "max_weight_sync_delay": 400,
     "target_network_update_freq": 500000,
     "timesteps_per_iteration": 25000,
     "per_worker_exploration": True,
