[RLlib; docs] Docs do-over (new API stack): Remove special "new API stack" page (move some of its content to migration guide). #49713

Open · wants to merge 4 commits into base: master
1 change: 1 addition & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -20,6 +20,7 @@ MARLModule
(MARWIL|marwil)
MLAgents
multiagent
[Pp]erceptrons?
postprocessing
(PPO|ppo)
[Pp]y[Tt]orch
2 changes: 1 addition & 1 deletion doc/source/_includes/rllib/new_api_stack.rst
@@ -1,6 +1,6 @@
.. note::

Ray 2.40 uses :doc:`RLlib's new API stack </rllib/rllib-new-api-stack>` by default.
Ray 2.40 uses RLlib's new API stack by default.
The Ray team has mostly completed transitioning algorithms, example scripts, and
documentation to the new code base.

2 changes: 0 additions & 2 deletions doc/source/rllib/index.rst
@@ -41,7 +41,6 @@ RLlib: Industry-Grade, Scalable Reinforcement Learning
rllib-learner
env-runners
rllib-examples
rllib-new-api-stack <- remove?
new-api-stack-migration-guide
package_ref/index

@@ -55,7 +54,6 @@ RLlib: Industry-Grade, Scalable Reinforcement Learning
rllib-algorithms
user-guides
rllib-examples
rllib-new-api-stack
new-api-stack-migration-guide
package_ref/index

2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -38,7 +38,7 @@ AlgorithmConfig and Algorithm
.. tip::
The following is a quick overview of **RLlib AlgorithmConfigs and Algorithms**.
See :ref:`here for a detailed description of the Algorithm class <rllib-algorithms-doc>`.
See here for a :ref:`detailed description of the Algorithm class <rllib-algorithms-doc>`.
Suggested change
See here for a :ref:`detailed description of the Algorithm class <rllib-algorithms-doc>`.
See :ref:`<rllib-algorithms-doc>` for a detailed description of the Algorithm class.


The RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` class serves as a runtime for your RL experiments,
bringing together all components required for learning an optimal solution to your :ref:`RL environment <rllib-key-concepts-environments>`.
80 changes: 52 additions & 28 deletions doc/source/rllib/new-api-stack-migration-guide.rst
@@ -5,7 +5,6 @@

.. _rllib-new-api-stack-migration-guide:


.. testcode::
:hide:

@@ -18,15 +17,43 @@ New API stack migration guide

This page explains, step by step, how to convert and translate your existing old API stack
RLlib classes and code to RLlib's new API stack.
:ref:`Why you should migrate to the new API stack <rllib-new-api-stack-guide>`.


What's the new API stack?
--------------------------

The new API stack is the result of rewriting the core RLlib APIs from scratch and reducing
user-facing classes from more than a dozen critical ones down to only a handful
of classes, without any loss of features. When designing these new interfaces,
the Ray Team strictly applied the following principles:

* Classes must be usable outside of RLlib.
* Separation of concerns. Try to answer: "**What** should get done **when** and **by whom**?"
and give each class as few non-overlapping and clearly defined tasks as possible.
* Offer fine-grained modularity, full interoperability, and frictionless pluggability of classes.
* Use widely accepted third-party standards and APIs wherever possible.

Applying the preceding principles, the Ray Team reduced the important **must-know** classes
for the average RLlib user from eight on the old stack to only five on the new stack.
The **core** new API stack classes are:

* :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, which replaces ``ModelV2`` and ``PolicyMap`` APIs
* :py:class:`~ray.rllib.core.learner.learner.Learner`, which replaces ``RolloutWorker`` and some of ``Policy``
* :py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode` and :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode`, which replace ``ViewRequirement``, ``SampleCollector``, ``Episode``, and ``EpisodeV2``
* :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2`, which replaces ``Connector`` and some of ``RolloutWorker`` and ``Policy``

The :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` and
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` APIs remain as-is.
These classes are already established APIs on the old stack.
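
To make the class roles concrete, here's a minimal, hypothetical sketch of a new-stack experiment setup. It assumes ``ray[rllib]`` and PyTorch are installed and uses the built-in PPO defaults; the environment name is only an example and none of this is taken from the PR itself:

```python
# Minimal new-API-stack setup sketch (assumes `ray[rllib]` and PyTorch
# are installed; `CartPole-v1` is just an example environment).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # EnvRunners collect samples; Learners hold the RLModule and update it.
    .env_runners(num_env_runners=2)
    .learners(num_learners=0)  # 0 -> one local Learner in the main process
)
# Building and training would then be:
# algo = config.build()
# algo.train()
```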


.. note::

Even though the new API stack still provides rudimentary support for `TensorFlow <https://tensorflow.org>`__,
RLlib now standardizes on a single deep learning framework, `PyTorch <https://pytorch.org>`__,
and is dropping TensorFlow support entirely.
Note, though, that the Ray team continues to design RLlib to be framework-agnostic.
Note, though, that the Ray team continues to design RLlib to be framework-agnostic
and may add support for additional frameworks in the future.


Check your AlgorithmConfig
@@ -76,7 +103,7 @@ The new API stack deprecates the following framework-related settings:
AlgorithmConfig.resources()
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `num_gpus` and `_fake_gpus` settings have been deprecated. To place your
The Ray team deprecated the ``num_gpus`` and ``_fake_gpus`` settings. To place your
RLModule on one or more GPUs on the Learner side, do the following:

.. testcode::
@@ -91,8 +118,8 @@

The `num_learners` setting determines how many remote :py:class:`~ray.rllib.core.learner.learner.Learner`
workers there are in your Algorithm's :py:class:`~ray.rllib.core.learner.learner_group.LearnerGroup`.
If you set this to 0, your LearnerGroup only contains a **local** Learner that runs on the main
process (and shares the compute resources with that process, usually 1 CPU).
If you set this parameter to ``0``, your LearnerGroup only contains a **local** Learner that runs on the main
process and shares its compute resources, typically 1 CPU.
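
As a back-of-the-envelope illustration (plain Python, not RLlib API): the effective total train batch is the per-Learner batch size multiplied by the number of Learner workers, with ``num_learners=0`` counting as one local Learner:

```python
def total_train_batch_size(num_learners: int, train_batch_size_per_learner: int) -> int:
    """Illustrative only: effective total batch across all Learner workers."""
    # num_learners=0 means a single local Learner in the main process.
    return max(num_learners, 1) * train_batch_size_per_learner

print(total_train_batch_size(0, 4000))  # -> 4000 (one local Learner)
print(total_train_batch_size(4, 2000))  # -> 8000 (four remote Learners)
```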
Could we add here that for Offline RL this setting should never be 1?

For asynchronous algorithms like IMPALA or APPO, this setting should therefore always be >0.

`See here for an example on how to train with fractional GPUs <https://github.com/ray-project/ray/blob/master/rllib/examples/gpus/fractional_gpus_per_learner.py>`__.
@@ -109,7 +136,7 @@ If GPUs aren't available, but you want to learn with more than one
num_gpus_per_learner=0, # <- default
)

The setting `num_cpus_for_local_worker` has been renamed to `num_cpus_for_main_process`.
The Ray team renamed the setting ``num_cpus_for_local_worker`` to ``num_cpus_for_main_process``.

.. testcode::

@@ -122,11 +149,10 @@ AlgorithmConfig.training()
Train batch size
................

Due to the new API stack's :py:class:`~ray.rllib.core.learner.learner.Learner` worker
architecture, training may be distributed over n
:py:class:`~ray.rllib.core.learner.learner.Learner` workers, so RLlib provides the train batch size
per individual :py:class:`~ray.rllib.core.learner.learner.Learner`.
You should no longer use the `train_batch_size` setting:
Due to the new API stack's :py:class:`~ray.rllib.core.learner.learner.Learner` worker architecture,
training may happen in distributed fashion over ``n`` :py:class:`~ray.rllib.core.learner.learner.Learner` workers,
so RLlib provides the train batch size per individual :py:class:`~ray.rllib.core.learner.learner.Learner`.
Don't use the ``train_batch_size`` setting any longer:


.. testcode::
@@ -215,7 +241,7 @@ It allows you to specify:
#. the number of `Learner` workers through `.learners(num_learners=...)`.
#. the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training
and `.learners(num_gpus_per_learner=0)` for CPU training.
#. the custom Learner class you want to use (`example on how to do this here <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__)
#. the custom Learner class you want to use. See this `example <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__ for more details.
#. a config dict you would like to set for your custom learner:
`.learners(learner_config_dict={...})`. Note that every `Learner` has access to the
entire `AlgorithmConfig` object through `self.config`, but setting the
@@ -295,7 +321,7 @@ or :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_inference`, i
config.env_runners(explore=True) # <- or False


The `exploration_config` setting is deprecated and no longer used. Instead, determine the exact exploratory
The Ray team has deprecated the ``exploration_config`` setting. Instead, define the exact exploratory
behavior, for example, sample an action from a distribution, inside the overridden
:py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_exploration` method of your
:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
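
For orientation, a hypothetical sketch of such an override follows. The class and attribute names (``MyTorchRLModule``, ``self._policy_net``) are made up for illustration, and import paths can vary between Ray versions:

```python
# Hypothetical sketch only; `self._policy_net` is an assumed attribute
# created in `setup()`, not part of the RLlib API.
from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyTorchRLModule(TorchRLModule):
    def _forward_exploration(self, batch, **kwargs):
        # Return action-distribution inputs; sampling from the resulting
        # distribution is where the old `exploration_config` behavior
        # now lives.
        logits = self._policy_net(batch[Columns.OBS])
        return {Columns.ACTION_DIST_INPUTS: logits}

    def _forward_inference(self, batch, **kwargs):
        # Deterministic/greedy path for evaluation.
        return {Columns.ACTION_DIST_INPUTS: self._policy_net(batch[Columns.OBS])}
```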
@@ -305,16 +331,15 @@ Custom callbacks
----------------

If you're using custom callbacks on the old API stack, you're subclassing the ``DefaultCallbacks`` class,
which has been renamed to :py:class`~ray.rllib.callbacks.callbacks.RLlibCallback`.
which the Ray team renamed to :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`.
You can continue this approach with the new API stack and pass your custom subclass to your config like the following:

.. testcode::

# config.callbacks(YourCallbacksClass)

However, if you're overriding the methods that trigger on the :py:class:`~ray.rllib.env.env_runner.EnvRunner`
side, for example, ``on_episode_start/stop/step/etc...``, a small amount of translation may be required, because
the arguments that RLlib passes to many of these methods have slightly changed.
side, for example, ``on_episode_start/stop/step/etc...``, you may have to translate some call arguments.

The following is a one-to-one translation guide for these types of :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`
methods:
@@ -370,17 +395,16 @@ methods:
# on_episode_step()
# on_episode_end()
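
For example, a minimal sketch of a new-stack callback subclass. The keyword names shown (``episode``, ``env_runner``, ``metrics_logger``) follow the new EnvRunner-based signatures but may differ slightly between Ray versions, so treat this as an illustration:

```python
from ray.rllib.callbacks.callbacks import RLlibCallback


class MyCallbacks(RLlibCallback):
    def on_episode_start(self, *, episode, env_runner=None,
                         metrics_logger=None, **kwargs):
        # `episode` is a SingleAgentEpisode/MultiAgentEpisode on the new stack.
        print(f"Episode {episode.id_} started.")
```

You would then pass this class to your config through ``config.callbacks(MyCallbacks)``.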


The following callback methods are no longer available on the new API stack:

**`on_sub_environment_created()`**: The new API stack uses `Farama's gymnasium <https://farama.org>`__ vector Envs leaving no control for RLlib
to call a callback on each individual env-index's creation.

**`on_create_policy()`**: This method is no longer available on the new API stack because only ``RolloutWorker`` calls it.
* ``on_sub_environment_created()``: The new API stack uses `Farama's gymnasium <https://farama.org>`__ vector Envs leaving no control for RLlib
to call a callback on each individual env-index's creation.
* ``on_create_policy()``: This method is no longer available on the new API stack because only ``RolloutWorker`` calls it.
* ``on_postprocess_trajectory()``: The new API stack no longer triggers and calls this method
because :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pipelines handle trajectory processing entirely.
The documentation for :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` is under development.

**`on_postprocess_trajectory()`**: The new API stack no longer triggers and calls this method,
because :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pipelines handle trajectory processing entirely.
The documentation for :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` is under development.
See :ref:`rllib-callback-docs` for a detailed description of RLlib callback APIs.


.. _rllib-modelv2-to-rlmodule:
@@ -492,7 +516,7 @@ Policy.compute_log_likelihoods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Implement your custom RLModule's :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_train` method and
return the `Columns.ACTION_LOGP` key together with the corresponding action log probs in order to pass this information
return the ``Columns.ACTION_LOGP`` key together with the corresponding action log probabilities to pass this information
to your loss functions, which your code calls after `forward_train()`. The loss logic can then access
`Columns.ACTION_LOGP`.
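
As a hypothetical sketch of that pattern (the ``self._pi`` policy head is an assumption, not RLlib API, and a discrete action space is assumed for the Categorical distribution):

```python
import torch
from ray.rllib.core.columns import Columns


def _forward_train(self, batch, **kwargs):
    # Assumed policy head producing logits for a discrete action space.
    logits = self._pi(batch[Columns.OBS])
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()
    return {
        Columns.ACTION_DIST_INPUTS: logits,
        Columns.ACTIONS: actions,
        # Old-stack `Policy.compute_log_likelihoods` equivalent: expose the
        # log-probs so the loss can read `Columns.ACTION_LOGP` directly.
        Columns.ACTION_LOGP: dist.log_prob(actions),
    }
```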

@@ -522,8 +546,8 @@ It also provides superior scalability, allowing training in a multi-GPU setup in
and multi-node with multi-GPU training on the `Anyscale <https://anyscale.com>`__ platform.


Custom connectors (old-stack)
-----------------------------
Custom connectors
-----------------

If you're using custom connectors from the old API stack, move your logic into the
new :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` API.
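
For example, a hypothetical ConnectorV2 piece; the ``__call__`` keyword set shown here follows recent Ray releases but should be checked against your installed version:

```python
import numpy as np
from ray.rllib.connectors.connector_v2 import ConnectorV2
from ray.rllib.core.columns import Columns


class ClipObservations(ConnectorV2):
    """Illustrative only: clip observations before they reach the RLModule."""

    def __call__(self, *, rl_module, batch, episodes, explore=None,
                 shared_data=None, **kwargs):
        # Old-stack per-policy connector logic moves into this method.
        if Columns.OBS in batch:
            batch[Columns.OBS] = np.clip(batch[Columns.OBS], -10.0, 10.0)
        return batch
```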
8 changes: 0 additions & 8 deletions doc/source/rllib/rllib-dev.rst
@@ -70,14 +70,6 @@ New feature developments, discussions, and upcoming priorities are tracked on th
API Stability
=============

New API stack vs Old API stack
------------------------------

Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground up with respect to architecture,
design principles, code base, and user facing APIs.

:ref:`See here for more details <rllib-new-api-stack-guide>` on this effort and how to activate the new API stack through your config.


API Decorators in the Codebase
------------------------------
150 changes: 0 additions & 150 deletions doc/source/rllib/rllib-new-api-stack.rst

This file was deleted.
