[RLlib] Turn doc tests into '.. doctest::' #37492
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
MyCustomTraceCallbacks,
....
]))
The resulting DefaultCallbacks will call all the sub-callbacks' callbacks
Let's just keep AlgorithmConfig out of the picture.
The example we would have to construct to make this a copy/paste thing would be quite large.
"agent_2": np.array(...),
}
)
.. testcode::
We need a newline after this or the example won't render.
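A check like the one described in this comment could be automated. The following is only a minimal sketch under stated assumptions: `missing_blank_lines` is a hypothetical helper (not part of Ray or Sphinx), and the directive list is an illustrative guess at which directives this PR touches.

```python
import re

# Directives whose body should be separated from the directive line by a blank line.
# This list is an assumption made for illustration.
DIRECTIVE = re.compile(r"^\s*\.\. (testcode|testoutput|doctest|code-block)::")
# Option lines such as ":skipif: True" may directly follow the directive line.
OPTION = re.compile(r"^\s*:[\w-]+:")


def missing_blank_lines(text):
    """Return 1-based line numbers of directives not followed by a blank line."""
    lines = text.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if not DIRECTIVE.match(line):
            continue
        # Skip over any option lines attached to the directive.
        j = i + 1
        while j < len(lines) and OPTION.match(lines[j]):
            j += 1
        # The next line, if there is one, must be blank.
        if j < len(lines) and lines[j].strip():
            flagged.append(i + 1)
    return flagged


bad = ".. testcode::\n    import numpy as np\n"
good = ".. testcode::\n\n    import numpy as np\n"
print(missing_blank_lines(bad))   # [1]
print(missing_blank_lines(good))  # []
```

Running something like this over the changed docstrings would catch every occurrence at once instead of relying on line-by-line review.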
}
)
.. testcode::
import numpy as np
Rather than having one large testcode, I'd recommend splitting this into multiple testcode blocks for better readability:
Represent a list of agent data from one env step() call.

.. testcode::

    import numpy as np
    ac = AgentConnectorDataType(
        env_id="env_1",
        agent_id=None,
        data={
            "agent_1": np.array([1, 2, 3]),
            "agent_2": np.array([4, 5, 6]),
        }
    )

Or a single agent data ready to be preprocessed.

.. testcode::

    ac = AgentConnectorDataType(
        env_id="env_1",
        agent_id="agent_1",
        data=np.array([1, 2, 3]),
    )

etc.
.. testcode::
from ray.rllib.connectors.action.lambdas import (
Need newline here
# We use PPO and torch as an example here because many of the showcased
# components need implementations to come together. However, the same
# pattern is generally applicable.
This might be more readable if you move it out of the testcode block and write it as plain text.
# remove a module
learner.remove_module("new_player")
# Take one gradient update on the module and report the results
# results = learner.update(...)
(Here and below) What would happen if we uncommented this and replaced the ellipses with an actual argument?
We would need to construct a multi-agent training batch here.
This would be a nested dict with quite a lot of KV pairs with all values being torch tensors.
It's just too big of a thing to construct here. We need to be able to generate example batches at some point in the future, but we are not there today.
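To give a rough sense of the shape being described, here is a toy sketch of such a nested dict. Everything in it is an assumption made for illustration: the helper name, the column names (`obs`, `actions`, `rewards`), and the module IDs are hypothetical, and plain Python lists stand in for torch tensors. It is not the batch format that `learner.update(...)` actually expects.

```python
import random


def make_toy_multi_agent_batch(module_ids, batch_size=4, obs_dim=3, seed=0):
    """Build a toy nested dict: module ID -> column name -> list of rows.

    Hypothetical helper; the column names are illustrative only.
    """
    rng = random.Random(seed)
    batch = {}
    for module_id in module_ids:
        batch[module_id] = {
            # One observation vector per row.
            "obs": [[rng.random() for _ in range(obs_dim)] for _ in range(batch_size)],
            # One discrete action (0 or 1) per row.
            "actions": [rng.randrange(2) for _ in range(batch_size)],
            # One scalar reward per row.
            "rewards": [rng.random() for _ in range(batch_size)],
        }
    return batch


batch = make_toy_multi_agent_batch(["player_1", "player_2"])
print(sorted(batch))                   # ['player_1', 'player_2']
print(len(batch["player_1"]["obs"]))   # 4
```

Even this stripped-down version needs a helper and several columns per module, which is consistent with the point that a realistic batch is too large to build inline in a docstring.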
model = MyModel()
model.forward({"obs": torch.randn(32, 64)}) # No error
model.forward({"obs": torch.randn(32, 32)}) # raises ValueError
This will cause CI to fail. You could put this in a subsequent testcode that's explicitly skipped
    model.forward({"obs": torch.randn(32, 64)}) # No error

... and this example raises a ValueError

.. testcode::
    :skipif: True

    model.forward({"obs": torch.randn(32, 32)})

.. testoutput::

    Traceback:
    ...
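For comparison, Python's standard-library `doctest` module can verify an expected exception rather than skipping the example. The sketch below shows only the stdlib behavior; `check_obs_dim` is a hypothetical stand-in for the shape check in `MyModel.forward`, and whether the same pattern suits this PR's Sphinx setup is not something this thread confirms.

```python
import doctest


def check_obs_dim(dim, expected=64):
    """Hypothetical stand-in for the input check inside ``MyModel.forward``."""
    if dim != expected:
        raise ValueError(f"expected last dim {expected}, got {dim}")


# doctest matches the traceback header and the final exception line;
# the stack in between is ignored.
EXAMPLES = """
>>> check_obs_dim(64)
>>> check_obs_dim(32)
Traceback (most recent call last):
    ...
ValueError: expected last dim 64, got 32
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLES, {"check_obs_dim": check_obs_dim}, "shape_check", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)  # 0 2
```

The trade-off is that a verified exception keeps the example under test, while a skipped block only documents the behavior.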
.. code-block:: python
# Example for creating a sampling loop:
This example is long. For better readability, it might be better to split this into separate testcodes (one for sampling, training, inference, etc)
@@ -13,10 +13,24 @@ class Distribution(abc.ABC):
"""The base class for distribution over a random variable.
This file is explicitly skipped in the BUILD file. Could we remove it from the list?
>>> action_logits = model.forward(obs)
>>> action_dist = Distribution(action_logits)

.. doctest::
This directive is unnecessary.
.. doctest::

>>> import numpy as np
>>> from ray.rllib.models.distributions import Distribution

>>> class Uniform(Distribution):
...     def __init__(self, lower, upper):
...         self.lower = lower
...         self.upper = upper
...
...     def sample(self):
...         return self.lower + (self.upper - self.lower) * np.random.rand()
...
...     def logp(self, x):
...         ...
...
...     def kl(self, other):
...         ...
...
...     def entropy(self):
...         ...
...
...     @staticmethod
...     def required_input_dim(space):
...         ...
...
...     def rsample(self):
...         ...
...
...     @classmethod
...     def from_logits(cls, logits, **kwargs):
...         return Uniform(logits[:, 0], logits[:, 1])

>>> logits = np.array([[0.0, 1.0], [2.0, 3.0]])
>>> my_dist = Uniform.from_logits(logits)
>>> sample = my_dist.sample()
This example is longer, so it'd be more readable as testcode
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: NripeshN <nn2012@hw.ac.uk>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
As part of our migration to unify code snippets and doc test style (https://docs.ray.io/en/master/ray-contribute/writing-code-snippets.html#how-to-handle-hard-to-test-examples), this PR migrates our code examples in RLlib.