Description
Motivation
It is not clear how the TensorDict returned by the _step() function should be structured for a multi-agent environment.
If there are two agents A and B and they are in separate groups, what would be the structure of the TensorDict that is returned by the _step() function?
Solution
Update the documentation and provide an example of a multi-agent environment written natively in TorchRL, rather than translated from another framework such as PettingZoo.
Checklist
- I have checked that there is no similar issue in the repo (required)