
[RLlib] Multiagent learning: can't combine lockstep replay and multiple agents controlled by the same policy. #9295

Closed
@raphaelavalos

Description

Hello,

While implementing MADDPG for PyTorch, I noticed that it is not possible to combine the lockstep replay mode with multiple agents controlled by the same policy. This is because MultiAgentBatch is implemented as a dict of PolicyID -> SampleBatch: the experiences of all agents that share a policy end up in a single SampleBatch, so rows replayed "in lockstep" from different policies no longer correspond to the same timestep. In the contrib TensorFlow implementation, this issue was circumvented by using parameter sharing. Another, framework-agnostic solution would be to group the agents.
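To make the problem concrete, here is a minimal, framework-free sketch. It does not use RLlib's real `MultiAgentBatch` or `SampleBatch` classes; `build_policy_keyed_batch`, `lockstep_sample`, and `group_agents` are hypothetical helpers that only model a batch stored as a dict of policy ID -> rows. It shows that when two agents share a policy, taking the same row index from every policy mixes timesteps, and that grouping the agents restores the alignment.

```python
from collections import defaultdict

# Simplified model, NOT RLlib's real classes: a multi-agent batch stored as
# a dict of policy ID -> list of rows, mirroring PolicyID -> SampleBatch.
def build_policy_keyed_batch(steps, policy_mapping):
    """steps: list of {agent_id: obs} dicts, one per environment timestep."""
    batch = defaultdict(list)
    for t, agent_obs in enumerate(steps):
        for agent_id, obs in agent_obs.items():
            # Agents mapped to the same policy append to the same list, so
            # rows stop corresponding one-to-one with timesteps.
            batch[policy_mapping[agent_id]].append(
                {"t": t, "agent": agent_id, "obs": obs})
    return dict(batch)

def lockstep_sample(batch, index):
    """Hypothetical lockstep replay: take the same row index from every policy."""
    return {pid: rows[index] for pid, rows in batch.items()}

def group_agents(steps, group_id, members):
    """Hypothetical grouping: merge `members` into one agent with a tuple obs."""
    grouped = []
    for agent_obs in steps:
        new_obs = {aid: o for aid, o in agent_obs.items() if aid not in members}
        new_obs[group_id] = tuple(agent_obs[m] for m in members)
        grouped.append(new_obs)
    return grouped

steps = [
    {"a0": 0.0, "a1": 1.0, "adv": 2.0},
    {"a0": 0.1, "a1": 1.1, "adv": 2.1},
]

# a0 and a1 share one policy: lockstep sampling mixes timesteps.
shared = build_policy_keyed_batch(
    steps, {"a0": "shared_policy", "a1": "shared_policy", "adv": "adv_policy"})
mixed = lockstep_sample(shared, 1)
print(mixed["shared_policy"]["t"], mixed["adv_policy"]["t"])  # 0 1

# After grouping a0 and a1, each policy again has one row per timestep.
grouped = build_policy_keyed_batch(
    group_agents(steps, "group", ["a0", "a1"]),
    {"group": "shared_policy", "adv": "adv_policy"})
aligned = lockstep_sample(grouped, 1)
print(aligned["shared_policy"]["t"], aligned["adv_policy"]["t"])  # 1 1
```

Grouping works here because every timestep contributes exactly one row per policy. The trade-off is that the shared policy now sees a joint (tuple) observation instead of one observation per agent, which is not the same thing as parameter sharing.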

I think this combination should be supported without needing to group the agents. However, I suspect that doing so would require significant code changes. Perhaps another solution exists?

Also, the documentation should be updated, along with the code that validates the config.
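As an illustration of the kind of config check mentioned above, here is a minimal sketch. `validate_lockstep_config` is hypothetical. It assumes the multi-agent config carries a `replay_mode` key and a `policy_mapping_fn` that takes an agent ID, and that the caller knows the agent IDs up front. In general agents can appear dynamically, so a real check might have to run at sampling time instead.

```python
def validate_lockstep_config(multiagent_config, agent_ids):
    """Hypothetical check: reject lockstep replay when agents share a policy.

    Assumes `replay_mode` and `policy_mapping_fn(agent_id)` keys; the list of
    agent IDs must be supplied by the caller.
    """
    if multiagent_config.get("replay_mode") != "lockstep":
        return  # Only lockstep replay needs per-timestep alignment.
    agents_per_policy = {}
    for agent_id in agent_ids:
        policy_id = multiagent_config["policy_mapping_fn"](agent_id)
        agents_per_policy.setdefault(policy_id, []).append(agent_id)
    shared = {pid: ids for pid, ids in agents_per_policy.items() if len(ids) > 1}
    if shared:
        raise ValueError(
            "replay_mode='lockstep' does not support multiple agents per "
            f"policy (shared: {shared}); group these agents instead.")


def map_agent(agent_id):
    # Hypothetical mapping: every agent except "adv" uses one shared policy.
    return "adv_policy" if agent_id == "adv" else "shared_policy"


config = {"replay_mode": "lockstep", "policy_mapping_fn": map_agent}
try:
    validate_lockstep_config(config, ["a0", "a1", "adv"])
except ValueError as err:
    print(err)
```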

Thanks
(Amazing work btw)

Metadata

Assignees: No one assigned
Labels: question