While implementing MADDPG for PyTorch, I noticed that the lockstep replay mode cannot be combined with multiple agents controlled by the same policy. This is because MultiAgentBatch is implemented as a dict of PolicyID -> SampleBatch. The contrib TensorFlow implementation circumvents this issue using parameter sharing. Another, framework-agnostic solution would be to group the agents.
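To illustrate the problem, here is a minimal sketch in plain Python. It does not use the RLlib API: `build_policy_batches`, `rows_are_aligned`, `group_agents`, and all agent, policy, and group IDs are hypothetical stand-ins, and the alignment check reflects my understanding of what lockstep replay needs (row i of every policy batch belonging to the same timestep).

```python
from collections import defaultdict


def build_policy_batches(agent_steps, policy_mapping):
    """Group per-agent rows by policy ID, mimicking a PolicyID -> batch dict."""
    batches = defaultdict(list)
    for agent_id, steps in agent_steps.items():
        batches[policy_mapping[agent_id]].extend(steps)
    return dict(batches)


def rows_are_aligned(batches):
    """Assumed lockstep requirement: row i of every batch has the same timestep."""
    if len({len(rows) for rows in batches.values()}) != 1:
        return False
    return all(
        len({row[0] for row in rows}) == 1 for rows in zip(*batches.values())
    )


def group_agents(agent_steps, groups):
    """Merge each group into one "super-agent" with one tuple row per timestep."""
    grouped = {}
    for group_id, members in groups.items():
        per_step = zip(*(agent_steps[m] for m in members))
        grouped[group_id] = [
            (rows[0][0], tuple(r[1] for r in rows)) for rows in per_step
        ]
    return grouped


# Three agents, three timesteps each; a row is (timestep, payload).
agents = ("agent_0", "agent_1", "agent_2")
agent_steps = {a: [(t, f"{a}@t{t}") for t in range(3)] for a in agents}

# One policy per agent: every batch has one row per timestep.
separate_batches = build_policy_batches(
    agent_steps, {"agent_0": "p0", "agent_1": "p1", "agent_2": "p2"}
)

# Two agents share a policy: their rows are concatenated under one key.
shared_batches = build_policy_batches(
    agent_steps, {"agent_0": "shared", "agent_1": "shared", "agent_2": "p2"}
)

# Grouping the two agents first restores one row per timestep.
grouped_steps = group_agents(
    agent_steps, {"group_01": ["agent_0", "agent_1"], "group_2": ["agent_2"]}
)
grouped_batches = build_policy_batches(
    grouped_steps, {"group_01": "shared", "group_2": "p2"}
)
```

In this sketch the "shared" batch ends up with six rows while "p2" has three, so rows can no longer be matched by index across policies. Grouping avoids that, but at the cost of treating the grouped agents as a single agent.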
I think this combination should be supported without needing to group the agents. However, I suspect that it would require significant code changes. Maybe another solution exists?
Also, the documentation should be updated along with the code that checks that the config is valid.