Asymmetric self-play #3653
Conversation
I was not able to review the whole thing but I left some initial comments.
```diff
@@ -38,11 +46,12 @@ def __init__(
         )

         self.trainer = trainer
         self.controller = controller
```
There is a cyclic reference between the controller and the ghost trainer. Not sure if we care about those in Python...
I know that this isn't ideal, but from investigating it doesn't seem to have caused any issues (yet).
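For context on why this hasn't bitten anyone: the cycle looks roughly like the sketch below (stand-in classes, not the PR's actual code). CPython's reference counting alone can't reclaim objects that reference each other, but the cyclic garbage collector in the `gc` module collects such cycles, so in practice the main cost is slightly delayed collection.

```python
# Minimal sketch of the reference cycle under discussion; these are
# stand-ins, not the actual ML-Agents classes. Each ghost trainer holds
# the shared controller, and the controller holds references back to
# the trainers that registered with it.
from typing import List


class GhostController:
    def __init__(self) -> None:
        self._ghost_trainers: List["GhostTrainer"] = []

    def subscribe(self, trainer: "GhostTrainer") -> None:
        self._ghost_trainers.append(trainer)


class GhostTrainer:
    def __init__(self, controller: GhostController) -> None:
        self.controller = controller  # trainer -> controller
        controller.subscribe(self)    # controller -> trainer: cycle closed


controller = GhostController()
trainers = [GhostTrainer(controller) for _ in range(2)]
# Reference counting alone can't free these once they go out of scope,
# but Python's cyclic garbage collector can.
```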
Co-Authored-By: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
```diff
@@ -113,19 +112,7 @@ def write_stats(
         )
         if self.self_play and "Self-play/ELO" in values:
             elo_stats = values["Self-play/ELO"]
-            mean_opponent_elo = values["Self-play/Mean Opponent ELO"]
-            std_opponent_elo = values["Self-play/Std Opponent ELO"]
```
Do we still want this info somewhere?
I'm not sure how valuable it is honestly. I don't think we need it.
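For the record, a rough sketch of what the removed lines reported, assuming the values were summary statistics over the ELO ratings of the opponent snapshots in the ghost trainer's window (the variable names and ratings below are illustrative):

```python
# Hedged sketch: mean/std over hypothetical opponent snapshot ratings.
from statistics import mean, pstdev

opponent_elos = [1205.0, 1198.5, 1220.3, 1187.9]  # illustrative values
mean_opponent_elo = mean(opponent_elos)   # -> 1202.925
std_opponent_elo = pstdev(opponent_elos)  # population standard deviation
```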
Some minor doc comments, but otherwise LGTM.
In symmetric games, since all agents (even on opposing teams) will share the same policy, they should have the same 'Behavior Name' in their Behavior Parameters script. In asymmetric games, they should have a different Behavior Name in their Behavior Parameters script. Note, in asymmetric games, the agents must have both different Behavior Names *and* different team IDs! Then, specify the trainer configuration
So this means you can't have `zerg?teamId=0` and `protoss?teamId=0`? Is this a fundamental limitation? Sounds like something people are likely to get tripped up on.
If it's a removable restriction, don't let it block this PR, but can you log a jira for followup?
This will be something we support when we introduce a true multiagent trainer, i.e., multiple behavior names that are on the same team.
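To make the current restriction concrete, here is a hedged sketch of why both fields matter: the ghost trainer distinguishes agents by a (behavior name, team ID) pair parsed from the behavior identifier. `parse_behavior_id` and the `?team=` format below are illustrative assumptions, not the toolkit's exact utilities.

```python
from typing import NamedTuple


class BehaviorId(NamedTuple):
    brain_name: str
    team_id: int


def parse_behavior_id(name_behavior_id: str) -> BehaviorId:
    # Identifiers look like "zerg?team=0" in this sketch.
    name, _, team = name_behavior_id.partition("?team=")
    return BehaviorId(name, int(team) if team else 0)


zerg = parse_behavior_id("zerg?team=0")
protoss = parse_behavior_id("protoss?team=1")
# Asymmetric setup: different behavior names AND different team IDs, so
# the learning-team coordination can tell the two sides apart.
```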
For more general information on training with ML-Agents, see [Training ML-Agents](Training-ML-Agents.md). For more algorithm-specific instruction, please see the documentation for [PPO](Training-PPO.md) or [SAC](Training-SAC.md).

Self-play is triggered by including the self-play hyperparameter hierarchy in the trainer configuration file. A detailed description of the self-play hyperparameters is given below. Furthermore, to distinguish opposing agents, set the team ID to different integer values in the Behavior Parameters script on the agent prefab.
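As a rough illustration of the trigger (a sketch with assumed names, not the PR's factory code): when a behavior's trainer configuration contains a `self_play` section, the ordinary trainer is wrapped in a ghost trainer that shares the central controller.

```python
from typing import Any, Dict


class Trainer:  # stand-in for the real PPO/SAC trainers
    def __init__(self, brain_name: str, config: Dict[str, Any]) -> None:
        self.brain_name = brain_name
        self.config = config


class GhostTrainer(Trainer):  # stand-in wrapper
    def __init__(self, wrapped: Trainer, controller: Any) -> None:
        super().__init__(wrapped.brain_name, wrapped.config)
        self.trainer = wrapped        # the wrapped "real" trainer
        self.controller = controller  # shared GhostController


def initialize_trainer(config: Dict[str, Any], brain_name: str,
                       controller: Any) -> Trainer:
    trainer = Trainer(brain_name, config)
    if "self_play" in config:
        # Presence of the self-play hyperparameter hierarchy is the trigger.
        trainer = GhostTrainer(trainer, controller)
    return trainer
```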


See the trainer configuration and agent prefabs for our Tennis environment for an example.
***Team ID must be 0 or an integer greater than 0.*** |
Sorry, I know you said this was due to mypy and I never followed up with you on it. Similar to my other comment, if this was just done to make mypy happy, we can always get around that. Let's follow up on it afterwards.
Basically, mypy wouldn't let me initialize the learning team ID int to `None` in the GhostController, so I used `-1`.
You'd probably have to change the type to `Optional[int]` and handle the `None` case.
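For reference, the two options under discussion look roughly like this (a sketch, not the PR's code): the `-1` sentinel keeps the attribute a plain `int`, while `Optional[int]` makes the unset state explicit in the type but forces every reader to handle `None`.

```python
from typing import Optional


class ControllerWithSentinel:
    def __init__(self) -> None:
        # -1 sentinel: mypy accepts a plain int, but callers must remember
        # that -1 means "no learning team assigned yet".
        self.learning_team: int = -1


class ControllerWithOptional:
    def __init__(self) -> None:
        # Optional[int]: the unset state is explicit in the type.
        self._learning_team: Optional[int] = None

    @property
    def learning_team(self) -> int:
        if self._learning_team is None:
            raise RuntimeError("No learning team has been assigned yet")
        return self._learning_team
```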
LGTM
Proposed change(s)
Extends the toolkit to support ghost training of asymmetric games. Under our abstractions, this case would require multiple ghost trainers with higher-level coordination. As a solution, I've implemented something that's essentially a mutex: all ghost trainers query the GhostController for the team that is currently learning, and this value determines how each ghost trainer treats trajectories and policies. This also makes much more use of the behavior identifiers, which is a step toward supporting multi-agent training.
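A minimal sketch of that mutex-style coordination, under assumed names and a simplified swap rule (the real controller's API and swap logic may differ): every ghost trainer registers its team ID, the controller designates one team at a time as the learning team, and each trainer compares its own team ID against that value to decide whether a trajectory trains the wrapped policy or is merely played by a frozen snapshot.

```python
from collections import deque
from typing import Deque


class GhostController:
    """Mutex-like coordinator: exactly one team learns at a time."""

    def __init__(self, swap_interval: int) -> None:
        self._queue: Deque[int] = deque()
        self._learning_team: int = -1
        self._swap_interval = swap_interval
        self._steps_since_swap = 0

    def subscribe_team_id(self, team_id: int) -> None:
        if team_id not in self._queue:
            self._queue.append(team_id)
            if self._learning_team < 0:
                self._learning_team = team_id  # first registrant learns first

    def get_learning_team(self) -> int:
        return self._learning_team

    def step(self) -> None:
        self._steps_since_swap += 1
        if self._steps_since_swap >= self._swap_interval and self._queue:
            self._queue.rotate(-1)               # current team to the back
            self._learning_team = self._queue[0]
            self._steps_since_swap = 0
```

A ghost trainer would then gate its update step on something like `controller.get_learning_team() == self.team_id`.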
TODO:
- add tests for the asymmetric case
- add a real example environment to showcase this. I'll do that on a separate branch though to keep this small. (StrikerVsGoalie and other self-play env improvements #3699)
- update docs

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments