Asymmetric self-play #3653

Merged: 48 commits merged into master from self-play-mutex on Apr 1, 2020

Conversation

@andrewcoh (Contributor) commented Mar 19, 2020

Proposed change(s)

Extends the toolkit to support ghost training of asymmetric games. Under our abstractions, this case requires multiple ghost trainers with higher-level coordination. As a solution, I've implemented something that is essentially a mutex: all ghost trainers query the GhostController for the team that is currently learning, and this value determines how each ghost trainer treats trajectories and policies. This change also makes heavier use of behavior identifiers, which is a step toward supporting multi-agent training.
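
For illustration, a minimal sketch of the mutex-style coordination described above; the class and method names here are illustrative, not the actual merged ML-Agents API:

```python
# Hedged sketch of the mutex-style coordination described above; names and
# signatures are illustrative, not the actual ML-Agents implementation.
class GhostController:
    """Tracks which team is currently learning, shared by all ghost trainers."""

    def __init__(self, swap_interval: int) -> None:
        self._learning_team: int = -1  # -1 until the first team subscribes
        self._swap_interval = swap_interval
        self._teams: list = []

    def subscribe_team_id(self, team_id: int) -> None:
        # Each ghost trainer registers its team; the first becomes the learner.
        if team_id not in self._teams:
            self._teams.append(team_id)
            if self._learning_team == -1:
                self._learning_team = team_id

    @property
    def learning_team(self) -> int:
        return self._learning_team

    def change_training_team(self, step: int) -> None:
        # Periodically rotate which team learns; ghost trainers query
        # learning_team to decide whether to train on a trajectory or act
        # with a frozen snapshot policy.
        if self._teams and step > 0 and step % self._swap_interval == 0:
            idx = (self._teams.index(self._learning_team) + 1) % len(self._teams)
            self._learning_team = self._teams[idx]
```

Each ghost trainer would subscribe its team ID at construction and consult learning_team when deciding whether a trajectory should be trained on or discarded.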

TODO:

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads, etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@vincentpierre (Contributor) left a comment:

I was not able to review the whole thing, but I left some initial comments.

@@ -38,11 +46,12 @@ def __init__(
)

self.trainer = trainer
self.controller = controller
Contributor:

There is a cyclic reference between the controller and the ghost trainer. Not sure if we care about those in Python...

Contributor (Author):

I know that this isn't ideal, but from investigating, it doesn't seem to have caused any issues (yet).
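
If the cycle ever did become a problem, one conventional way to break it in Python is to hold one side through a weak reference. A minimal sketch under that assumption, using illustrative names rather than the merged code:

```python
import weakref

# Hedged sketch, not the actual ML-Agents code: the trainer holds the
# controller through a weak reference so it does not keep it alive.
# CPython's cycle collector reclaims strong reference cycles anyway, so
# this mainly matters if deterministic cleanup is needed.
class GhostTrainer:
    def __init__(self, trainer, controller) -> None:
        self.trainer = trainer
        self._controller_ref = weakref.ref(controller)

    @property
    def controller(self):
        controller = self._controller_ref()
        if controller is None:
            raise RuntimeError("GhostController was garbage collected")
        return controller
```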

@andrewcoh changed the title from "[WIP] Asymmetric self-play" to "Asymmetric self-play" on Mar 26, 2020
@@ -113,19 +112,7 @@ def write_stats(
)
if self.self_play and "Self-play/ELO" in values:
elo_stats = values["Self-play/ELO"]
mean_opponent_elo = values["Self-play/Mean Opponent ELO"]
std_opponent_elo = values["Self-play/Std Opponent ELO"]
Contributor:

Do we still want this info somewhere?

Contributor (Author):

I'm not sure how valuable it is, honestly. I don't think we need it.

@ervteng (Contributor) left a comment:

Some minor doc comments, but otherwise LGTM.


In symmetric games, since all agents (even those on opposing teams) share the same policy, they should have the same 'Behavior Name' in their
Behavior Parameters script. In asymmetric games, they should have different Behavior Names in their Behavior Parameters scripts.
Note that in asymmetric games, the agents must have both different Behavior Names *and* different team IDs! Then, specify the trainer configuration
Contributor:

So this means you can't have zerg?teamId=0 and protoss?teamId=0? Is this a fundamental limitation? It sounds like something people are likely to get tripped up on.

If it's a removable restriction, don't let it block this PR, but can you log a JIRA ticket for follow-up?

Contributor (Author):

This is something we will support when we introduce a true multi-agent trainer, i.e. multiple behavior names on the same team.

For more general information on training with ML-Agents, see [Training ML-Agents](Training-ML-Agents.md).
For more algorithm-specific instructions, please see the documentation for [PPO](Training-PPO.md) or [SAC](Training-SAC.md).

Self-play is triggered by including the self-play hyperparameter hierarchy in the trainer configuration file. A detailed description of the self-play hyperparameters is given below. Furthermore, to distinguish opposing agents, set the team IDs to different integer values in the Behavior Parameters script on the agent prefab.

![Team ID](images/team_id.png)

See the trainer configuration and agent prefabs for our Tennis environment for an example.
***Team ID must be 0 or an integer greater than 0.***
Contributor:

Sorry, I know you said this was due to mypy and I never followed up with you on it. Similar to my other comment, if this was just done to make mypy happy, we can always get around that. Let's follow up on it afterwards.

Contributor (Author):

Basically, mypy wouldn't let me initialize the learning team ID int to None in the GhostController, so I used -1.

Contributor:

You'd probably have to change the type to Optional[int] and handle the None case.
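
A minimal sketch of that Optional[int] alternative, with illustrative names rather than the merged implementation (which keeps the -1 sentinel):

```python
from typing import Optional

# Hedged sketch of the suggestion above; not the merged code, which uses
# a -1 sentinel instead of Optional[int].
class GhostController:
    def __init__(self) -> None:
        # None means "no learning team has been assigned yet".
        self._learning_team: Optional[int] = None

    def subscribe_team_id(self, team_id: int) -> None:
        # The first team to subscribe becomes the initial learning team.
        if self._learning_team is None:
            self._learning_team = team_id

    @property
    def learning_team(self) -> int:
        if self._learning_team is None:
            raise RuntimeError("No team has subscribed yet")
        return self._learning_team
```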

@vincentpierre (Contributor) left a comment:

LGTM

@andrewcoh merged commit 3294d9e into master on Apr 1, 2020.
The delete-merged-branch bot deleted the self-play-mutex branch on April 1, 2020.