Asymmetric self-play #3653
Conversation
I was not able to review the whole thing but I left some initial comments.
```diff
@@ -38,11 +46,12 @@ def __init__(
         )

         self.trainer = trainer
         self.controller = controller
```
There is a cyclic reference between the controller and the ghost trainer. Not sure if we care about those in Python...
I know that this isn't ideal, but from investigating it doesn't seem to have caused any issues (yet).
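For context on why this hasn't bitten anyone: the cycle looks roughly like the sketch below (stand-in classes, not the PR's actual code). CPython's reference counting alone can't reclaim objects that reference each other, but the cyclic garbage collector in the `gc` module collects such cycles, so in practice the main cost is slightly delayed collection.

```python
# Minimal sketch of the reference cycle under discussion; these are
# stand-ins, not the actual ML-Agents classes. Each ghost trainer holds
# the shared controller, and the controller holds references back to
# the trainers that registered with it.
from typing import List


class GhostController:
    def __init__(self) -> None:
        self._ghost_trainers: List["GhostTrainer"] = []

    def subscribe(self, trainer: "GhostTrainer") -> None:
        self._ghost_trainers.append(trainer)


class GhostTrainer:
    def __init__(self, controller: GhostController) -> None:
        self.controller = controller  # trainer -> controller
        controller.subscribe(self)    # controller -> trainer: cycle closed


controller = GhostController()
trainers = [GhostTrainer(controller) for _ in range(2)]
# Reference counting alone can't free these once they go out of scope,
# but Python's cyclic garbage collector can.
```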
Co-Authored-By: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
```diff
@@ -113,19 +112,7 @@ def write_stats(
         )
         if self.self_play and "Self-play/ELO" in values:
             elo_stats = values["Self-play/ELO"]
-            mean_opponent_elo = values["Self-play/Mean Opponent ELO"]
-            std_opponent_elo = values["Self-play/Std Opponent ELO"]
```
Do we still want this info somewhere?
I'm not sure how valuable it is honestly. I don't think we need it.
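For the record, a rough sketch of what the removed lines reported, assuming the values were summary statistics over the ELO ratings of the opponent snapshots in the ghost trainer's window (the variable names and ratings below are illustrative):

```python
# Hedged sketch: mean/std over hypothetical opponent snapshot ratings.
from statistics import mean, pstdev

opponent_elos = [1205.0, 1198.5, 1220.3, 1187.9]  # illustrative values
mean_opponent_elo = mean(opponent_elos)   # -> 1202.925
std_opponent_elo = pstdev(opponent_elos)  # population standard deviation
```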
Some minor doc comments, but otherwise LGTM.
In symmetric games, since all agents (even on opposing teams) will share the same policy, they should have the same 'Behavior Name' in their Behavior Parameters script. In asymmetric games, they should have a different Behavior Name in their Behavior Parameters script. Note, in asymmetric games, the agents must have both different Behavior Names *and* different team IDs! Then, specify the trainer configuration
So this means you can't have `zerg?teamId=0` and `protoss?teamId=0`? Is this a fundamental limitation? Sounds like something people are likely to get tripped up on.
If it's a removable restriction, don't let it block this PR, but can you log a jira for followup?
This will be something we support when we introduce a true multiagent trainer, i.e., multiple behavior names that are on the same team.
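To make the current restriction concrete, here is a hedged sketch of why both fields matter: the ghost trainer distinguishes agents by a (behavior name, team ID) pair parsed from the behavior identifier. `parse_behavior_id` and the `?team=` format below are illustrative assumptions, not the toolkit's exact utilities.

```python
from typing import NamedTuple


class BehaviorId(NamedTuple):
    brain_name: str
    team_id: int


def parse_behavior_id(name_behavior_id: str) -> BehaviorId:
    # Identifiers look like "zerg?team=0" in this sketch.
    name, _, team = name_behavior_id.partition("?team=")
    return BehaviorId(name, int(team) if team else 0)


zerg = parse_behavior_id("zerg?team=0")
protoss = parse_behavior_id("protoss?team=1")
# Asymmetric setup: different behavior names AND different team IDs, so
# the learning-team coordination can tell the two sides apart.
```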
For more general information on training with ML-Agents, see [Training ML-Agents](Training-ML-Agents.md). For more algorithm-specific instruction, please see the documentation for [PPO](Training-PPO.md) or [SAC](Training-SAC.md).

Self-play is triggered by including the self-play hyperparameter hierarchy in the trainer configuration file. A detailed description of the self-play hyperparameters is given below. Furthermore, to distinguish opposing agents, set the team ID to different integer values in the Behavior Parameters script on the agent prefab.
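As a rough illustration of the trigger (a sketch with assumed names, not the PR's factory code): when a behavior's trainer configuration contains a `self_play` section, the ordinary trainer is wrapped in a ghost trainer that shares the central controller.

```python
from typing import Any, Dict


class Trainer:  # stand-in for the real PPO/SAC trainers
    def __init__(self, brain_name: str, config: Dict[str, Any]) -> None:
        self.brain_name = brain_name
        self.config = config


class GhostTrainer(Trainer):  # stand-in wrapper
    def __init__(self, wrapped: Trainer, controller: Any) -> None:
        super().__init__(wrapped.brain_name, wrapped.config)
        self.trainer = wrapped        # the wrapped "real" trainer
        self.controller = controller  # shared GhostController


def initialize_trainer(config: Dict[str, Any], brain_name: str,
                       controller: Any) -> Trainer:
    trainer = Trainer(brain_name, config)
    if "self_play" in config:
        # Presence of the self-play hyperparameter hierarchy is the trigger.
        trainer = GhostTrainer(trainer, controller)
    return trainer
```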


See the trainer configuration and agent prefabs for our Tennis environment for an example.
***Team ID must be 0 or an integer greater than 0.*** |
Sorry, I know you said this was due to mypy and I never followed up with you on it. Similar to my other comment, if this was just done to make mypy happy, we can always get around that. Let's follow up on it afterwards.
Basically, mypy wouldn't let me initialize the learning team ID int to `None` in the GhostController, so I used `-1`.
You'd probably have to change the type to `Optional[int]` and handle the `None` case.
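For reference, the two options under discussion look roughly like this (a sketch, not the PR's code): the `-1` sentinel keeps the attribute a plain `int`, while `Optional[int]` makes the unset state explicit in the type but forces every reader to handle `None`.

```python
from typing import Optional


class ControllerWithSentinel:
    def __init__(self) -> None:
        # -1 sentinel: mypy accepts a plain int, but callers must remember
        # that -1 means "no learning team assigned yet".
        self.learning_team: int = -1


class ControllerWithOptional:
    def __init__(self) -> None:
        # Optional[int]: the unset state is explicit in the type.
        self._learning_team: Optional[int] = None

    @property
    def learning_team(self) -> int:
        if self._learning_team is None:
            raise RuntimeError("No learning team has been assigned yet")
        return self._learning_team
```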
LGTM
Proposed change(s)
Extends the toolkit to support ghost training of asymmetric games. Under our abstractions, this case would require multiple ghost trainers with higher-level coordination. As a solution, I've implemented something that's essentially a mutex: all ghost trainers query the GhostController for the team that is currently learning, and this value determines how each ghost trainer treats trajectories and policies. This also makes much more use of the behavior identifiers, which is a step toward supporting multi-agent training.
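A minimal sketch of that mutex-style coordination, under assumed names and a simplified swap rule (the real controller's API and swap logic may differ): every ghost trainer registers its team ID, the controller designates one team at a time as the learning team, and each trainer compares its own team ID against that value to decide whether a trajectory trains the wrapped policy or is merely played by a frozen snapshot.

```python
from collections import deque
from typing import Deque


class GhostController:
    """Mutex-like coordinator: exactly one team learns at a time."""

    def __init__(self, swap_interval: int) -> None:
        self._queue: Deque[int] = deque()
        self._learning_team: int = -1
        self._swap_interval = swap_interval
        self._steps_since_swap = 0

    def subscribe_team_id(self, team_id: int) -> None:
        if team_id not in self._queue:
            self._queue.append(team_id)
            if self._learning_team < 0:
                self._learning_team = team_id  # first registrant learns first

    def get_learning_team(self) -> int:
        return self._learning_team

    def step(self) -> None:
        self._steps_since_swap += 1
        if self._steps_since_swap >= self._swap_interval and self._queue:
            self._queue.rotate(-1)               # current team to the back
            self._learning_team = self._queue[0]
            self._steps_since_swap = 0
```

A ghost trainer would then gate its update step on something like `controller.get_learning_team() == self.team_id`.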
TODO:
- add tests for the asymmetric case
- add a real example environment to showcase this. I'll do that on a separate branch though to keep this small. (StrikerVsGoalie and other self-play env improvements #3699)
- update docs

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments