[refactor] Run Trainers in separate threads #3690
Conversation
docs/Training-SAC.md
Outdated
Typical Range: `1`
Typical Range: `10` - `20`
What's the reasoning behind this range? It's not atypical to perform an update on every step of an agent. Granted, that doesn't scale if there are ten agents and every single one takes a step per timestep, in which case the minimum is 10. Maybe this should be written as a function of the number of agents/envs?
This was to keep in line with our previous SAC behavior, which was one update per env step (~10 agent steps). In empirical trials I noticed a small sample-efficiency improvement from pushing this down to 1, but a huge wall-clock time degradation, as the updates take too long. This was even with Snoopy Pop, the slowest env we have.
This number should be independent of the number of agents/envs, unlike num_updates. Maybe we should lower the range to 1-20?
How did you test the GhostTrainer?
Ran it for half an hour and the ELO went up a bit. I'm running a full cloud run of all envs now.
batch_update_stats: Dict[str, list] = defaultdict(list)
for _ in range(num_updates):
while self.step / self.update_steps > self.steps_per_update:
Is the new hyperparameter enforced as "every time steps_per_update steps elapse, we perform an update" or "after x steps elapse, we perform x/steps_per_update updates"? Judging from this line, is it the latter?
Aren't these the same thing?
@andrewcoh @ervteng did you work this out?
@ervteng I don't think they are the same thing. If (1) is enforced then (2) is true, but the inverse is not necessarily true. Not sure it's a big deal.
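To make the distinction concrete, here is a minimal sketch of the "catch-up" behavior implied by the `while` condition in the diff above. The class, its attribute defaults, and the `advance` signature are hypothetical; only the loop condition mirrors the actual change.

```python
class UpdateScheduler:
    """Toy model of the steps_per_update bookkeeping, not the real SAC trainer."""

    def __init__(self, steps_per_update: float) -> None:
        self.steps_per_update = steps_per_update
        self.step = 0          # total agent steps received so far
        self.update_steps = 1  # updates performed; starts at 1 to avoid dividing by zero

    def advance(self, new_steps: int, do_update) -> None:
        self.step += new_steps
        # When a large trajectory arrives, several updates run back-to-back,
        # i.e. "after x steps elapse, perform roughly x / steps_per_update updates".
        while self.step / self.update_steps > self.steps_per_update:
            do_update()
            self.update_steps += 1
```

With steps_per_update=10, a single trajectory of 100 steps triggers roughly ten back-to-back updates rather than one update at each 10-step boundary, which matches interpretation (2).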
""" | ||
Advances the trainer. Typically, this means grabbing trajectories | ||
from all subscribed trajectory queues (self.trajectory_queues), and updating | ||
a policy using the steps in them, and if needed pushing a new policy onto the right | ||
policy queues (self.policy_queues). | ||
:param empty_queue: Whether or not to empty the queue when called. For synchronous |
:param empty_queue: Whether or not to empty the queue when called. For synchronous
:param empty_queue: Whether or not to empty the trajectory queue when called. For synchronous
Removed empty_queue and am using qsize to empty the queue every time.
@@ -142,16 +146,18 @@ def advance(self) -> None:
# being emptied, the trajectories in the queue are on-policy.
for _ in range(traj_queue.maxlen):
Could this be traj_queue.qsize?
👍 changed.
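For context, a minimal sketch of what draining by qsize looks like on a standard thread-safe queue; the helper name and the process callback are illustrative, not the actual Trainer.advance() code.

```python
import queue

def drain(traj_queue: "queue.Queue", process) -> None:
    # Snapshot the queue length once; anything enqueued while we are draining
    # is left for the next advance() call.
    for _ in range(traj_queue.qsize()):
        try:
            t = traj_queue.get_nowait()
        except queue.Empty:
            # Another consumer (or a race) emptied the queue early.
            break
        process(t)
```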
@@ -142,16 +146,18 @@ def advance(self) -> None:
# being emptied, the trajectories in the queue are on-policy.
for _ in range(traj_queue.maxlen):
try:
t = traj_queue.get_nowait()
t = traj_queue.get(block=not empty_queue, timeout=0.05)
self._process_trajectory(t)
Should we still process the trajectory if we have exceeded our buffer size?
That's a possibility - I guess the question is whether it is better to dump trajectories or exceed the buffer size. I'm leaning towards exceeding the buffer size. @andrewcoh do you have any preferences?
Ultimately we should change this loop so that we stop getting from the queue if we need to update.
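A sketch of the variant being discussed, where the loop stops pulling from the queue once the replay buffer is full and an update is due. The buffer_len callback and max_buffer_size parameter are hypothetical stand-ins, not the actual trainer API.

```python
import queue

def drain_until_buffer_full(traj_queue: "queue.Queue", process,
                            buffer_len, max_buffer_size: int) -> None:
    # Stop getting trajectories as soon as the buffer is at capacity, instead
    # of processing everything currently queued and overshooting the limit.
    while buffer_len() < max_buffer_size:
        try:
            t = traj_queue.get(block=True, timeout=0.05)
        except queue.Empty:
            break
        process(t)
```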
… develop-sac-apex
LGTM
docs/Training-PPO.md
Outdated
### (Optional) Advanced: Disable Threading

By default, PPO model updates can happen while the environment is being stepped. To disable this behavior, for instance to maintain strict
The point of this sentence could be a bit clearer: "To disable this, do this." Then, "One may want to do this to maintain the strict on-policyness of PPO..." etc.
It should maybe also be clearer that you're violating an assumption of PPO in exchange for a training speed-up. As it's written, it feels like there's only gain to enabling threading.
Updated the docs - hopefully the new one is clearer
Fix typo Co-Authored-By: andrewcoh <54679309+andrewcoh@users.noreply.github.com>
Co-Authored-By: andrewcoh <54679309+andrewcoh@users.noreply.github.com>
…gents into develop-sac-apex
… develop-sac-apex
commit 3fed09d Author: Ervin T <ervin@unity3d.com> Date: Mon Apr 20 13:21:28 2020 -0700
    [bug-fix] Increase buffer size for SAC tests (#3813)
commit 99ed28e Author: Ervin T <ervin@unity3d.com> Date: Mon Apr 20 13:06:39 2020 -0700
    [refactor] Run Trainers in separate threads (#3690)
commit 52b7d2e Author: Chris Elion <chris.elion@unity3d.com> Date: Mon Apr 20 12:20:45 2020 -0700
    update upm-ci-utils source (#3811)
commit 89e4804 Author: Vincent-Pierre BERGES <vincentpierre@unity3d.com> Date: Mon Apr 20 12:06:59 2020 -0700
    Removing done from the llapi doc (#3810)
Proposed change(s)
This began as a fix to SAC's num_updates functionality and evolved into a bit more than that.
Changes to SAC and why they were necessary
Currently, SAC updates each time advance() is called, unless train_interval is specified, in which case it updates every train_interval steps. num_updates refers to the number of batches sampled and used per update. With these two parameters you can control how often updates happen relative to steps.
The problem arises when you scale the number of areas or envs: all of a sudden there is a much greater number of steps per update, and the parameter doesn't scale.
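A rough back-of-the-envelope comparison of the two schemes. All numbers and helper names are hypothetical; the old scheme is modeled as "one round of num_updates updates per train_interval env steps", per the description above.

```python
# Hypothetical illustration of why num_updates does not scale with the number
# of envs/agents while steps_per_update does. Not ml-agents code.

def agent_steps_per_update_old(n_envs: int, agents_per_env: int,
                               train_interval: int, num_updates: int) -> float:
    # Old scheme: every train_interval env steps, run num_updates updates.
    # One env step advances every agent in every env by one step.
    return (n_envs * agents_per_env * train_interval) / num_updates

def agent_steps_per_update_new(steps_per_update: float) -> float:
    # New scheme: one update per steps_per_update agent steps, no matter how
    # many envs/agents produced them.
    return steps_per_update

for n_envs in (1, 8, 16):
    old = agent_steps_per_update_old(n_envs, agents_per_env=10,
                                     train_interval=1, num_updates=1)
    new = agent_steps_per_update_new(10)
    print(f"{n_envs} envs: old ratio = {old}, new ratio = {new}")
# The old steps-per-update ratio grows linearly with the number of envs;
# the new one stays fixed at steps_per_update.
```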
This PR introduces a steps_per_update parameter to SAC. SAC keeps track of the number of steps it has received when a trajectory comes in, and performs the appropriate number of updates. HOWEVER, this produced a bad user experience, as the game would randomly "freeze" to do training.
Further changes
To get around this problem, this PR also puts the Trainers' advance() calls in a separate thread. We do this by making AgentManagerQueue a thread-safe blocking Queue and doing semi-blocking gets (get with a timeout) on the queues in advance().
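A minimal sketch of that scheme: the trainer's advance loop runs in a daemon thread and does semi-blocking gets on a standard thread-safe queue. The function and variable names here are illustrative, not the actual ml-agents classes.

```python
import queue
import threading

def run_trainer(traj_queue: "queue.Queue", stop_event: threading.Event,
                process_trajectory) -> None:
    # Stand-in for the trainer's advance() loop running on its own thread.
    while not stop_event.is_set():
        try:
            # Semi-blocking get: wait briefly so the thread can notice the
            # stop signal without spinning at 100% CPU.
            traj = traj_queue.get(block=True, timeout=0.05)
        except queue.Empty:
            continue
        process_trajectory(traj)

traj_queue: "queue.Queue" = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=run_trainer, args=(traj_queue, stop, print),
                          daemon=True)
worker.start()
# ... the environment-stepping loop pushes trajectories with traj_queue.put(...) ...
stop.set()
worker.join()
```

The main thread keeps stepping the environment while the worker consumes trajectories, which is what removes the "freeze" during updates.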
Putting the trainers in a separate thread has the added benefit of (as measured on 3DBall, single environment):
Note that this slightly breaks the on-policyness of PPO; steps taken during a buffer update will use the in-between parameterizations (e.g. a step taken after the first batch will use that parameterization, one taken after the 2nd batch will use the next, and so on). I've made threading easy to disable with a flag in TrainerController, so we can turn it on/off if we see significant degradation in performance.
TODO:
- Test Ghost Trainer, change documentation.
- Test with larger number of envs.
Future: Switch to full multiprocess (requires a rethink of the StatsReporter shared between the trainer and the AgentProcessor).
Types of change(s)
Checklist
Other comments