Add Trajectory and Policy Queues #3113


Merged
merged 30 commits into master from develop-queue on Jan 3, 2020
Conversation

@ervteng (Contributor) commented Dec 21, 2019

This PR adds Trajectory and Policy Queues between the env_manager, AgentProcessor, and Trainers, getting one step closer to decoupling environment stepping from trainer training. (A minimal sketch of the intended wiring appears after the list below.)

However there are still two points of coupling that remain:

  • Summary and model writing. Summary writing is currently driven by the TrainerController: the TC tells the Trainers when to write out to Tensorboard. It would be better for the trainer to write out when appropriate during the trainer.advance() call. -> Addressed in PR #3124 (Move stepping logic into advance() function)

  • Curriculum. The TC still needs to know the individual trainers' rewards in order to update reset parameters in the environment. We might be able to get these stats from the AgentProcessor instead. -> Postponed to a future PR
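
For orientation, here is a minimal, self-contained sketch of the producer/consumer wiring described above. The Trajectory stub and the class bodies are illustrative assumptions, not the actual ml-agents implementation:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Trajectory:
    """Stub standing in for the real ml-agents Trajectory."""
    steps: List[int] = field(default_factory=list)


class AgentProcessor:
    """Producer side: assembles trajectories and publishes them to queues."""

    def __init__(self) -> None:
        self.trajectory_queues: List[Deque[Trajectory]] = []

    def publish_trajectory_queue(self, trajectory_queue: Deque[Trajectory]) -> None:
        self.trajectory_queues.append(trajectory_queue)

    def add_experiences(self, trajectory: Trajectory) -> None:
        # Instead of calling trainer.process_trajectory() directly,
        # push the finished trajectory onto every subscribed queue.
        for traj_queue in self.trajectory_queues:
            traj_queue.append(trajectory)


class Trainer:
    """Consumer side: drains trajectory queues during advance()."""

    def __init__(self) -> None:
        self.trajectory_queues: List[Deque[Trajectory]] = []

    def subscribe_trajectory_queue(self, trajectory_queue: Deque[Trajectory]) -> None:
        self.trajectory_queues.append(trajectory_queue)

    def advance(self) -> None:
        for traj_queue in self.trajectory_queues:
            while traj_queue:
                trajectory = traj_queue.popleft()
                # The real trainer would call process_trajectory() here.
                print(f"processing trajectory with {len(trajectory.steps)} steps")


processor, trainer = AgentProcessor(), Trainer()
shared_queue: Deque[Trajectory] = deque()
processor.publish_trajectory_queue(shared_queue)
trainer.subscribe_trajectory_queue(shared_queue)
processor.add_experiences(Trajectory(steps=[1, 2, 3]))
trainer.advance()
```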

@@ -134,8 +133,8 @@ def add_experiences(
             agent_id=agent_id,
             next_obs=next_obs,
         )
-        # This will eventually be replaced with a queue
-        self.trainer.process_trajectory(trajectory)
+        for _traj_queue in self.trajectory_queues:
Contributor:

nit: don't use leading underscores for local variables.

Contributor Author:

Fixed

trajectory_queue=Queue(),
policy_queue=Queue(),
)
agent_manager.processor.publish_trajectory_queue(agent_manager.trajectory_queue)
Contributor:

It feels strange that the AgentManager doesn't subscribe to its own trajectory_queue. I'd recommend making AgentManager its own class and setting this up in AgentManager.__init__ (you can maybe do this as a namedtuple still, but I think you have to do it in __new__ instead).

Contributor Author:

I made AgentManager a subclass of AgentProcessor that contains the two queues. I didn't fold it into AgentProcessor itself, since in the future we might have processors that subscribe to multiple queues.
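
A rough sketch of that shape, assuming plain deques for the queues and heavily simplified class bodies (illustrative, not the actual ml-agents code):

```python
from collections import deque
from typing import Deque, List


class AgentProcessor:
    def __init__(self) -> None:
        self.trajectory_queues: List[Deque] = []

    def publish_trajectory_queue(self, trajectory_queue: Deque) -> None:
        self.trajectory_queues.append(trajectory_queue)


class AgentManager(AgentProcessor):
    """An AgentProcessor that also owns the queues linking it to a Trainer."""

    def __init__(self) -> None:
        super().__init__()
        self.trajectory_queue: Deque = deque()
        self.policy_queue: Deque = deque()
        # Subscribe to our own trajectory queue so callers no longer
        # need to wire it up externally.
        self.publish_trajectory_queue(self.trajectory_queue)
```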

@@ -5,13 +5,15 @@
from mlagents.tf_utils import tf

from collections import deque
from queue import Queue
Contributor:

Why do we need queue.Queue here instead of something lighter-weight like collections.deque? I don't think we need the synchronization that Queue provides (https://docs.python.org/3/library/queue.html)

Contributor Author (@ervteng, Jan 2, 2020):

Not yet :) The intention was to enable the environment stepping and trainer advancing to run in separate threads (or even processes) for algorithms like APE-X. Eventually we'd want the trainer to wait until a Trajectory is available before doing any computation. I went with Queue since the method calls are the same between Queue and multiprocessing.Queue, but are different from deque.

We could also just use deque until the time comes to change over.

Contributor:

Sounds good.

Contributor Author:

It appears that deque is actually thread-safe as long as you stick to single append and popleft operations, so I'll switch to that. Even if we go multi-threaded we can still use deque, and it's faster.

https://stackoverflow.com/questions/717148/queue-queue-vs-collections-deque
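
For reference, a standard-library side-by-side of the two APIs under discussion; the exception behavior noted in the comments is the main difference to handle when switching. (Under CPython, deque's append() and popleft() are individually atomic, which is what makes the swap safe for simple single-producer/single-consumer use.)

```python
from collections import deque
from queue import Empty, Queue
from typing import Deque

q: Queue = Queue()
q.put(1)
try:
    item = q.get_nowait()  # raises queue.Empty when exhausted
except Empty:
    pass

d: Deque[int] = deque()
d.append(1)                # deque's analogue of put()
try:
    item = d.popleft()     # raises IndexError, not queue.Empty
except IndexError:
    pass
```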

Contributor:

Should anything still be using queue.Queue()? If not, you should remove the imports and see what's breaking.

Contributor Author:

I think some of the tests still do - will switch them out

Ervin T added 4 commits January 2, 2020 14:19
* Move writing logic out of TC

* Modify trainer config files

* Fix tests

* Update Migrating doc

* Add should_still_train

* Fix write_summary

* Move logic for writing summaries

* Fix summary after loading from checkpoint

* Switch Trainer to abstract class
@ervteng changed the title from "WIP: Add Trajectory and Policy Queues" to "Add Trajectory and Policy Queues" on Jan 3, 2020
separately from an AgentManager.
"""
self.queue: Deque[Union[Trajectory, Policy]] = deque()
self.behavior_id = behavior_id
Contributor Author:

Giving the queue a behavior_id is understandably a bit weird. I did it this way so that the Trainer can retrieve which behavior id the policy queue belongs to and publish the right policy to it.

The alternative is to give the Trainer access to the AgentManager itself, but I wanted to keep the AgentManager abstraction separate from the Trainer. I'd imagine in the future, we'd want to assemble queues and Trainers in a different way.
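
A sketch of the routing this enables: the behavior_id on the queue lets the Trainer publish the right policy without holding a reference to the AgentManager. Class and method bodies here are simplified assumptions, not the actual implementation:

```python
from collections import deque
from typing import Any, Deque, Dict, List


class AgentManagerQueue:
    """A queue that knows which behavior it carries data for."""

    def __init__(self, behavior_id: str) -> None:
        self.behavior_id = behavior_id
        self.queue: Deque[Any] = deque()


class Trainer:
    def __init__(self, policies: Dict[str, Any]) -> None:
        self.policies = policies
        self.policy_queues: List[AgentManagerQueue] = []

    def publish_policy_queue(self, policy_queue: AgentManagerQueue) -> None:
        self.policy_queues.append(policy_queue)

    def push_updated_policies(self) -> None:
        # behavior_id on the queue tells the trainer which policy to
        # publish, without giving it a reference to the AgentManager.
        for policy_queue in self.policy_queues:
            policy_queue.queue.append(self.policies[policy_queue.behavior_id])
```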

@@ -19,12 +19,14 @@ The versions can be found in
* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.
* `mlagents.envs` was renamed to `mlagents_envs`. The previous repo layout depended on [PEP420](https://www.python.org/dev/peps/pep-0420/), which caused problems with some of our tooling such as mypy and pylint.
* Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now correspond to 200 steps as printed in the terminal and in Tensorboard.
Contributor:

Unless you're going to merge this into the release branch, we should make a "Migrating from 0.13 to latest" section and put these there.

Contributor Author:

Done

Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from.
:param trajectory_queue: Trajectory queue to publish to.
"""
self.trajectory_queues.append(trajectory_queue)
Contributor:

Wrong types here - without #3151 mypy doesn't understand what an AgentManagerQueue is. With it, you'll get an error like

Argument 1 to "append" of "list" has incompatible type "Queue[Any]"; expected "AgentManagerQueue"

"""

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
Contributor:

I'd prefer to be explicit about the arguments here, but if you think it's too tedious, ignore me.

Contributor Author:

Explicit it is

@chriselion (Contributor) left a review:

Blocking until type checks are fixed.

"""
with hierarchical_timer("process_trajectory"):
for traj_queue in self.trajectory_queues:
if not traj_queue.empty():
Contributor:

I think if there are ever multiple consumers here, you could run into problems - if there's one element in the queue, both could think it's non-empty, and one would try to grab it and the other would fail and raise an exception. Might be better to just do a try/except around the get_nowait() call and avoid the not empty() check.

Contributor Author:

Added an AgentManagerQueue.Empty exception; now using try/except here and in the TC.
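
A sketch of the resulting consumption pattern, assuming an AgentManagerQueue.Empty exception along the lines described in that reply (queue internals simplified):

```python
from collections import deque
from typing import Any, Deque, List


class AgentManagerQueue:
    class Empty(Exception):
        """Raised by get_nowait() when the queue has no items."""

    def __init__(self) -> None:
        self._queue: Deque[Any] = deque()

    def put(self, item: Any) -> None:
        self._queue.append(item)

    def get_nowait(self) -> Any:
        try:
            return self._queue.popleft()
        except IndexError:
            raise self.Empty()


def advance(trajectory_queues: List[AgentManagerQueue]) -> None:
    # Drain each queue until Empty instead of checking empty() first;
    # with multiple consumers, the empty() check could race with a get.
    for traj_queue in trajectory_queues:
        while True:
            try:
                trajectory = traj_queue.get_nowait()
            except AgentManagerQueue.Empty:
                break
            # process_trajectory(trajectory) would run here.
            _ = trajectory
```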

@ervteng ervteng requested a review from chriselion January 3, 2020 19:20
@chriselion (Contributor) left a review:

Looks good, thanks for making those changes

@ervteng merged commit 81310cf into master on Jan 3, 2020
The delete-merged-branch bot deleted the develop-queue branch on January 3, 2020 23:12
The github-actions bot locked this as resolved and limited conversation to collaborators on May 17, 2021