Remove Offline BC Training #2969
Conversation
```diff
@@ -22,7 +22,7 @@ def __init__(
     samples_per_update: int = 0,
 ):
     """
-    A BC trainer that can be used inline with RL, especially for pretraining.
+    A BC trainer that can be used inline with RL, especially for behavioral_cloning.
```
Was this a find-and-replace? Sounds awkward here.
Removed this, as it no longer makes sense
```diff
@@ -97,11 +96,7 @@ def initialize_trainer(
     trainer_parameters.update(trainer_config[_brain_key])

-    trainer = None
-    if trainer_parameters["trainer"] == "offline_bc":
```
Can you leave this in and raise an error message with upgrade instructions, at least for one release?
Added exception message
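For context, the guard being discussed might look something like the following sketch. The exception class and function names here are illustrative, not the exact PR code; the point is failing loudly with upgrade instructions instead of silently ignoring the removed trainer type.

```python
class TrainerConfigError(Exception):
    """Raised for an unsupported trainer configuration (illustrative name)."""


def check_trainer_type(trainer_parameters: dict) -> None:
    # The offline_bc trainer was removed; instead of silently skipping it,
    # point users at the replacement (BC combined with PPO/SAC).
    if trainer_parameters.get("trainer") == "offline_bc":
        raise TrainerConfigError(
            "The offline_bc trainer has been removed. To continue training "
            "with behavioral cloning, add a behavioral_cloning section with "
            "a demo_path to your PPO or SAC trainer configuration."
        )
```

This keeps the old config key recognized for at least one release, so users get an actionable error rather than a generic "unknown trainer" failure.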
nit: I'd say "removed", not "deprecated". To me, deprecated means "going to be removed in the future, but still around for now"
changed to removed
```diff
@@ -1,227 +0,0 @@
-import unittest.mock as mock
```
Can you just make sure that coverage in non-deleted files didn't go down? I can look at the coverage reports with you tomorrow.
Sounds good - from my run it's hovering around 82%.
More precise numbers - with current develop we have 82.6%, and with this branch it's at 82.4%. Seems to be a 1-line drop on bc_module's model.py file.
Sorry, forgot to follow up on this. That seems like a reasonable change.
Still investigating the 1-line drop, since model.py wasn't changed but pytest claims there are 31 (?) lines in the file vs. 30 before.
Edit: seems to be a coverage glitch and not a real issue. All the lines show up as covered in the report.
docs/Reward-Signals.md
Outdated
```diff
@@ -135,7 +135,7 @@ discriminator is trained to better distinguish between demonstrations and agent
 In this way, while the agent gets better and better at mimicing the demonstrations, the
 discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.

-This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
+This approach, when compared to Behavioral Cloning, requires
```
Maybe link to the paper?
Or maybe just shorten the paragraph to "This approach is especially effective when combined with an Extrinsic signal..."
You're right, this paragraph was pretty awful... I revised it a bit
```diff
@@ -19,40 +19,39 @@ imitation learning combined with reinforcement learning can dramatically
 reduce the time the agent takes to solve the environment.
 For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
 using 6 episodes of demonstrations can reduce training steps by more than 4 times.
-See PreTraining + GAIL + Curiosity + RL below.
+See Behavioral Cloning + GAIL + Curiosity + RL below.
```
Note that you'll need to change the legend in the linked image.
Changed
I'm ambivalent to the pretraining->BC rename. Deferring to folks who are more familiar with it.
Interesting, I am right now comparing gail and offline_bc in a project where I teach an agent to play a game based on visual inputs, to imitate a human playstyle. My first impression so far was that I got better results with BC than gail. Here's a video about my project: https://www.linkedin.com/posts/robin-lindh-nilsson-7a30205b_ai-machinelearning-deeplearning-activity-6604987123860815872-rI1K
@robinerd thanks for chiming in! If there are indeed cases where BC beats GAIL clearly, it's a strong signal for us that we shouldn't remove BC. It's possible that GAIL is not as good with images as BC is.

You can already try these features! The current setup lets you combine BC and GAIL with PPO, only that BC is called "pretraining". You can set

Another thing I'd play with is setting

Thanks again for giving these features a real shakedown - always looking for such feedback when deciding what to remove and what to add.
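For illustration, the kind of combined setup being suggested might look roughly like this in a trainer config. The brain name and exact section/key names are assumptions based on this thread and the docs diffs in this PR, not verified against any particular release:

```yaml
BotBrain:                  # hypothetical brain/behavior name
  trainer: ppo
  reward_signals:
    extrinsic:
      strength: 0.0        # zero out environment reward to mimic offline_bc
    gail:
      strength: 1.0
      demo_path: ./demos/Expert.demo
  pretraining:             # renamed to behavioral_cloning by this PR
    demo_path: ./demos/Expert.demo
    strength: 0.5
    steps: 10000
```

With the extrinsic strength at 0, the agent learns purely from imitation, which is essentially what the removed offline_bc trainer provided.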
@ervteng Thanks for your reply! I'm glad to try it out, you're all doing a great job with this :)

All actions are randomly fluctuating a lot while the same situation works fairly OK with offline_bc. Have experimented with use_actions and use_vail, and some different encoding sizes. I will start a discussion issue (will put link here) with more info, maybe we can understand it better together. Maybe it doesn't work well with cameras, or maybe I'm just not using it right.
Before creating any issue I will check if I can get good results by using only the pretraining reward as you suggested. I didn't realize that pretraining was essentially using behavioral cloning. I thought it was based on GAIL in some way too, and I was confused about what the difference between pretraining and GAIL was. It all makes sense now 😄 Is it still interesting to dig into the reason why I don't get good results with GAIL then?
Hey @robinerd, this discussion is super helpful, let's still create the issue. Would like to make sure we don't lose any functionality if we remove offline BC, esp. with images. One of the reasons for this PR is to get rid of that ambiguity of having several of the same technique in different places (and rename pretraining to BC).

A side note: I just fixed an issue with setting
docs/Training-PPO.md
Outdated
```diff
 "seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
 outside of the provided demonstrations. `steps` corresponds to the training steps over which
-pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
+BC is active. The learning rate of the cloning will anneal over the steps. Set
```
Suggested change: "The learning rate of ~~the cloning~~ behavioral cloning will anneal over the steps."
Changed to the abbreviation BC
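For intuition, the annealing behavior described in that docs sentence can be sketched as a simple schedule. The linear form is an assumption for illustration; the trainer's actual decay curve may differ:

```python
def annealed_lr(base_lr: float, step: int, total_steps: int) -> float:
    # Linearly decay the BC learning rate to zero over `total_steps`,
    # so imitation dominates early training and fades out afterwards.
    frac = max(0.0, 1.0 - step / float(total_steps))
    return base_lr * frac
```

With `steps: 10000` as in the config snippet, the BC gradient's influence would reach zero at step 10000 and the RL objective takes over entirely.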
````diff
     demo_path: ./demos/ExpertPyramid.demo
     strength: 0.5
     steps: 10000
 ```

-Below are the available hyperparameters for pretraining.
+Below are the available hyperparameters for BC.
````
Should Behavioral Cloning be abbreviated? I think it would be better to keep it consistent in the docs. What do you think?
I abbreviated it since we abbreviate PPO, SAC, and GAIL. I think the first mention on any particular page should be spelled out with the abbreviation in parentheses, then abbreviated - what do you think?
Looks good.
Hi, I didn't get the time to make an issue previously but I was able to get similar results with pretraining instead of offline_bc. I set the extrinsic reward strength to 0 and added my demo files in pretraining. So from the point of view of my bot with visual observations I don't see any issue with removing offline_bc. Regarding GAIL, I wasn't able to get it to converge; I just got seemingly random actions all the time. Maybe it doesn't work well with visual observations, not sure. Anyway, I can stick to the pretraining, and furthermore I think it's good you're renaming it to behavioral cloning. I was for a long time a bit confused about what pretraining was actually doing.
Hi @robinerd, thank you again for giving the pretraining a go and reporting your results! Have you tried combining GAIL and pretraining to see if it will train? There has been evidence in the literature that suggests GAIL works better for visual obs when combined with a reward signal. Especially in your case, it's possible that the discriminator's neural network just isn't big enough to capture what "similar" means between two screens of the game. It's possible that the pretraining will give it enough of a direction to learn its way there.
Thanks for the suggestion @ervteng, I haven't tried that. Sounds very interesting! I will probably give that a go :)
The functionality of the offline trainer is duplicated in the pretraining module; furthermore, GAIL usually produces better results. This PR: