Remove Offline BC Training #2969

Merged 18 commits into master from develop-removebc on Dec 12, 2019

Conversation

@ervteng (Contributor) commented Nov 26, 2019

The functionality of the offline trainer is duplicated in the pretraining module; furthermore, GAIL usually produces better results. This PR:

  • Removes the independent offline BC trainer
  • Renames pretraining to "Behavioral Cloning" (probably the most controversial change here)
  • Updates the docs accordingly. We now recommend using GAIL + Behavioral Cloning (formerly PreTraining) with PPO to learn from demonstrations.

@@ -22,7 +22,7 @@ def __init__(
samples_per_update: int = 0,
):
"""
-        A BC trainer that can be used inline with RL, especially for pretraining.
+        A BC trainer that can be used inline with RL, especially for behavioral_cloning.
Contributor:

Was this a find-and-replace? It sounds awkward here.

ervteng (Contributor Author):

Removed this, as it no longer makes sense

@@ -97,11 +96,7 @@ def initialize_trainer(
trainer_parameters.update(trainer_config[_brain_key])

trainer = None
if trainer_parameters["trainer"] == "offline_bc":
Contributor:

Can you leave this in and raise an error message with upgrade instructions, at least for one release?

ervteng (Contributor Author):

Added an exception message.
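
For reference, a minimal self-contained sketch of what such a guard could look like (the exception class, helper name, and message wording here are placeholders, not the exact code merged in this PR):

```python
class UnityTrainerException(Exception):
    """Placeholder for the trainer exception type used in ml-agents."""


def check_trainer_type(trainer_parameters: dict) -> None:
    # Hypothetical guard: reject the removed trainer type with upgrade guidance.
    if trainer_parameters.get("trainer") == "offline_bc":
        raise UnityTrainerException(
            "The offline_bc trainer has been removed. To learn from demonstrations, "
            "use the gail reward signal and/or the behavioral_cloning (formerly "
            "pretraining) option together with the ppo trainer."
        )
```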

Contributor:

nit: I'd say "removed", not "deprecated". To me, "deprecated" means "going to be removed in the future, but still around for now".

ervteng (Contributor Author):

Changed to "removed".

@@ -1,227 +0,0 @@
import unittest.mock as mock
Contributor:

Can you just make sure that coverage in non-deleted files didn't go down? I can look at the coverage reports with you tomorrow.

ervteng (Contributor Author):

Sounds good - from my run it's hovering around 82%.

ervteng (Contributor Author):

More precise numbers: with current develop we're at 82.6%, and with this branch it's at 82.4%.

It seems to be a one-line drop in bc_module's model.py file.

Contributor:

Sorry, forgot to follow up on this. That seems like a reasonable change.

@ervteng (Contributor Author), Dec 3, 2019:

Still investigating the one-line drop, since model.py wasn't changed but pytest claims there are 31 (?) lines in the file vs. 30 before.
Edit: this seems to be a coverage glitch and not a real issue. All the lines show up as covered in the report.

@@ -135,7 +135,7 @@ discriminator is trained to better distinguish between demonstrations and agent
In this way, while the agent gets better and better at mimicing the demonstrations, the
discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.

- This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
+ This approach, when compared to Behavioral Cloning, requires
Contributor:

Maybe link to the paper?

Contributor:

Or maybe just shorten the paragraph to "This approach is especially effective when combined with an Extrinsic signal..."

ervteng (Contributor Author):

You're right, this paragraph was pretty awful... I revised it a bit

@@ -19,40 +19,39 @@ imitation learning combined with reinforcement learning can dramatically
reduce the time the agent takes to solve the environment.
For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
using 6 episodes of demonstrations can reduce training steps by more than 4 times.
- See PreTraining + GAIL + Curiosity + RL below.
+ See Behavioral Cloning + GAIL + Curiosity + RL below.
Contributor:

Note that you'll need to change the legend in the linked image.

ervteng (Contributor Author):

Changed

@chriselion (Contributor):
I'm ambivalent about the pretraining -> BC rename. Deferring to folks who are more familiar with it.

@robinerd:

Interesting. I am currently comparing GAIL and offline_bc in a project where I teach an agent to play a game from visual inputs, imitating a human playstyle.

My first impression so far is that I get better results with BC than with GAIL.
If there's any way I can try out these changes, I would be happy to do a new comparison and let you know how it works for my case, since I'm going to spend some time optimizing this agent anyway.

Here's a video about my project: https://www.linkedin.com/posts/robin-lindh-nilsson-7a30205b_ai-machinelearning-deeplearning-activity-6604987123860815872-rI1K

@ervteng (Contributor Author) commented Nov 26, 2019

@robinerd thanks for chiming in! If there are indeed cases where BC clearly beats GAIL, that's a strong signal for us that we shouldn't remove BC. It's possible that GAIL is not as good with images as BC is.

You can already try these features! The current setup lets you combine BC and GAIL with PPO; it's just that BC is called "pretraining". You can set steps to 0 to keep pretraining active throughout your run. In fact, if you set the strengths of all the reward_signals to 0 and the pretraining strength to 1.0, it should effectively become BC.

Another thing I'd play with is setting use_actions to true for GAIL. This more strongly forces it to mimic your playthroughs. It's less useful when combining GAIL with an RL signal, but for pure demonstrations it should work better.
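
To make the two suggestions above concrete, here is a rough trainer_config.yaml sketch; the brain name, demo path, and numbers are placeholders assuming the 0.12-era config layout, not a tested recipe:

```yaml
VisualGameLearning:              # hypothetical brain name
    trainer: ppo
    reward_signals:
        extrinsic:
            strength: 0.0        # zero out the RL reward so only the pretraining (BC) loss drives learning
            gamma: 0.99
        gail:
            strength: 0.0        # keep at 0 for "pure BC"; raise it to mix GAIL back in
            gamma: 0.99
            encoding_size: 128
            use_actions: true    # force-mimic demonstrated actions (matters when GAIL strength is non-zero)
            demo_path: ./demos/HumanPlaythrough.demo
    pretraining:
        strength: 1.0            # full-strength behavioral cloning
        steps: 10000000          # large value keeps BC active for effectively the whole run
        demo_path: ./demos/HumanPlaythrough.demo
```

With all reward signals at 0 strength, only the BC loss updates the policy; increasing the gail strength later mixes discriminator-based imitation back in.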

Thanks again for giving these features a real shakedown - we're always looking for this kind of feedback when deciding what to remove and what to add.

@robinerd commented Nov 26, 2019

@ervteng Thanks for your reply! I'm glad to try it out; you're all doing a great job with this :)
Right now I'm using only a GAIL reward signal with 1.0 strength, and I can't get it to converge.

All actions fluctuate randomly a lot, while the same situation works fairly OK with offline_bc. I have experimented with use_actions and use_vail, and with some different encoding sizes.

I will start a discussion issue (will put the link here) with more info; maybe we can understand it better together. Maybe it doesn't work well with cameras, or maybe I'm just not using it right.

@robinerd:

Before creating any issue, I will check if I can get good results by using only the pretraining reward, as you suggested.

I didn't realize that pretraining was essentially using behavioral cloning. I thought it was based on GAIL in some way too, and I was confused about what the difference between pretraining and GAIL was. It all makes sense now 😄

Is it still interesting to dig into the reason why I don't get good results with GAIL, then?

@ervteng (Contributor Author) commented Nov 26, 2019

Hey @robinerd, this discussion is super helpful; let's still create the issue. We'd like to make sure we don't lose any functionality if we remove offline BC, especially with images.

One of the reasons for this PR is to get rid of the ambiguity of having the same technique in several different places (and to rename pretraining to BC).

A side note: I just fixed an issue with setting steps to 0 in pretraining; for now, just set it to some large number to get it to run.

"seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
outside of the provided demonstrations. `steps` corresponds to the training steps over which
- pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
+ BC is active. The learning rate of the cloning will anneal over the steps. Set
Contributor:

Suggestion: "The learning rate of behavioral cloning will anneal over the steps." ("the cloning" reads oddly.)

ervteng (Contributor Author):

Changed to the abbreviation BC

demo_path: ./demos/ExpertPyramid.demo
strength: 0.5
steps: 10000
```

- Below are the available hyperparameters for pretraining.
+ Below are the available hyperparameters for BC.
Contributor:

Should Behavioral Cloning be abbreviated? I think it would be better to keep it consistent in the docs. What do you think?

ervteng (Contributor Author):

I abbreviated it since we abbreviate PPO, SAC, and GAIL. I think the first mention on any particular page should be written out in full with the abbreviation in parentheses, then abbreviated after that - what do you think?

Contributor:

Looks good.

@robinerd commented Dec 2, 2019

Hi, I didn't get time to make an issue earlier, but I was able to get similar results with pretraining instead of offline_bc. I set the extrinsic reward strength to 0 and added my demo files under pretraining. So from the point of view of my bot with visual observations, I don't see any issue with removing offline_bc.

Regarding GAIL, I wasn't able to get it to converge; I just got seemingly random actions all the time. Maybe it doesn't work well with visual observations, I'm not sure.

Anyway, I can stick with pretraining, and I also think it's good that you're renaming it to behavioral cloning. For a long time I was a bit confused about what pretraining was actually doing.

@ervteng (Contributor Author) commented Dec 2, 2019

Hi @robinerd, thank you again for giving pretraining a go and reporting your results! Have you tried combining GAIL and pretraining to see if it will train?

There is evidence in the literature suggesting that GAIL works better with visual observations when combined with a reward signal. Especially in your case, it's possible that the discriminator's neural network just isn't big enough to capture what "similar" means between two screens of the game. It's possible that pretraining will give it enough of a direction to learn its way there.
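
For example, a hedged sketch of that combination for a visual-observation agent (again, the brain name, demo path, and numbers are placeholders rather than a known-good recipe):

```yaml
VisualGameLearning:              # hypothetical brain name
    trainer: ppo
    reward_signals:
        extrinsic:
            strength: 1.0        # keep an RL reward alongside GAIL
            gamma: 0.99
        gail:
            strength: 0.1
            gamma: 0.99
            encoding_size: 256   # larger discriminator encoder for visual observations
            demo_path: ./demos/HumanPlaythrough.demo
    pretraining:
        strength: 0.5            # BC nudges the policy toward the demos while GAIL and RL refine it
        steps: 10000000
        demo_path: ./demos/HumanPlaythrough.demo
```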

@ervteng ervteng changed the title [WIP] Remove Offline BC Training Remove Offline BC Training Dec 3, 2019
@ervteng ervteng marked this pull request as ready for review December 3, 2019 02:19
@robinerd commented Dec 3, 2019

Thanks for the suggestion @ervteng, I haven't tried that. Sounds very interesting! I will probably give that a go :)

@chriselion chriselion closed this Dec 5, 2019
@chriselion chriselion reopened this Dec 6, 2019
@ervteng ervteng changed the base branch from develop to master December 6, 2019 19:26
@vincentpierre vincentpierre self-requested a review December 12, 2019 01:19
demo_path: ./demos/ExpertPyramid.demo
strength: 0.5
steps: 10000
```

- Below are the available hyperparameters for pretraining.
+ Below are the available hyperparameters for BC.
Contributor:

Looks good.

@ervteng ervteng merged commit 299b332 into master Dec 12, 2019
@delete-merged-branch delete-merged-branch bot deleted the develop-removebc branch December 12, 2019 01:45
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2021