Remove Offline BC Training #2969
Conversation
```diff
@@ -22,7 +22,7 @@ def __init__(
     samples_per_update: int = 0,
 ):
     """
-    A BC trainer that can be used inline with RL, especially for pretraining.
+    A BC trainer that can be used inline with RL, especially for behavioral_cloning.
```
Was this a find-and-replace? Sounds awkward here.
Removed this, as it no longer makes sense
```diff
@@ -97,11 +96,7 @@ def initialize_trainer(
     trainer_parameters.update(trainer_config[_brain_key])

-    trainer = None
-    if trainer_parameters["trainer"] == "offline_bc":
```
Can you leave this in and raise an error message with upgrade instructions, at least for one release?
Added exception message
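For context, the guard being discussed might look something like the following sketch. The exception class and function names here are illustrative, not the exact PR code; the point is failing loudly with upgrade instructions instead of silently ignoring the removed trainer type.

```python
class TrainerConfigError(Exception):
    """Raised for an unsupported trainer configuration (illustrative name)."""


def check_trainer_type(trainer_parameters: dict) -> None:
    # The offline_bc trainer was removed; instead of silently skipping it,
    # point users at the replacement (BC combined with PPO/SAC).
    if trainer_parameters.get("trainer") == "offline_bc":
        raise TrainerConfigError(
            "The offline_bc trainer has been removed. To continue training "
            "with behavioral cloning, add a behavioral_cloning section with "
            "a demo_path to your PPO or SAC trainer configuration."
        )
```

This keeps the old config key recognized for at least one release, so users get an actionable error rather than a generic "unknown trainer" failure.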
nit: I'd say "removed", not "deprecated". To me, deprecated means "going to be removed in the future, but still around for now"
changed to removed
```diff
@@ -1,227 +0,0 @@
-import unittest.mock as mock
```
Can you just make sure that coverage in non-deleted files didn't go down? I can look at the coverage reports with you tomorrow.
Sounds good - from my run it's hovering around 82%.
More precise numbers - with current develop we have 82.6%, and with this branch it's at 82.4%. Seems to be a 1-line drop on bc_module's model.py file.
Sorry, forgot to follow up on this. That seems like a reasonable change.
Still investigating the 1-line drop, since model.py wasn't changed but pytest claims there are 31 (?) lines in the file vs. 30 before.
Edit: seems to be a coverage glitch and not a real issue. All the lines show up as covered in the report.
docs/Reward-Signals.md
Outdated
```diff
@@ -135,7 +135,7 @@ discriminator is trained to better distinguish between demonstrations and agent
 In this way, while the agent gets better and better at mimicing the demonstrations, the
 discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.

-This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
+This approach, when compared to Behavioral Cloning, requires
```
Maybe link to the paper?
Or maybe just shorten the paragraph to "This approach is especially effective when combined with an Extrinsic signal..."
You're right, this paragraph was pretty awful... I revised it a bit
```diff
@@ -19,40 +19,39 @@ imitation learning combined with reinforcement learning can dramatically
 reduce the time the agent takes to solve the environment.
 For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
 using 6 episodes of demonstrations can reduce training steps by more than 4 times.
-See PreTraining + GAIL + Curiosity + RL below.
+See Behavioral Cloning + GAIL + Curiosity + RL below.
```
Note that you'll need to change the legend in the linked image.
Changed
I'm ambivalent to the pretraining->BC rename. Deferring to folks who are more familiar with it.
Interesting, I am right now comparing gail and offline_bc in a project where I teach an agent to play a game based on visual inputs, to imitate a human playstyle. My first impression so far was that I got better results with BC than gail. Here's a video about my project: https://www.linkedin.com/posts/robin-lindh-nilsson-7a30205b_ai-machinelearning-deeplearning-activity-6604987123860815872-rI1K
@robinerd thanks for chiming in! If there are indeed cases where BC beats GAIL clearly, it's a strong signal for us that we shouldn't remove BC. It's possible that GAIL is not as good with images as BC is.

You can already try these features! The current setup lets you combine BC and GAIL with PPO, only that BC is called "pretraining". You can set

Another thing I'd play with is setting

Thanks again for giving these features a real shakedown - always looking for such feedback when deciding what to remove and what to add.
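For illustration, the kind of combined setup being suggested might look roughly like this in a trainer config. The brain name and exact section/key names are assumptions based on this thread and the docs diffs in this PR, not verified against any particular release:

```yaml
BotBrain:                  # hypothetical brain/behavior name
  trainer: ppo
  reward_signals:
    extrinsic:
      strength: 0.0        # zero out environment reward to mimic offline_bc
    gail:
      strength: 1.0
      demo_path: ./demos/Expert.demo
  pretraining:             # renamed to behavioral_cloning by this PR
    demo_path: ./demos/Expert.demo
    strength: 0.5
    steps: 10000
```

With the extrinsic strength at 0, the agent learns purely from imitation, which is essentially what the removed offline_bc trainer provided.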
@ervteng Thanks for your reply! I'm glad to try it out, you're all doing a great job with this :)

All actions are randomly fluctuating a lot while the same situation works fairly OK with offline_bc. Have experimented with use_actions and use_vail, and some different encoding sizes. I will start a discussion issue (will put link here) with more info, maybe we can understand it better together. Maybe it doesn't work well with cameras, or maybe I'm just not using it right.
Before creating any issue I will check if I can get good results by using only the pretraining reward as you suggested. I didn't realize that pretraining was essentially using behavioral cloning. I thought it was based on GAIL in some way too, and I was confused about what the difference between pretraining and GAIL was. It all makes sense now 😄 Is it still interesting to dig into the reason why I don't get good results with GAIL then?
Hey @robinerd, this discussion is super helpful, let's still create the issue. Would like to make sure we don't lose any functionality if we remove offline BC, esp. with images. One of the reasons for this PR is to get rid of that ambiguity of having several of the same technique in different places (and rename pretraining to BC).

A side note: I just fixed an issue with setting
docs/Training-PPO.md
Outdated
```diff
 "seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
 outside of the provided demonstrations. `steps` corresponds to the training steps over which
-pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
+BC is active. The learning rate of the cloning will anneal over the steps. Set
```
Suggested change: "The learning rate of ~~the cloning~~ behavioral cloning will anneal over the steps."
Changed to the abbreviation BC
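For intuition, the annealing behavior described in that docs sentence can be sketched as a simple schedule. The linear form is an assumption for illustration; the trainer's actual decay curve may differ:

```python
def annealed_lr(base_lr: float, step: int, total_steps: int) -> float:
    # Linearly decay the BC learning rate to zero over `total_steps`,
    # so imitation dominates early training and fades out afterwards.
    frac = max(0.0, 1.0 - step / float(total_steps))
    return base_lr * frac
```

With `steps: 10000` as in the config snippet, the BC gradient's influence would reach zero at step 10000 and the RL objective takes over entirely.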
````diff
     demo_path: ./demos/ExpertPyramid.demo
     strength: 0.5
     steps: 10000
 ```

-Below are the available hyperparameters for pretraining.
+Below are the available hyperparameters for BC.
````
Should Behavioral Cloning be abbreviated? I think it would be better to keep it consistent in the docs. What do you think?
I abbreviated it since we abbreviate PPO, SAC, and GAIL. I think the first mention on any particular page should be spelled out with the abbreviation in parentheses, then abbreviated - what do you think?
Looks good.
Hi, I didn't get the time to make an issue previously but I was able to get similar results with pretraining instead of offline_bc. I set the extrinsic reward strength to 0 and added my demo files in pretraining. So from the point of view of my bot with visual observations I don't see any issue with removing offline_bc. Regarding GAIL, I wasn't able to get it to converge; I just got seemingly random actions all the time. Maybe it doesn't work well with visual observations, not sure. Anyway, I can stick to the pretraining, and furthermore I think it's good you're renaming it to behavioral cloning. I was for a long time a bit confused about what pretraining was actually doing.
Hi @robinerd, thank you again for giving the pretraining a go and reporting your results! Have you tried combining GAIL and pretraining to see if it will train? There has been evidence in the literature that suggests GAIL works better for visual obs when combined with a reward signal. Especially in your case, it's possible that the discriminator's neural network just isn't big enough to capture what "similar" means between two screens of the game. It's possible that the pretraining will give it enough of a direction to learn its way there.
Thanks for the suggestion @ervteng, I haven't tried that. Sounds very interesting! I will probably give that a go :)
The functionality of the offline trainer is duplicated in the pretraining module; furthermore, GAIL usually produces better results. This PR: