
Commit 299b332

Author: Ervin T
Remove Standalone Offline BC Training (#2969)
1 parent: aa69e77

24 files changed: +105 -834 lines

config/gail_config.yaml

Lines changed: 5 additions & 1 deletion
@@ -31,7 +31,7 @@ Pyramids:
     beta: 1.0e-2
     max_steps: 5.0e5
     num_epoch: 3
-    pretraining:
+    behavioral_cloning:
         demo_path: ./demos/ExpertPyramid.demo
         strength: 0.5
         steps: 10000
@@ -59,6 +59,10 @@ CrawlerStatic:
     summary_freq: 3000
     num_layers: 3
     hidden_units: 512
+    behavioral_cloning:
+        demo_path: ./demos/ExpertCrawlerSta.demo
+        strength: 0.5
+        steps: 5000
     reward_signals:
         gail:
             strength: 1.0
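
Taken together, the new `CrawlerStatic` entry pairs the added `behavioral_cloning` block with a GAIL-only reward signal (no extrinsic reward). Below is a minimal sketch of the resulting trainer entry; the `behavioral_cloning` values and `gail.strength` come from this diff, while the other `gail` fields (`gamma`, `encoding_size`, `demo_path`) and the omitted PPO hyperparameters are assumptions for illustration.

CrawlerStatic:
    # ...other PPO hyperparameters omitted for brevity...
    summary_freq: 3000
    num_layers: 3
    hidden_units: 512
    # Behavioral Cloning: train the policy to directly mimic the demonstrated actions
    behavioral_cloning:
        demo_path: ./demos/ExpertCrawlerSta.demo
        strength: 0.5
        steps: 5000
    # GAIL is the only reward signal here, so the agent learns purely from demonstrations
    reward_signals:
        gail:
            strength: 1.0
            gamma: 0.99                # assumed discount for the GAIL reward
            encoding_size: 128         # assumed discriminator encoding size
            demo_path: ./demos/ExpertCrawlerSta.demo   # assumed; GAIL also reads the demo file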

docs/Migrating.md

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ The versions can be found in
 * `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
 * The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
 * The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters`. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
+* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
+  Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.
 
 ### Steps to Migrate
 * If you had a custom `Training Configuration` in the Academy inspector, you will need to pass your custom configuration at every training run using the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate`.
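
In configuration terms, this migration is essentially a rename of the existing block under the trainer entry. A rough sketch, reusing the example values from `config/gail_config.yaml` in this commit:

Pyramids:
    # Old key, removed by this commit:
    # pretraining:
    #     demo_path: ./demos/ExpertPyramid.demo
    #     strength: 0.5
    #     steps: 10000
    # New key, usable with either the PPO or SAC trainer:
    behavioral_cloning:
        demo_path: ./demos/ExpertPyramid.demo
        strength: 0.5
        steps: 10000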

docs/Reward-Signals.md

Lines changed: 4 additions & 5 deletions
@@ -135,11 +135,10 @@ discriminator is trained to better distinguish between demonstrations and agent
 In this way, while the agent gets better and better at mimicing the demonstrations, the
 discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.
 
-This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
-far fewer demonstrations to be provided. After all, we are still learning a policy that happens
-to be similar to the demonstrations, not directly copying the behavior of the demonstrations. It
-is especially effective when combined with an Extrinsic signal. However, the GAIL reward signal can
-also be used independently to purely learn from demonstrations.
+This approach learns a _policy_ that produces states and actions similar to the demonstrations,
+requiring fewer demonstrations than direct cloning of the actions. In addition to learning purely
+from demonstrations, the GAIL reward signal can be mixed with an extrinsic reward signal to guide
+the learning process.
 
 Using GAIL requires recorded demonstrations from your Unity environment. See the
 [imitation learning guide](Training-Imitation-Learning.md) to learn more about recording demonstrations.
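
As the rewritten passage says, the GAIL reward signal can be mixed with an extrinsic reward rather than used alone. A hedged sketch of such a mixed `reward_signals` block follows; the strengths and other values here are illustrative assumptions, not values prescribed by this commit.

reward_signals:
    # The environment's own reward remains the primary learning signal
    extrinsic:
        strength: 1.0
        gamma: 0.99
    # A low-strength GAIL reward nudges the policy toward the demonstrations
    gail:
        strength: 0.01
        gamma: 0.99
        encoding_size: 128
        demo_path: ./demos/ExpertPyramid.demo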

docs/Training-Behavioral-Cloning.md

Lines changed: 0 additions & 30 deletions
This file was deleted.

docs/Training-Imitation-Learning.md

Lines changed: 16 additions & 18 deletions
@@ -19,48 +19,46 @@ imitation learning combined with reinforcement learning can dramatically
 reduce the time the agent takes to solve the environment.
 For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
 using 6 episodes of demonstrations can reduce training steps by more than 4 times.
-See PreTraining + GAIL + Curiosity + RL below.
+See Behavioral Cloning + GAIL + Curiosity + RL below.
 
 <p align="center">
   <img src="images/mlagents-ImitationAndRL.png"
        alt="Using Demonstrations with Reinforcement Learning"
        width="700" border="0" />
 </p>
 
-The ML-Agents toolkit provides several ways to learn from demonstrations.
+The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
+In most scenarios, you should combine these two features
 
-* To train using GAIL (Generative Adversarial Imitation Learning) you can add the
+* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
+  reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
   [GAIL reward signal](Reward-Signals.md#gail-reward-signal). GAIL can be
   used with or without environment rewards, and works well when there are a limited
   number of demonstrations.
-* To help bootstrap reinforcement learning, you can enable
-  [pretraining](Training-PPO.md#optional-pretraining-using-demonstrations)
-  on the PPO trainer, in addition to using a small GAIL reward signal.
-* To train an agent to exactly mimic demonstrations, you can use the
-  [Behavioral Cloning](Training-Behavioral-Cloning.md) trainer. Behavioral Cloning can be
-  used with demonstrations (in-editor), and learns very quickly. However, it usually is ineffective
-  on more complex environments without a large number of demonstrations.
+* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
+  shown in a set of demonstrations.
+  [The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+  can be enabled on the PPO or SAC trainer. BC tends to work best when
+  there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
 
 ### How to Choose
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
-using pre-recorded demonstrations, you can generally enable both GAIL and Pretraining.
+using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
+at low strengths in addition to having an extrinsic reward.
 An example of this is provided for the Pyramids example environment under
 `PyramidsLearning` in `config/gail_config.yaml`.
 
-If you want to train purely from demonstrations, GAIL is generally the preferred approach, especially
-if you have few (<10) episodes of demonstrations. An example of this is provided for the Crawler example
-environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
-
-If you have plenty of demonstrations and/or a very simple environment, Offline Behavioral Cloning can be effective and quick. However, it cannot be combined with RL.
+If you want to train purely from demonstrations, GAIL and BC _without_ an
+extrinsic reward signal is the preferred approach. An example of this is provided for the Crawler
+example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
 It is possible to record demonstrations of agent behavior from the Unity Editor,
 and save them as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with Offline
-Behavioral Cloning and GAIL.
+They can be managed from the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.
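
The "Behavioral Cloning + GAIL + Curiosity + RL" combination referenced above maps to a trainer entry shaped roughly like the sketch below. It mirrors the structure of the `Pyramids` entry in `config/gail_config.yaml`; the `behavioral_cloning` values come from this diff, while the extrinsic, curiosity, and GAIL settings are illustrative assumptions.

Pyramids:
    # ...standard PPO hyperparameters omitted...
    behavioral_cloning:            # BC: mimic the demonstrated actions
        demo_path: ./demos/ExpertPyramid.demo
        strength: 0.5
        steps: 10000
    reward_signals:
        extrinsic:                 # RL: the environment's own reward (assumed values)
            strength: 1.0
            gamma: 0.99
        curiosity:                 # Curiosity: exploration bonus for sparse rewards (assumed values)
            strength: 0.02
            gamma: 0.99
            encoding_size: 256
        gail:                      # GAIL: adversarial imitation reward (assumed values)
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
            demo_path: ./demos/ExpertPyramid.demo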
