docs/Migrating.md: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ The versions can be found in
* `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md). A sketch of the new side-channel workflow follows this list.
* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
* The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters` field. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
+* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
+Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.
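Taken together, these notes replace the old reset-parameter and inspector-based configuration with side channels. Below is a minimal sketch of the new workflow, assuming the 0.13/0.14-era import paths (these have moved between releases) and a placeholder build name:

```python
# Sketch of the side-channel workflow that replaces reset(config=...) and the
# Academy's inspector configuration. Import paths assume the 0.13/0.14-era
# package layout and may differ in your release; "3DBall" is a placeholder.
from mlagents.envs.environment import UnityEnvironment
from mlagents.envs.side_channel.float_properties_channel import FloatPropertiesChannel
from mlagents.envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

float_props = FloatPropertiesChannel()
engine_config = EngineConfigurationChannel()

env = UnityEnvironment(file_name="3DBall",
                       side_channels=[float_props, engine_config])

# Replaces the removed `Default Reset Parameters` / reset(config=...) mechanism.
float_props.set_property("gravity", -9.81)

# Replaces the removed Training/Inference Configuration inspector fields.
engine_config.set_configuration_parameters(
    width=640, height=480, quality_level=1,
    time_scale=20.0, target_frame_rate=-1)

env.reset()  # reset() no longer takes a config argument
env.close()
```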
### Steps to Migrate
* If you had a custom `Training Configuration` in the Academy inspector, you will need to pass your custom configuration at every training run using the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate`.
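For example (hypothetical run ID and values; the flags themselves are the ones listed above), a training invocation might look like `mlagents-learn config/trainer_config.yaml --run-id=MyRun --width=1280 --height=720 --quality-level=1 --time-scale=20 --target-frame-rate=-1`.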
docs/Training-Imitation-Learning.md

on the PPO trainer, in addition to using a small GAIL reward signal.
-* To train an agent to exactly mimic demonstrations, you can use the
-[Behavioral Cloning](Training-Behavioral-Cloning.md) trainer. Behavioral Cloning can be
-used with demonstrations (in-editor), and learns very quickly. However, it usually is ineffective
-on more complex environments without a large number of demonstrations.
+* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
+shown in a set of demonstrations.
+[The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+can be enabled on the PPO or SAC trainer. BC tends to work best when
+there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
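As a rough sketch of what enabling BC on a trainer looks like, assuming the `behavioral_cloning` section with the `demo_path`, `strength`, and `steps` keys documented for this release (verify against Training-PPO.md for your version; the behavior name and demo path below are placeholders):

```python
# Hypothetical helper: enable BC for one behavior by adding a
# behavioral_cloning section to its trainer configuration. The key names
# mirror the BC options documented for this release; verify for your version.
import yaml

with open("config/trainer_config.yaml") as f:
    config = yaml.safe_load(f)

config["MyBehavior"]["behavioral_cloning"] = {
    "demo_path": "./demos/Expert.demo",  # a recorded demonstration file
    "strength": 0.5,                     # weight of the BC update relative to RL
    "steps": 150000,                     # decay BC influence over this many steps
}

with open("config/trainer_config.yaml", "w") as f:
    yaml.dump(config, f)
```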
### How to Choose
If you want to help your agents learn (especially with environments that have sparse rewards)
-using pre-recorded demonstrations, you can generally enable both GAIL and Pretraining.
+using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
+at low strengths in addition to having an extrinsic reward.
An example of this is provided for the Pyramids example environment under
`PyramidsLearning` in `config/gail_config.yaml`.
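In that entry (check your copy of `config/gail_config.yaml` to confirm the exact values), the recipe amounts to keeping the `extrinsic` reward signal, adding a `gail` reward signal with a small `strength` (on the order of 0.01), and enabling `behavioral_cloning` against the same demonstration file.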
-If you want to train purely from demonstrations, GAIL is generally the preferred approach, especially
-if you have few (<10) episodes of demonstrations. An example of this is provided for the Crawler example
-environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
-
-If you have plenty of demonstrations and/or a very simple environment, Offline Behavioral Cloning can be effective and quick. However, it cannot be combined with RL.
+If you want to train purely from demonstrations, combining GAIL and BC _without_ an
+extrinsic reward signal is the preferred approach. An example of this is provided for the Crawler
+example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
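In config terms (again, check `CrawlerStaticLearning` in your copy of `config/gail_config.yaml`), training purely from demonstrations means the `extrinsic` entry is dropped from `reward_signals`, with `gail` given a high `strength` and `behavioral_cloning` pointed at the same `.demo` file.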
## Recording Demonstrations
It is possible to record demonstrations of agent behavior from the Unity Editor,
and save them as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with Offline
-Behavioral Cloning and GAIL.
+They can be managed from the Editor, as well as used for training with BC and GAIL.
In order to record demonstrations from an agent, add the `Demonstration Recorder`
component to a GameObject in the scene which contains an `Agent` component.
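As of this release (verify in the Editor for your version), the `Demonstration Recorder` exposes a `Record` checkbox and a `Demonstration Name` field, and recorded demonstrations are saved as `.demo` assets under the project's `Assets/Demonstrations` folder.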