
Commit e8388b5

Rename Generalization -> Environment Parameter Randomization (#3646)

* Rename generalization to Environment Parameter Randomization

1 parent fbadc7b · commit e8388b5

7 files changed (+44 −47 lines)

com.unity.ml-agents/CHANGELOG.md (1 addition, 0 deletions)

````diff
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 ### Minor Changes
 - Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
+- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
 
 ## [0.15.0-preview] - 2020-03-18
 ### Major Changes
````
File renamed without changes.

docs/ML-Agents-Overview.md (4 additions, 5 deletions)

````diff
@@ -350,12 +350,11 @@ training process.
 learn more about adding visual observations to an agent
 [here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
 
-- **Training with Reset Parameter Sampling** - To train agents to be adapt
-to changes in its environment (i.e., generalization), the agent should be exposed
-to several variations of the environment. Similar to Curriculum Learning,
+- **Training with Environment Parameter Randomization** - If an agent is exposed to several variations of an environment, it will be more robust (i.e. generalize better) to
+unseen variations of the environment. Similar to Curriculum Learning,
 where environments become more difficult as the agent learns, the toolkit provides
-a way to randomly sample Reset Parameters of the environment during training. See
-[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+a way to randomly sample parameters of the environment during training. See
+[Training With Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
 to learn more about this feature.
 
 - **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on
````

docs/Readme.md (1 addition, 1 deletion)

````diff
@@ -40,7 +40,7 @@
 * [Training with Curriculum Learning](Training-Curriculum-Learning.md)
 * [Training with Imitation Learning](Training-Imitation-Learning.md)
 * [Training with LSTM](Feature-Memory.md)
-* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
 
 ## Inference
````

docs/Training-Curriculum-Learning.md (2 additions, 2 deletions)

````diff
@@ -93,10 +93,10 @@ behavior has the following parameters:
 measure by previous values.
 * If `true`, weighting will be 0.75 (new) 0.25 (old).
 * `parameters` (dictionary of key:string, value:float array) - Corresponds to
-Academy reset parameters to control. Length of each array should be one
+Environment parameters to control. Length of each array should be one
 greater than number of thresholds.
 
-Once our curriculum is defined, we have to use the reset parameters we defined
+Once our curriculum is defined, we have to use the environment parameters we defined
 and modify the environment from the Agent's `OnEpisodeBegin()` function. See
 [WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
 for an example.
````

docs/Training-Generalized-Reinforcement-Learning-Agents.md renamed to docs/Training-Environment-Parameter-Randomization.md (34 additions, 36 deletions)
Original file line numberDiff line numberDiff line change
````diff
@@ -1,49 +1,48 @@
-# Training Generalized Reinforcement Learning Agents
+# Training With Environment Parameter Randomization
 
 One of the challenges of training and testing agents on the same
 environment is that the agents tend to overfit. The result is that the
 agents are unable to generalize to any tweaks or variations in the environment.
 This is analogous to a model being trained and tested on an identical dataset
 in supervised learning. This becomes problematic in cases where environments
-are randomly instantiated with varying objects or properties.
+are instantiated with varying objects or properties.
 
-To make agents robust and generalizable to different environments, the agent
-should be trained over multiple variations of the environment. Using this approach
-for training, the agent will be better suited to adapt (with higher performance)
-to future unseen variations of the environment
+To help agents robust and better generalizable to changes in the environment, the agent
+can be trained over multiple variations of a given environment. We refer to this approach as **Environment Parameter Randomization**. For those familiar with Reinforcement Learning research, this approach is based on the concept of Domain Randomization (you can read more about it [here](https://arxiv.org/abs/1703.06907)). By using parameter randomization
+during training, the agent can be better suited to adapt (with higher performance)
+to future unseen variations of the environment.
 
 _Example of variations of the 3D Ball environment._
 
 Ball scale of 0.5 | Ball scale of 4
 :-------------------------:|:-------------------------:
 ![](images/3dball_small.png) | ![](images/3dball_big.png)
 
-## Introducing Generalization Using Reset Parameters
 
-To enable variations in the environments, we implemented `Reset Parameters`.
-`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when
-resetting the environment. We
+To enable variations in the environments, we implemented `Environment Parameters`.
+`Environment Parameters` are `Academy.Instance.FloatProperties` that can be read when setting
+up the environment. We
 also included different sampling methods and the ability to create new kinds of
-sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed
-in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
+sampling methods for each `Environment Parameter`. In the 3D ball environment example displayed
+in the figure above, the environment parameters are `gravity`, `ball_mass` and `ball_scale`.
 
 
-## How to Enable Generalization Using Reset Parameters
+## How to Enable Environment Parameter Randomization
 
-We first need to provide a way to modify the environment by supplying a set of `Reset Parameters`
+We first need to provide a way to modify the environment by supplying a set of `Environment Parameters`
 and vary them over time. This provision can be done either deterministically or randomly.
 
-This is done by assigning each `Reset Parameter` a `sampler-type`(such as a uniform sampler),
-which determines how to sample a `Reset
+This is done by assigning each `Environment Parameter` a `sampler-type`(such as a uniform sampler),
+which determines how to sample an `Environment
 Parameter`. If a `sampler-type` isn't provided for a
-`Reset Parameter`, the parameter maintains the default value throughout the
-training procedure, remaining unchanged. The samplers for all the `Reset Parameters`
+`Environment Parameter`, the parameter maintains the default value throughout the
+training procedure, remaining unchanged. The samplers for all the `Environment Parameters`
 are handled by a **Sampler Manager**, which also handles the generation of new
-values for the reset parameters when needed.
+values for the environment parameters when needed.
 
 To setup the Sampler Manager, we create a YAML file that specifies how we wish to
-generate new samples for each `Reset Parameters`. In this file, we specify the samplers and the
-`resampling-interval` (the number of simulation steps after which reset parameters are
+generate new samples for each `Environment Parameters`. In this file, we specify the samplers and the
+`resampling-interval` (the number of simulation steps after which environment parameters are
 resampled). Below is an example of a sampler file for the 3D ball environment.
 
 ```yaml
````
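The body of the sampler file itself is unchanged by this commit and so is elided between hunks. For orientation, such a file generally has the following shape; the parameter names (`mass`, `gravity`, `scale`) and `multirange_uniform` sampler type appear in the hunks below, but the sampler choices and numeric values here are illustrative, not the repository's actual config:

```yaml
# Illustrative sampler file; values are placeholders, not the shipped config.
resampling-interval: 5000   # simulation steps between re-draws of the parameters

mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10

gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]

scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```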
````diff
@@ -69,26 +68,25 @@ Below is the explanation of the fields in the above example.
 
 * `resampling-interval` - Specifies the number of steps for the agent to
 train under a particular environment configuration before resetting the
-environment with a new sample of `Reset Parameters`.
+environment with a new sample of `Environment Parameters`.
 
-* `Reset Parameter` - Name of the `Reset Parameter` like `mass`, `gravity` and `scale`. This should match the name
-specified in the academy of the intended environment for which the agent is
-being trained. If a parameter specified in the file doesn't exist in the
-environment, then this parameter will be ignored. Within each `Reset Parameter`
+* `Environment Parameter` - Name of the `Environment Parameter` like `mass`, `gravity` and `scale`. This should match the name
+specified in the `FloatProperties` of the environment being trained. If a parameter specified in the file doesn't exist in the
+environment, then this parameter will be ignored. Within each `Environment Parameter`
 
-* `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
+* `sampler-type` - Specify the sampler type to use for the `Environment Parameter`.
 This is a string that should exist in the `Sampler Factory` (explained
 below).
 
 * `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
 In the example above, this would correspond to the `intervals`
-under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
+under the `sampler-type` `"multirange_uniform"` for the `Environment Parameter` called `gravity`.
 The key name should match the name of the corresponding argument in the sampler definition.
 (See below)
 
-The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,
+The Sampler Manager allocates a sampler type for each `Environment Parameter` by using the *Sampler Factory*,
 which maintains a dictionary mapping of string keys to sampler objects. The available sampler types
-to be used for each `Reset Parameter` is available in the Sampler Factory.
+to be used for each `Environment Parameter` is available in the Sampler Factory.
 
 ### Included Sampler Types
 
@@ -134,7 +132,7 @@ is as follows:
 `SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
 
 Once the Sampler Factory reflects the new register, the new sampler type can be used for sample any
-`Reset Parameter`. For example, lets say a new sampler type was implemented as below and we register
+`Environment Parameter`. For example, lets say a new sampler type was implemented as below and we register
 the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
 
 ```python
````
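The body of the `CustomSampler` class is elided between hunks. A minimal sketch of what such a class could look like, assuming a `Sampler` base class with a single sampling method (stubbed locally here for illustration; the real base class and `SamplerFactory` ship with the ml-agents trainers package), with `argA`/`argB`/`argC` mirroring the sub-argument keys shown for `mass` in the YAML fragment below:

```python
import random


class Sampler:
    """Local stand-in for the ml-agents Sampler base class (illustration only)."""

    def sample_parameter(self) -> float:
        raise NotImplementedError


class CustomSampler(Sampler):
    """Draws uniformly from [argA, argB] and scales the result by argC.

    The constructor argument names mirror the `sampler-type-sub-arguments`
    keys (`argA`, `argB`, `argC`) used in the YAML example for `mass`.
    """

    def __init__(self, argA: float, argB: float, argC: float) -> None:
        self.argA = argA
        self.argB = argB
        self.argC = argC

    def sample_parameter(self) -> float:
        return random.uniform(self.argA, self.argB) * self.argC


sampler = CustomSampler(argA=1, argB=2, argC=3)
print(3 <= sampler.sample_parameter() <= 6)  # prints True: scaled draw lies in [3, 6]
```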
````diff
@@ -148,7 +146,7 @@ class CustomSampler(Sampler):
 ```
 
 Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
-sampler type for the `Reset Parameter` *mass*.
+sampler type for the `Environment Parameter` *mass*.
 
 ```yaml
 mass:
@@ -158,16 +156,16 @@ mass:
 argC: 3
 ```
 
-### Training with Generalization Using Reset Parameters
+### Training with Environment Parameter Randomization
 
 After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specify
 our configured sampler file with the `--sampler` flag. For example, if we wanted to train the
-3D ball agent with generalization using `Reset Parameters` with `config/3dball_generalize.yaml`
+3D ball agent with parameter randomization using `Environment Parameters` with `config/3dball_randomize.yaml`
 sampling setup, we would run
 
 ```sh
-mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml
---run-id=3D-Ball-generalization --train
+mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml
+--run-id=3D-Ball-randomize --train
 ```
 
 We can observe progress and metrics via Tensorboard.
````
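As a companion to the `"multirange_uniform"` sampler type named in the hunks above, here is a sketch of the sampling rule it is commonly described as implementing: pick one interval with probability proportional to its width, then draw uniformly inside it, so the draw is uniform over the union of intervals. This is an illustrative reimplementation with made-up interval values, not the ml-agents code:

```python
import random


def multirange_uniform(intervals):
    """Sample from a union of [low, high] intervals.

    Each interval is chosen with probability proportional to its width,
    making the overall draw uniform over the whole union.
    """
    widths = [high - low for low, high in intervals]
    low, high = random.choices(intervals, weights=widths)[0]
    return random.uniform(low, high)


# Illustrative intervals for a parameter such as `gravity`.
value = multirange_uniform([[7, 10], [15, 20]])
print(7 <= value <= 10 or 15 <= value <= 20)  # prints True
```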

docs/Training-ML-Agents.md (2 additions, 3 deletions)

````diff
@@ -106,8 +106,7 @@ environment, you can set the following command line options when invoking
 lessons for curriculum training. See [Curriculum
 Training](Training-Curriculum-Learning.md) for more information.
 * `--sampler=<file>`: Specify a sampler YAML file for defining the
-sampler for generalization training. See [Generalization
-Training](Training-Generalized-Reinforcement-Learning-Agents.md) for more information.
+sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
 * `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
 keep. Checkpoints are saved after the number of steps specified by the
 `save-freq` option. Once the maximum number of checkpoints has been reached,
@@ -218,7 +217,7 @@ are conducting, see:
 * [Using Recurrent Neural Networks](Feature-Memory.md)
 * [Training with Curriculum Learning](Training-Curriculum-Learning.md)
 * [Training with Imitation Learning](Training-Imitation-Learning.md)
-* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
 
 You can also compare the
 [example environments](Learning-Environment-Examples.md)
````
