- - **Training with Reset Parameter Sampling** - To train agents to be adapt
- to changes in its environment (i.e., generalization), the agent should be exposed
- to several variations of the environment. Similar to Curriculum Learning,
+ - **Training with Environment Parameter Randomization** - If an agent is exposed to several variations of an environment, it will be more robust (i.e. generalize better) to
+ unseen variations of the environment. Similar to Curriculum Learning,
  where environments become more difficult as the agent learns, the toolkit provides
- a way to randomly sample Reset Parameters of the environment during training. See
docs/Training-Environment-Parameter-Randomization.md (34 additions & 36 deletions)
@@ -1,49 +1,48 @@
- # Training Generalized Reinforcement Learning Agents
+ # Training With Environment Parameter Randomization

  One of the challenges of training and testing agents on the same
  environment is that the agents tend to overfit. The result is that the
  agents are unable to generalize to any tweaks or variations in the environment.
  This is analogous to a model being trained and tested on an identical dataset
  in supervised learning. This becomes problematic in cases where environments
- are randomly instantiated with varying objects or properties.
+ are instantiated with varying objects or properties.

- To make agents robust and generalizable to different environments, the agent
- should be trained over multiple variations of the environment. Using this approach
- for training, the agent will be better suited to adapt (with higher performance)
- to future unseen variations of the environment
+ To help agents robust and better generalizable to changes in the environment, the agent
+ can be trained over multiple variations of a given environment. We refer to this approach as **Environment Parameter Randomization**. For those familiar with Reinforcement Learning research, this approach is based on the concept of Domain Randomization (you can read more about it [here](https://arxiv.org/abs/1703.06907)). By using parameter randomization
+ during training, the agent can be better suited to adapt (with higher performance)
+ to future unseen variations of the environment.

  _Example of variations of the 3D Ball environment._
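
For readers unfamiliar with the idea, the randomization described in the changed text can be sketched in plain Python. This is only an illustration under stated assumptions, not the toolkit's API: the parameter names (`mass`, `gravity`, `scale`), their ranges, and both helper functions are hypothetical, and uniform sampling is just one possible choice of distribution.

```python
import random

# Hypothetical parameter ranges; the names and bounds are illustrative only
# and are not taken from the toolkit or from this pull request.
PARAMETER_RANGES = {
    "mass": (0.5, 10.0),
    "gravity": (7.0, 12.0),
    "scale": (0.75, 3.0),
}


def sample_environment_parameters(ranges, rng):
    """Draw one value per parameter, uniformly from its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def randomized_episodes(ranges, num_episodes, seed=0):
    """Yield a freshly sampled parameter set for each training episode."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    for _ in range(num_episodes):
        yield sample_environment_parameters(ranges, rng)


if __name__ == "__main__":
    for episode, params in enumerate(randomized_episodes(PARAMETER_RANGES, 3)):
        # A real training loop would reset the environment with `params` here.
        print(episode, params)
```

Because each episode sees a different draw, the agent cannot rely on one fixed configuration, which is the intuition behind the generalization claim in the text.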