* initial changes to the documentation
* More documentation changes, not done.
* More documentation changes
* More docs
* Changed the images
* addressing comments
* Adding one line to the migrating doc
docs/Basic-Guide.md (+22 −31: 22 additions & 31 deletions)
@@ -35,26 +35,20 @@ inside Unity. In this section, we will use the pre-trained model for the
1. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Scenes` folder
   and open the `3DBall` scene file.
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
-    Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window.
+    Expand `3DBall` and click on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** window.
-    **Note**: The platforms in the `3DBall` scene were created using the `Platform` prefab. Instead of updating all 12 platforms individually, you can update the `Platform` prefab instead.
+    **Note**: The platforms in the `3DBall` scene were created using the `3DBall` prefab. Instead of updating all 12 platforms individually, you can update the `3DBall` prefab instead.
- 3. In the **Project** window, drag the **3DBallLearning** Brain located in
-    `Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
+ 3. In the **Project** window, drag the **3DBallLearning** Model located in
+    `Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
- 4. You should notice that each `Platform` under each `Game` in the **Hierarchy** window now contains **3DBallLearning** as `Brain`. __Note__: You can modify multiple game objects in a scene by selecting them all at
+ 4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** window now contains **3DBallLearning** as `Model`. __Note__: You can modify multiple game objects in a scene by selecting them all at
   once using the search bar in the Scene Hierarchy.
- 5. In the **Project** window, click on the **3DBallLearning** Brain located in
-    `Assets/ML-Agents/Examples/3DBall/Brains`. You should see the properties in the **Inspector** window.
- 6. In the **Project** window, open the `Assets/ML-Agents/Examples/3DBall/TFModels`
-    folder.
- 7. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
-    folder to the **Model** field of the **3DBallLearning** Brain in the **Inspector** window. __Note__: All of the brains should now have `3DBallLearning` as the TensorFlow model in the `Model` property
- 8. Select the **InferenceDevice** to use for this model (CPU or GPU).
+ 8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
   _Note: CPU is faster for the majority of ML-Agents toolkit generated models_
9. Click the **Play** button and you will see the platforms balance the balls
   using the pre-trained model.
@@ -73,22 +67,19 @@ if you want to [use an executable](Learning-Environment-Executable.md) or to
More information and documentation are provided in the
[Python API](Python-API.md) page.

- ## Training the Brain with Reinforcement Learning
+ ## Training the Model with Reinforcement Learning
### Setting up the environment for training
- To set up the environment for training, you will need to specify which agents are contributing
- to the training and which Brain is being trained. You can only perform training with
- a `Learning Brain`.
-
- Each platform agent needs an assigned `Learning Brain`. In this example, each platform agent was created using a prefab. To update all of the brains in each platform agent at once, you only need to update the platform agent prefab. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window. In the **Project** window, drag the **3DBallLearning** Brain located in `Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
-
- **Note**: The Unity prefab system will modify all instances of the agent properties in your scene. If the agent does not synchronize automatically with the prefab, you can hit the Revert button in the top of the **Inspector** window.
-
- **Note:** Assigning a Brain to an agent (dragging a Brain into the `Brain` property of
- the agent) means that the Brain will be making decisions for that agent. If the Agent uses a
- LearningBrain, either Python controls the Brain or the model on the Brain does.
+ In order to set up the Agents for training, you will need to edit the
+ `Behavior Name` under `Behavior Parameters` in the Agent Inspector window.
+ The `Behavior Name` is used to group agents by behavior. Note that Agents
+ sharing the same `Behavior Name` must be agents of the same type using the
+ same `Behavior Parameters`. You can make sure all your agents have the same
+ `Behavior Parameters` using Prefabs.
+ The `Behavior Name` corresponds to the name of the model that will be
+ generated by the training process and is used to select the hyperparameters
@@ -45,7 +45,7 @@ window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several agent cubes. Each agent cube in the scene is an
- independent agent, but they all share the same Brain. 3D Balance Ball does this
+ independent agent, but they all share the same Behavior. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.

### Academy
@@ -82,68 +82,16 @@ The 3D Balance Ball environment does not use these functions — each Agent rese
itself when needed — but many environments do use these functions to control the
environment around the Agents.
- ### Brain
-
- As of v0.6, a Brain is a Unity asset and exists within the `UnitySDK` folder. These brains (ex. **3DBallLearning.asset**) are loaded into each Agent object (ex. **Ball3DAgents**). A Brain doesn't store any information about an Agent, it just
- routes the Agent's collected observations to the decision making process and
- returns the chosen action to the Agent. All Agents can share the same
- Brain, but would act independently. The Brain settings tell you quite a bit about how
- an Agent works.
-
- You can create new Brain assets by selecting `Assets -> Create -> ML-Agents -> Brain`. There are 3 types of Brains.
- The **Learning Brain** is a Brain that uses a trained neural network to make decisions.
- When Unity is connected to Python, the external process will be controlling the Brain.
- The external process that is training the neural network will take over decision making for the agents
- and ultimately generate a trained neural network. You can also use the
- **Learning Brain** with a pre-trained model.
- The **Heuristic** Brain allows you to hand-code the Agent logic by extending
- the Decision class.
- Finally, the **Player** Brain lets you map keyboard commands to actions, which
- can be useful when testing your agents and environment. You can also implement your own type of Brain.
-
- In this tutorial, you will use the **Learning Brain** for training.
-
- #### Vector Observation Space
-
- Before making a decision, an agent collects its observation about its state in
- the world. The vector observation is a vector of floating point numbers which
- contain relevant information for the agent to make decisions.
-
- The Brain instance used in the 3D Balance Ball example uses the **Continuous**
- vector observation space with a **State Size** of 8. This means that the feature
- vector containing the Agent's observations contains eight elements: the `x` and
- `z` components of the agent cube's rotation and the `x`, `y`, and `z` components
- of the ball's relative position and velocity. (The observation values are
- defined in the Agent's `CollectObservations()` function.)
-
- #### Vector Action Space
-
- An Agent is given instructions from the Brain in the form of *actions*.
- ML-Agents toolkit classifies actions into two types: the **Continuous** vector
- action space is a vector of numbers that can vary continuously. What each
- element of the vector means is defined by the Agent logic (the PPO training
- process just learns what values are better given particular state observations
- based on the rewards received when it tries different values). For example, an
- element might represent a force or torque applied to a `Rigidbody` in the Agent.
- The **Discrete** action vector space defines its actions as tables. An action
- given to the Agent is an array of indices into tables.
-
- The 3D Balance Ball example is programmed to use both types of vector action
- space. You can try training with both settings to observe whether there is a
- difference. (Set the `Vector Action Space Size` to 4 when using the discrete
- action space and 2 when using continuous.)
### Agent
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
"Agent" GameObjects. The base Agent object has a few properties that affect its
behavior:
- * **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
-   makes decisions. All the Agents in the 3D Balance Ball scene share the same
-   Brain.
+ * **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
+   determines how an Agent makes decisions. More on Behavior Parameters in
+   the next section.
* **Visual Observations** — Defines any Camera objects used by the Agent to
  observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the Agent
@@ -162,22 +110,54 @@ The Ball3DAgent subclass defines the following methods:
  training generalizes to more than a specific starting position and agent cube
  attitude.
* agent.CollectObservations() — Called every simulation step. Responsible for
-   collecting the Agent's observations of the environment. Since the Brain
-   instance assigned to the Agent is set to the continuous vector observation
+   collecting the Agent's observations of the environment. Since the Behavior
+   Parameters of the Agent are set with a vector observation
  space with a state size of 8, the `CollectObservations()` must call
-   `AddVectorObs` such that vector size adds up to 8.
+   `AddVectorObs` such that the vector size adds up to 8.
* agent.AgentAction() — Called every simulation step. Receives the action chosen
-   by the Brain. The Ball3DAgent example handles both the continuous and the
-   discrete action space types. There isn't actually much difference between the
-   two state types in this environment — both vector action spaces result in a
+   by the Agent. The vector action space results in a
small change in the agent cube's rotation at each step. The `AgentAction()` function
  assigns a reward to the Agent; in this example, an Agent receives a small
  positive reward for each step it keeps the ball on the agent cube's head and a larger,
  negative reward for dropping the ball. An Agent is also marked as done when it
  drops the ball so that it will reset with a new ball for the next simulation
  step.
+ * agent.Heuristic() — When the `Use Heuristic` checkbox is checked in the Behavior
+   Parameters of the Agent, the Agent will use the `Heuristic()` method to generate
+   the actions of the Agent. As such, the `Heuristic()` method returns an array of
+   floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
+   keyboard inputs into actions.
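For illustration, here is a minimal sketch of what such a `Heuristic()` could look like, assuming the 0.x-era C# API where the method returns a `float[]` and the `MLAgents` namespace is available. The class name and axis mapping are illustrative, not the exact shipped Ball3DAgent code:

```csharp
using MLAgents;     // assumed 0.x-era namespace
using UnityEngine;

public class Ball3DHeuristicSketch : Agent   // hypothetical class for illustration
{
    // Maps keyboard input to the two continuous actions of the balance-ball agent.
    public override float[] Heuristic()
    {
        var action = new float[2];
        action[0] = -Input.GetAxis("Horizontal"); // tilt around the z axis (illustrative mapping)
        action[1] = Input.GetAxis("Vertical");    // tilt around the x axis (illustrative mapping)
        return action;
    }
}
```

Returning a fixed-size array keeps the heuristic interchangeable with a trained model, since both must produce the same number of actions as the `Space Size` configured in the Behavior Parameters.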
+ #### Behavior Parameters : Vector Observation Space
+
+ Before making a decision, an agent collects its observation about its state in
+ the world. The vector observation is a vector of floating point numbers which
+ contain relevant information for the agent to make decisions.
+
+ The Behavior Parameters of the 3D Balance Ball example uses a **Space Size** of 8.
+ This means that the feature vector containing the Agent's observations contains
+ eight elements: the `x` and `z` components of the agent cube's rotation and the
+ `x`, `y`, and `z` components of the ball's relative position and velocity. (The
+ observation values are defined in the Agent's `CollectObservations()` function.)
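As a concrete illustration, a hedged sketch of how eight observation values could be added with `AddVectorObs`, assuming the 0.x-era C# API. The `ball` and `m_BallRb` fields and the class name are illustrative names, not necessarily those used in the shipped Ball3DAgent:

```csharp
using MLAgents;     // assumed 0.x-era namespace
using UnityEngine;

public class Ball3DObservationSketch : Agent   // hypothetical class for illustration
{
    public GameObject ball;   // illustrative: reference to the ball, assigned in the Inspector
    Rigidbody m_BallRb;       // illustrative: cached Rigidbody of the ball

    public override void InitializeAgent()
    {
        m_BallRb = ball.GetComponent<Rigidbody>();
    }

    public override void CollectObservations()
    {
        AddVectorObs(gameObject.transform.rotation.z);                          // 1 float
        AddVectorObs(gameObject.transform.rotation.x);                          // 1 float
        AddVectorObs(ball.transform.position - gameObject.transform.position);  // 3 floats
        AddVectorObs(m_BallRb.velocity);                                        // 3 floats, total = 8
    }
}
```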
+ #### Behavior Parameters : Vector Action Space
+
+ An Agent is given instructions in the form of a float array of *actions*.
+ ML-Agents toolkit classifies actions into two types: the **Continuous** vector
+ action space is a vector of numbers that can vary continuously. What each
+ element of the vector means is defined by the Agent logic (the training
+ process just learns what values are better given particular state observations
+ based on the rewards received when it tries different values). For example, an
+ element might represent a force or torque applied to a `Rigidbody` in the Agent.
+ The **Discrete** action vector space defines its actions as tables. An action
+ given to the Agent is an array of indices into tables.
+
+ The 3D Balance Ball example is programmed to use the continuous action
+ space with a `Space Size` of 2.
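To make the continuous case concrete, below is a hedged sketch of an `AgentAction()` handler for a two-element continuous action space, using the reward scheme described earlier. The exact method signature varies between releases, and the class name, `ball` field, and drop threshold are illustrative assumptions rather than the shipped implementation:

```csharp
using MLAgents;     // assumed 0.x-era namespace
using UnityEngine;

public class Ball3DActionSketch : Agent   // hypothetical class for illustration
{
    public GameObject ball;   // illustrative reference to the ball

    public override void AgentAction(float[] vectorAction)
    {
        // Interpret the two continuous actions as small tilts of the agent cube.
        var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
        var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
        transform.Rotate(new Vector3(0f, 0f, 1f), actionZ);
        transform.Rotate(new Vector3(1f, 0f, 0f), actionX);

        // Small positive reward while the ball stays up; larger negative reward
        // and end of episode when it falls. The height threshold is illustrative.
        if (ball.transform.position.y - transform.position.y < -2f)
        {
            SetReward(-1f);
            Done();
        }
        else
        {
            SetReward(0.1f);
        }
    }
}
```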
- ## Training the Brain with Reinforcement Learning
+ ## Training with Reinforcement Learning
Now that we have an environment, we can perform the training.
@@ -272,11 +252,11 @@ From TensorBoard, you will see the summary statistics: