Commit 03d6712

Develop one to one documentation (#2742)
* initial changes to the documentation
* More documentation changes, not done.
* More documentation changes
* More docs
* Changed the images
* addressing comments
* Adding one line to the migrating doc
1 parent 87c535d commit 03d6712

39 files changed: +296 lines, -645 lines

docs/Background-TensorFlow.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ performing computations using data flow graphs, the underlying representation of
 deep learning models. It facilitates training and inference on CPUs and GPUs in
 a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
 train the behavior of an agent, the output is a TensorFlow model (.nn) file
-that you can then embed within a Learning Brain. Unless you implement a new
+that you can then associate with an Agent. Unless you implement a new
 algorithm, the use of TensorFlow is mostly abstracted away and behind the
 scenes.

docs/Basic-Guide.md

Lines changed: 22 additions & 31 deletions
@@ -35,26 +35,20 @@ inside Unity. In this section, we will use the pre-trained model for the
 1. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Scenes` folder
 and open the `3DBall` scene file.
 2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
-Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window.
+Expand `3DBall` and click on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** window.

-**Note**: The platforms in the `3DBall` scene were created using the `Platform` prefab. Instead of updating all 12 platforms individually, you can update the `Platform` prefab instead.
+**Note**: The platforms in the `3DBall` scene were created using the `3DBall` prefab. Instead of updating all 12 platforms individually, you can update the `3DBall` prefab instead.

 ![Platform Prefab](images/platform_prefab.png)

-3. In the **Project** window, drag the **3DBallLearning** Brain located in
-`Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
+3. In the **Project** window, drag the **3DBallLearning** Model located in
+`Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under `Ball 3D Agent (Script)` component in the **Inspector** window.

 ![3dball learning brain](images/3dball_learning_brain.png)

-4. You should notice that each `Platform` under each `Game` in the **Hierarchy** windows now contains **3DBallLearning** as `Brain`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
+4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** windows now contains **3DBallLearning** as `Model`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
 once using the search bar in the Scene Hierarchy.
-5. In the **Project** window, click on the **3DBallLearning** Brain located in
-`Assets/ML-Agents/Examples/3DBall/Brains`. You should see the properties in the **Inspector** window.
-6. In the **Project** window, open the `Assets/ML-Agents/Examples/3DBall/TFModels`
-folder.
-7. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
-folder to the **Model** field of the **3DBallLearning** Brain in the **Inspector** window. __Note__ : All of the brains should now have `3DBallLearning` as the TensorFlow model in the `Model` property
-8. Select the **InferenceDevice** to use for this model (CPU or GPU).
+8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
 _Note: CPU is faster for the majority of ML-Agents toolkit generated models_
 9. Click the **Play** button and you will see the platforms balance the balls
 using the pre-trained model.
@@ -73,22 +67,19 @@ if you want to [use an executable](Learning-Environment-Executable.md) or to
 More information and documentation is provided in the
 [Python API](Python-API.md) page.

-## Training the Brain with Reinforcement Learning
+## Training the Model with Reinforcement Learning

 ### Setting up the environment for training

-To set up the environment for training, you will need to specify which agents are contributing
-to the training and which Brain is being trained. You can only perform training with
-a `Learning Brain`.
-
-Each platform agent needs an assigned `Learning Brain`. In this example, each platform agent was created using a prefab. To update all of the brains in each platform agent at once, you only need to update the platform agent prefab. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window. In the **Project** window, drag the **3DBallLearning** Brain located in `Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
-
-**Note**: The Unity prefab system will modify all instances of the agent properties in your scene. If the agent does not synchronize automatically with the prefab, you can hit the Revert button in the top of the **Inspector** window.
-
-**Note:** Assigning a Brain to an agent (dragging a Brain into the `Brain` property of
-the agent) means that the Brain will be making decision for that agent. If the Agent uses a
-LearningBrain either Python controls the Brain or the model on the Brain does.
+In order to setup the Agents for Training, you will need to edit the
+`Behavior Name` under `BehaviorParamters` in the Agent Inspector window.
+The `Behavior Name` is used to group agents per behaviors. Note that Agents
+sharing the same `Behavior Name` must be agents of the same type using the
+same `Behavior Parameters`. You can make sure all your agents have the same
+`Behavior Parameters` using Prefabs.
+The `Behavior Name` corresponds to the name of the model that will be
+generated by the training process and is used to select the hyperparameters
+from the training configuration file.

 ### Training the environment
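
For reference (this example is not part of the commit), the hyperparameters that the `Behavior Name` selects live in `config/trainer_config.yaml`, keyed by that name. A minimal sketch of what such an entry might look like, with placeholder values rather than the ones shipped with the toolkit:

```yaml
# Illustrative excerpt of config/trainer_config.yaml; values are placeholders.
default:              # used by any behavior without its own section
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    summary_freq: 1000

3DBallLearning:       # must match the `Behavior Name` set on the Agent
    normalize: true
    batch_size: 64
    buffer_size: 12000
```

The trained model is then written out under this same name, e.g. `models/<run-identifier>/3DBallLearning.nn`.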

@@ -216,22 +207,22 @@ INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 10000. Mean Reward: 2
 ### After training

 You can press Ctrl+C to stop the training, and your trained model will be at
-`models/<run-identifier>/<brain_name>.nn` where
-`<brain_name>` is the name of the Brain corresponding to the model.
+`models/<run-identifier>/<behavior_name>.nn` where
+`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
 (**Note:** There is a known bug on Windows that causes the saving of the model to
 fail when you early terminate the training, it's recommended to wait until Step
 has reached the max_steps parameter you set in trainer_config.yaml.) This file
 corresponds to your model's latest checkpoint. You can now embed this trained
-model into your Learning Brain by following the steps below, which is similar to
+model into your Agents by following the steps below, which is similar to
 the steps described
 [above](#running-a-pre-trained-model).

 1. Move your model file into
 `UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
 2. Open the Unity Editor, and select the **3DBall** scene as described above.
-3. Select the **3DBallLearning** Learning Brain from the Scene hierarchy.
-4. Drag the `<brain_name>.nn` file from the Project window of
-the Editor to the **Model** placeholder in the **3DBallLearning**
+3. Select the **3DBall** prefab Agent object.
+4. Drag the `<behavior_name>.nn` file from the Project window of
+the Editor to the **Model** placeholder in the **Ball3DAgent**
 inspector window.
 5. Press the :arrow_forward: button at the top of the Editor.

docs/Creating-Custom-Protobuf-Messages.md

Lines changed: 2 additions & 2 deletions
@@ -165,7 +165,7 @@ In Python, the custom field would be accessed like:
 ```python
 ...
 result = env.step(...)
-result[brain_name].custom_observations[0].customField
+result[behavior_name].custom_observations[0].customField
 ```

-where `brain_name` is the name of the brain attached to the agent.
+where `behavior_name` is the `Behavior Name` property of the Agent.

docs/FAQ.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ UnityAgentsException: The Communicator was unable to connect. Please make sure t

 There may be a number of possible causes:

-* _Cause_: There may be no agent in the scene with a LearningBrain
+* _Cause_: There may be no agent in the scene
 * _Cause_: On OSX, the firewall may be preventing communication with the
 environment. _Solution_: Add the built environment binary to the list of
 exceptions on the firewall by following

docs/Feature-Memory.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ It is now possible to give memories to your agents. When training, the agents
 will be able to store a vector of floats to be used next time they need to make
 a decision.

-![Brain Inspector](images/ml-agents-LSTM.png)
+![Inspector](images/ml-agents-LSTM.png)

 Deciding what the agents should remember in order to solve a task is not easy to
 do by hand, but our training algorithms can learn to keep track of what is
@@ -19,7 +19,7 @@ important to remember with
 ## How to use

 When configuring the trainer parameters in the `config/trainer_config.yaml`
-file, add the following parameters to the Brain you want to use.
+file, add the following parameters to the Behavior you want to use.

 ```json
 use_recurrent: true
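
As an illustration (not taken from this commit), the memory-related parameters sit inside the `trainer_config.yaml` section named after the Behavior, alongside its other hyperparameters; the section name and values below are placeholders:

```yaml
# Illustrative excerpt; the section name must match the Agent's `Behavior Name`.
HallwayLearning:
    use_recurrent: true
    sequence_length: 64
    memory_size: 256
```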

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 48 additions & 68 deletions
@@ -32,7 +32,7 @@ and Unity, see the [installation instructions](Installation.md).

 An agent is an autonomous actor that observes and interacts with an
 _environment_. In the context of Unity, an environment is a scene containing an
-Academy and one or more Brain and Agent objects, and, of course, the other
+Academy and one or more Agent objects, and, of course, the other
 entities that an agent interacts with.

 ![Unity Editor](images/mlagents-3DBallHierarchy.png)
@@ -45,7 +45,7 @@ window. The Inspector shows every component on a GameObject.

 The first thing you may notice after opening the 3D Balance Ball scene is that
 it contains not one, but several agent cubes. Each agent cube in the scene is an
-independent agent, but they all share the same Brain. 3D Balance Ball does this
+independent agent, but they all share the same Behavior. 3D Balance Ball does this
 to speed up training since all twelve agents contribute to training in parallel.

 ### Academy
@@ -82,68 +82,16 @@ The 3D Balance Ball environment does not use these functions — each Agent rese
 itself when needed — but many environments do use these functions to control the
 environment around the Agents.

-### Brain
-
-As of v0.6, a Brain is a Unity asset and exists within the `UnitySDK` folder. These brains (ex. **3DBallLearning.asset**) are loaded into each Agent object (ex. **Ball3DAgents**). A Brain doesn't store any information about an Agent, it just
-routes the Agent's collected observations to the decision making process and
-returns the chosen action to the Agent. All Agents can share the same
-Brain, but would act independently. The Brain settings tell you quite a bit about how
-an Agent works.
-
-You can create new Brain assets by selecting `Assets ->
-Create -> ML-Agents -> Brain`. There are 3 types of Brains.
-The **Learning Brain** is a Brain that uses a trained neural network to make decisions.
-When Unity is connected to Python, the external process will be controlling the Brain.
-The external process that is training the neural network will take over decision making for the agents
-and ultimately generate a trained neural network. You can also use the
-**Learning Brain** with a pre-trained model.
-The **Heuristic** Brain allows you to hand-code the Agent logic by extending
-the Decision class.
-Finally, the **Player** Brain lets you map keyboard commands to actions, which
-can be useful when testing your agents and environment. You can also implement your own type of Brain.
-
-In this tutorial, you will use the **Learning Brain** for training.
-
-#### Vector Observation Space
-
-Before making a decision, an agent collects its observation about its state in
-the world. The vector observation is a vector of floating point numbers which
-contain relevant information for the agent to make decisions.
-
-The Brain instance used in the 3D Balance Ball example uses the **Continuous**
-vector observation space with a **State Size** of 8. This means that the feature
-vector containing the Agent's observations contains eight elements: the `x` and
-`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
-of the ball's relative position and velocity. (The observation values are
-defined in the Agent's `CollectObservations()` function.)
-
-#### Vector Action Space
-
-An Agent is given instructions from the Brain in the form of *actions*.
-ML-Agents toolkit classifies actions into two types: the **Continuous** vector
-action space is a vector of numbers that can vary continuously. What each
-element of the vector means is defined by the Agent logic (the PPO training
-process just learns what values are better given particular state observations
-based on the rewards received when it tries different values). For example, an
-element might represent a force or torque applied to a `Rigidbody` in the Agent.
-The **Discrete** action vector space defines its actions as tables. An action
-given to the Agent is an array of indices into tables.
-
-The 3D Balance Ball example is programmed to use both types of vector action
-space. You can try training with both settings to observe whether there is a
-difference. (Set the `Vector Action Space Size` to 4 when using the discrete
-action space and 2 when using continuous.)
-
 ### Agent

 The Agent is the actor that observes and takes actions in the environment. In
 the 3D Balance Ball environment, the Agent components are placed on the twelve
 "Agent" GameObjects. The base Agent object has a few properties that affect its
 behavior:

-* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
-makes decisions. All the Agents in the 3D Balance Ball scene share the same
-Brain.
+* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
+determines how an Agent makes decisions. More on Behavior Parameters in
+the next section.
 * **Visual Observations** — Defines any Camera objects used by the Agent to
 observe its environment. 3D Balance Ball does not use camera observations.
 * **Max Step** — Defines how many simulation steps can occur before the Agent
@@ -162,22 +110,54 @@ The Ball3DAgent subclass defines the following methods:
 training generalizes to more than a specific starting position and agent cube
 attitude.
 * agent.CollectObservations() — Called every simulation step. Responsible for
-collecting the Agent's observations of the environment. Since the Brain
-instance assigned to the Agent is set to the continuous vector observation
+collecting the Agent's observations of the environment. Since the Behavior
+Parameters of the Agent are set with vector observation
 space with a state size of 8, the `CollectObservations()` must call
-`AddVectorObs` such that vector size adds up to 8.
+`AddVectorObs` such that vector size adds up to 8.
 * agent.AgentAction() — Called every simulation step. Receives the action chosen
-by the Brain. The Ball3DAgent example handles both the continuous and the
-discrete action space types. There isn't actually much difference between the
-two state types in this environment — both vector action spaces result in a
+by the Agent. The vector action spaces result in a
 small change in the agent cube's rotation at each step. The `AgentAction()` function
 assigns a reward to the Agent; in this example, an Agent receives a small
 positive reward for each step it keeps the ball on the agent cube's head and a larger,
 negative reward for dropping the ball. An Agent is also marked as done when it
 drops the ball so that it will reset with a new ball for the next simulation
 step.
+* agent.Heuristic() - When the `Use Heuristic` checkbox is checked in the Behavior
+Parameters of the Agent, the Agent will use the `Heuristic()` method to generate
+the actions of the Agent. As such, the `Heuristic()` method returns an array of
+floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
+keyboard inputs into actions.
+
+
+#### Behavior Parameters : Vector Observation Space
+
+Before making a decision, an agent collects its observation about its state in
+the world. The vector observation is a vector of floating point numbers which
+contain relevant information for the agent to make decisions.
+
+The Behavior Parameters of the 3D Balance Ball example uses a **Space Size** of 8.
+This means that the feature
+vector containing the Agent's observations contains eight elements: the `x` and
+`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
+of the ball's relative position and velocity. (The observation values are
+defined in the Agent's `CollectObservations()` function.)
+
+#### Behavior Parameters : Vector Action Space
+
+An Agent is given instructions in the form of a float array of *actions*.
+ML-Agents toolkit classifies actions into two types: the **Continuous** vector
+action space is a vector of numbers that can vary continuously. What each
+element of the vector means is defined by the Agent logic (the training
+process just learns what values are better given particular state observations
+based on the rewards received when it tries different values). For example, an
+element might represent a force or torque applied to a `Rigidbody` in the Agent.
+The **Discrete** action vector space defines its actions as tables. An action
+given to the Agent is an array of indices into tables.
+
+The 3D Balance Ball example is programmed to use continuous action
+space with `Space Size` of 2.

-## Training the Brain with Reinforcement Learning
+## Training with Reinforcement Learning

 Now that we have an environment, we can perform the training.
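
To make the sizes mentioned above concrete, here is a small, self-contained Python sketch (illustrative only; the real values are produced by the C# Agent code and the trained policy, and the variable names are made up):

```python
import numpy as np

# Illustrative 8-element vector observation for one 3D Balance Ball agent:
# agent cube rotation (x and z) plus the ball's relative position and velocity.
cube_rotation_xz = np.array([0.05, -0.02])               # 2 values
ball_relative_position = np.array([0.10, 0.30, -0.20])   # 3 values
ball_velocity = np.array([0.00, -0.50, 0.10])            # 3 values

observation = np.concatenate(
    [cube_rotation_xz, ball_relative_position, ball_velocity]
)
assert observation.shape == (8,)  # matches the vector observation Space Size of 8

# Illustrative continuous action with Space Size of 2: two floats that nudge
# the cube's rotation around its x and z axes at each step.
action = np.array([0.02, -0.01])
assert action.shape == (2,)
```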

@@ -272,11 +252,11 @@ From TensorBoard, you will see the summary statistics:

 ![Example TensorBoard Run](images/mlagents-TensorBoard.png)

-## Embedding the Trained Brain into the Unity Environment (Experimental)
+## Embedding the Model into the Unity Environment

 Once the training process completes, and the training process saves the model
 (denoted by the `Saved Model` message) you can add it to the Unity project and
-use it with Agents having a **Learning Brain**.
+use it with compatible Agents (the Agents that generated the model).
 __Note:__ Do not just close the Unity Window once the `Saved Model` message appears.
 Either wait for the training process to close the window or press Ctrl+C at the
 command-line prompt. If you close the window manually, the `.nn` file
@@ -285,6 +265,6 @@ containing the trained model is not exported into the ml-agents folder.
 ### Embedding the trained model into Unity

 To embed the trained model into Unity, follow the later part of [Training the
-Brain with Reinforcement
-Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
+Model with Reinforcement
+Learning](Basic-Guide.md#training-the-model-with-reinforcement-learning) section
 of the Basic Guide page.

docs/Glossary.md

Lines changed: 5 additions & 5 deletions
@@ -6,13 +6,13 @@
 environment.
 * **Agent** - Unity Component which produces observations and takes actions in
 the environment. Agents actions are determined by decisions produced by a
-linked Brain.
-* **Brain** - Unity Asset which makes decisions for the agents linked to it.
-* **Decision** - The specification produced by a Brain for an action to be
+Policy.
+* **Policy** - The decision making mechanism, typically a neural network model.
+* **Decision** - The specification produced by a Policy for an action to be
 carried out given an observation.
 * **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
 Scene, Inspector).
-* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
+* **Environment** - The Unity scene which contains Agents and the Academy.
 * **FixedUpdate** - Unity method called each time the game engine is
 stepped. ML-Agents logic should be placed here.
 * **Frame** - An instance of rendering the main camera for the display.
@@ -31,4 +31,4 @@
 * **External Coordinator** - ML-Agents class responsible for communication with
 outside processes (in this case, the Python API).
 * **Trainer** - Python class which is responsible for training a given
-Brain. Contains TensorFlow graph which makes decisions for Learning Brain.
+group of Agents.
