Develop one to one documentation #2742

Merged: 7 commits, Oct 21, 2019

2 changes: 1 addition & 1 deletion docs/Background-TensorFlow.md
@@ -17,7 +17,7 @@ performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an agent, the output is a TensorFlow model (.nn) file
that you can then embed within a Learning Brain. Unless you implement a new
that you can then associate with an Agent. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.

53 changes: 22 additions & 31 deletions docs/Basic-Guide.md
@@ -35,26 +35,20 @@ inside Unity. In this section, we will use the pre-trained model for the
1. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Scenes` folder
and open the `3DBall` scene file.
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window.
Expand `3DBall` and click on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** window.

**Note**: The platforms in the `3DBall` scene were created using the `Platform` prefab. Instead of updating all 12 platforms individually, you can update the `Platform` prefab instead.
**Note**: The platforms in the `3DBall` scene were created using the `3DBall` prefab. Instead of updating all 12 platforms individually, you can update the `3DBall` prefab instead.

![Platform Prefab](images/platform_prefab.png)

3. In the **Project** window, drag the **3DBallLearning** Brain located in
`Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
3. In the **Project** window, drag the **3DBallLearning** Model located in
`Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under `Ball 3D Agent (Script)` component in the **Inspector** window.

![3dball learning brain](images/3dball_learning_brain.png)

4. You should notice that each `Platform` under each `Game` in the **Hierarchy** windows now contains **3DBallLearning** as `Brain`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** window now contains **3DBallLearning** as `Model`. __Note__: You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
5. In the **Project** window, click on the **3DBallLearning** Brain located in
`Assets/ML-Agents/Examples/3DBall/Brains`. You should see the properties in the **Inspector** window.
6. In the **Project** window, open the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder.
7. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning** Brain in the **Inspector** window. __Note__ : All of the brains should now have `3DBallLearning` as the TensorFlow model in the `Model` property
8. Select the **InferenceDevice** to use for this model (CPU or GPU).
8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
_Note: CPU is faster for the majority of ML-Agents toolkit generated models_
9. Click the **Play** button and you will see the platforms balance the balls
using the pre-trained model.
@@ -73,22 +67,19 @@ if you want to [use an executable](Learning-Environment-Executable.md) or to
More information and documentation is provided in the
[Python API](Python-API.md) page.
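
As a rough illustration only, stepping the Editor from Python might look like the sketch below. The import path and the exact `reset()`/`step()` signatures differ between ML-Agents releases, so treat every call here as an assumption to verify against the [Python API](Python-API.md) page.

```python
# Hedged sketch, not copied from the Python API docs: the import path and the
# reset()/step() signatures are assumptions that vary across releases.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name=None)  # file_name=None attaches to the Editor in Play mode
info = env.reset(train_mode=False)      # dict of per-behavior info, keyed by name
behavior_name = list(info.keys())[0]

for _ in range(10):
    n_agents = len(info[behavior_name].agents)
    # Zero-valued continuous actions (3DBall's action size of 2 is assumed),
    # just to advance the simulation.
    info = env.step({behavior_name: [[0.0, 0.0]] * n_agents})

env.close()
```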

## Training the Brain with Reinforcement Learning
## Training the Model with Reinforcement Learning

### Setting up the environment for training

To set up the environment for training, you will need to specify which agents are contributing
to the training and which Brain is being trained. You can only perform training with
a `Learning Brain`.

Each platform agent needs an assigned `Learning Brain`. In this example, each platform agent was created using a prefab. To update all of the brains in each platform agent at once, you only need to update the platform agent prefab. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window. In the **Project** window, drag the **3DBallLearning** Brain located in `Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.

**Note**: The Unity prefab system will modify all instances of the agent properties in your scene. If the agent does not synchronize automatically with the prefab, you can hit the Revert button in the top of the **Inspector** window.

**Note:** Assigning a Brain to an agent (dragging a Brain into the `Brain` property of

the agent) means that the Brain will be making decision for that agent. If the Agent uses a
LearningBrain either Python controls the Brain or the model on the Brain does.
In order to set up the Agents for training, you will need to edit the
`Behavior Name` under `Behavior Parameters` in the Agent Inspector window.
The `Behavior Name` is used to group agents by behavior. Note that Agents
sharing the same `Behavior Name` must be agents of the same type using the
same `Behavior Parameters`. You can make sure all your agents have the same
`Behavior Parameters` using Prefabs.
The `Behavior Name` corresponds to the name of the model that will be
generated by the training process and is used to select the hyperparameters
from the training configuration file.
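
For illustration only (this lookup happens inside `mlagents-learn`, not in user code), the way a `Behavior Name` keys into `config/trainer_config.yaml` can be sketched as follows; the `3DBallLearning` section name and the presence of a `default` section are assumptions based on the example configuration shipped with the toolkit.

```python
# Illustrative sketch only: how a Behavior Name selects a hyperparameter
# section from the training configuration. Not part of the ML-Agents API.
import yaml  # PyYAML

with open("config/trainer_config.yaml") as f:
    config = yaml.safe_load(f)

behavior_name = "3DBallLearning"  # must match the Behavior Name on the Agent
# Assumed layout: a "default" section overridden by a per-behavior section.
hyperparameters = {**config.get("default", {}), **config.get(behavior_name, {})}
print(hyperparameters.get("max_steps"))
```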

### Training the environment

@@ -216,22 +207,22 @@ INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 10000. Mean Reward: 2
### After training

You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<brain_name>.nn` where
`<brain_name>` is the name of the Brain corresponding to the model.
`models/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
(**Note:** There is a known bug on Windows that causes the saving of the model to
fail when you terminate the training early; it's recommended to wait until Step
has reached the max_steps parameter you set in trainer_config.yaml.) This file
corresponds to your model's latest checkpoint. You can now embed this trained
model into your Learning Brain by following the steps below, which is similar to
model into your Agents by following the steps below, which are similar to
the steps described
[above](#running-a-pre-trained-model).

1. Move your model file into
`UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
2. Open the Unity Editor, and select the **3DBall** scene as described above.
3. Select the **3DBallLearning** Learning Brain from the Scene hierarchy.
4. Drag the `<brain_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **3DBallLearning**
3. Select the **3DBall** prefab Agent object.
4. Drag the `<behavior_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **Ball3DAgent**
inspector window.
5. Press the :arrow_forward: button at the top of the Editor.
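
As a convenience, here is a small hedged sketch of locating the exported file from step 1; the `first-run-0` run identifier, the `3DBallLearning` behavior name, and the destination folder are simply the example values used earlier in this guide.

```python
# Hedged helper, not part of ML-Agents: locate the exported .nn file for a run.
from pathlib import Path

run_id = "first-run-0"            # example run identifier used earlier in this guide
behavior_name = "3DBallLearning"  # matches the Behavior Name of the trained agents
model_path = Path("models") / run_id / f"{behavior_name}.nn"

if model_path.exists():
    print(f"Copy {model_path} into UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/")
else:
    print(f"No exported model found at {model_path}")
```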

4 changes: 2 additions & 2 deletions docs/Creating-Custom-Protobuf-Messages.md
@@ -165,7 +165,7 @@ In Python, the custom field would be accessed like:
```python
...
result = env.step(...)
result[brain_name].custom_observations[0].customField
result[behavior_name].custom_observations[0].customField
```

where `brain_name` is the name of the brain attached to the agent.
where `behavior_name` is the `Behavior Name` property of the Agent.
2 changes: 1 addition & 1 deletion docs/FAQ.md
@@ -44,7 +44,7 @@ UnityAgentsException: The Communicator was unable to connect. Please make sure t

There may be a number of possible causes:

* _Cause_: There may be no agent in the scene with a LearningBrain
* _Cause_: There may be no agent in the scene
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following
4 changes: 2 additions & 2 deletions docs/Feature-Memory.md
@@ -9,7 +9,7 @@ It is now possible to give memories to your agents. When training, the agents
will be able to store a vector of floats to be used next time they need to make
a decision.

![Brain Inspector](images/ml-agents-LSTM.png)
![Inspector](images/ml-agents-LSTM.png)

Deciding what the agents should remember in order to solve a task is not easy to
do by hand, but our training algorithms can learn to keep track of what is
@@ -19,7 +19,7 @@ important to remember with
## How to use

When configuring the trainer parameters in the `config/trainer_config.yaml`
file, add the following parameters to the Brain you want to use.
file, add the following parameters to the Behavior you want to use.

```json
use_recurrent: true
116 changes: 48 additions & 68 deletions docs/Getting-Started-with-Balance-Ball.md
@@ -32,7 +32,7 @@ and Unity, see the [installation instructions](Installation.md).

An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
Academy and one or more Agent objects, and, of course, the other
entities that an agent interacts with.

![Unity Editor](images/mlagents-3DBallHierarchy.png)
@@ -45,7 +45,7 @@ window. The Inspector shows every component on a GameObject.

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several agent cubes. Each agent cube in the scene is an
independent agent, but they all share the same Brain. 3D Balance Ball does this
independent agent, but they all share the same Behavior. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.

### Academy
@@ -82,68 +82,16 @@ The 3D Balance Ball environment does not use these functions — each Agent rese
itself when needed — but many environments do use these functions to control the
environment around the Agents.

### Brain

As of v0.6, a Brain is a Unity asset and exists within the `UnitySDK` folder. These brains (ex. **3DBallLearning.asset**) are loaded into each Agent object (ex. **Ball3DAgents**). A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. All Agents can share the same
Brain, but would act independently. The Brain settings tell you quite a bit about how
an Agent works.

You can create new Brain assets by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 types of Brains.
The **Learning Brain** is a Brain that uses a trained neural network to make decisions.
When Unity is connected to Python, the external process will be controlling the Brain.
The external process that is training the neural network will take over decision making for the agents
and ultimately generate a trained neural network. You can also use the
**Learning Brain** with a pre-trained model.
The **Heuristic** Brain allows you to hand-code the Agent logic by extending
the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. You can also implement your own type of Brain.

In this tutorial, you will use the **Learning Brain** for training.

#### Vector Observation Space

Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.

The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)

#### Vector Action Space

An Agent is given instructions from the Brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.

The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
action space and 2 when using continuous.)

### Agent

The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
"Agent" GameObjects. The base Agent object has a few properties that affect its
behavior:

* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions. More on Behavior Parameters in
the next section.
* **Visual Observations** — Defines any Camera objects used by the Agent to
observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the Agent
@@ -162,22 +110,54 @@ The Ball3DAgent subclass defines the following methods:
training generalizes to more than a specific starting position and agent cube
attitude.
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Brain
instance assigned to the Agent is set to the continuous vector observation
collecting the Agent's observations of the environment. Since the Behavior
Parameters of the Agent are set to use a vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` such that vector size adds up to 8.
`AddVectorObs` such that vector size adds up to 8.
* agent.AgentAction() — Called every simulation step. Receives the action chosen
by the Brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
by the Policy. Each vector action results in a
small change in the agent cube's rotation at each step. The `AgentAction()` function
assigns a reward to the Agent; in this example, an Agent receives a small
positive reward for each step it keeps the ball on the agent cube's head and a larger,
negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
* agent.Heuristic() - When the `Use Heuristic` checkbox is checked in the Behavior
Parameters of the Agent, the Agent will use its `Heuristic()` method to generate
its actions. The `Heuristic()` method returns an array of
floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
keyboard inputs into actions.


#### Behavior Parameters : Vector Observation Space

Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.

The Behavior Parameters of the 3D Balance Ball example use a **Space Size** of 8.
This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)

#### Behavior Parameters : Vector Action Space

An Agent is given instructions in the form of a float array of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.

The 3D Balance Ball example is programmed to use the continuous action
space with a `Space Size` of 2.
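
To make the two sizes concrete, here is a small illustrative sketch (plain NumPy, not the ML-Agents API) of the array shapes implied by these Behavior Parameters for the twelve agents in the scene; the mapping of the two action values to rotation axes is an assumption.

```python
# Illustrative only: array shapes implied by the 3D Balance Ball settings.
import numpy as np

num_agents = 12  # twelve agent cubes in the scene

# Vector observation Space Size 8: cube rotation (x, z) plus the ball's
# relative position (x, y, z) and velocity (x, y, z).
observations = np.zeros((num_agents, 8), dtype=np.float32)

# Continuous vector action Space Size 2 (assumed: one value per rotation axis).
actions = np.random.uniform(-1.0, 1.0, size=(num_agents, 2)).astype(np.float32)

print(observations.shape, actions.shape)  # (12, 8) (12, 2)
```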

## Training the Brain with Reinforcement Learning
## Training with Reinforcement Learning

Now that we have an environment, we can perform the training.

@@ -272,11 +252,11 @@ From TensorBoard, you will see the summary statistics:

![Example TensorBoard Run](images/mlagents-TensorBoard.png)

## Embedding the Trained Brain into the Unity Environment (Experimental)
## Embedding the Model into the Unity Environment

Once the training process completes and saves the model
(denoted by the `Saved Model` message), you can add it to the Unity project and
use it with Agents having a **Learning Brain**.
use it with compatible Agents (the Agents that generated the model).
__Note:__ Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press Ctrl+C at the
command-line prompt. If you close the window manually, the `.nn` file
@@ -285,6 +265,6 @@ containing the trained model is not exported into the ml-agents folder.
### Embedding the trained model into Unity

To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
Model with Reinforcement
Learning](Basic-Guide.md#training-the-model-with-reinforcement-learning) section
of the Basic Guide page.
10 changes: 5 additions & 5 deletions docs/Glossary.md
@@ -6,13 +6,13 @@
environment.
* **Agent** - Unity Component which produces observations and takes actions in
the environment. An Agent's actions are determined by decisions produced by a
linked Brain.
* **Brain** - Unity Asset which makes decisions for the agents linked to it.
* **Decision** - The specification produced by a Brain for an action to be
Policy.
* **Policy** - The decision making mechanism, typically a neural network model.
* **Decision** - The specification produced by a Policy for an action to be
carried out given an observation.
* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
Scene, Inspector).
* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
* **Environment** - The Unity scene which contains Agents and the Academy.
* **FixedUpdate** - Unity method called each time the game engine is
stepped. ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.
@@ -31,4 +31,4 @@
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given
Brain. Contains TensorFlow graph which makes decisions for Learning Brain.
group of Agents.