Academy singleton docs #3218

Merged (4 commits) on Jan 14, 2020
2 changes: 1 addition & 1 deletion docs/FAQ.md
@@ -94,7 +94,7 @@ UnityEnvironment(file_name=filename, worker_id=X)

If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
terminating. In order to address this, set `Max Steps` for the
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
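
A minimal sketch of such a script-driven condition (the class name, field, and
threshold below are illustrative, and assume the `Agent.Done()` call available
in this release):

```csharp
using MLAgents;
using UnityEngine;

// Illustrative only: an Agent that ends its episode from script instead of
// relying on Max Steps.
public class ExampleAgent : Agent
{
    public float fallThreshold = -1.0f;   // hypothetical threshold

    public override void AgentAction(float[] vectorAction)
    {
        // ... apply vectorAction to the agent here ...

        // Custom episode-terminating event: mark the episode as done so the
        // trainer receives finished episodes and a valid mean reward.
        if (transform.position.y < fallThreshold)
        {
            Done();
        }
    }
}
```
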
3 changes: 0 additions & 3 deletions docs/Getting-Started-with-Balance-Ball.md
@@ -48,9 +48,6 @@ it contains not one, but several agent cubes. Each agent cube in the scene is a
independent agent, but they all share the same Behavior. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.

### Academy

The Academy object for the scene is placed on the Ball3DAcademy GameObject.

### Agent

4 changes: 2 additions & 2 deletions docs/Glossary.md
@@ -1,6 +1,6 @@
# ML-Agents Toolkit Glossary

* **Academy** - Unity Component which controls timing, reset, and
* **Academy** - Singleton object which controls timing, reset, and
training/inference settings of the environment.
* **Action** - The carrying-out of a decision on the part of an agent within the
environment.
@@ -12,7 +12,7 @@
carried out given an observation.
* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
Scene, Inspector).
* **Environment** - The Unity scene which contains Agents and the Academy.
* **Environment** - The Unity scene which contains Agents.
* **FixedUpdate** - Unity method called each time the game engine is
stepped. ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.
28 changes: 2 additions & 26 deletions docs/Learning-Environment-Create-New.md
@@ -17,13 +17,11 @@ steps:
1. Create an environment for your agents to live in. An environment can range
from a simple physical simulation containing a few objects to an entire game
or ecosystem.
2. Add an Academy MonoBehaviour to a GameObject in the Unity scene
containing the environment.
3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
2. Implement your Agent subclasses. An Agent subclass defines the code an Agent
uses to observe its environment, to carry out assigned actions, and to
calculate the rewards used for reinforcement training. You can also implement
optional methods to reset the Agent when it has finished or failed its task.
4. Add your Agent subclasses to appropriate GameObjects, typically, the object
3. Add your Agent subclasses to appropriate GameObjects, typically, the object
in the scene that represents the Agent in the simulation.

**Note:** If you are unfamiliar with Unity, refer to
@@ -103,27 +101,6 @@ different material from the list of all materials currently in the project.)
Note that we will create an Agent subclass to add to this GameObject as a
component later in the tutorial.

### Add an Empty GameObject to Hold the Academy

1. Right click in Hierarchy window, select Create Empty.
2. Name the GameObject "Academy"

![The scene hierarchy](images/mlagents-NewTutHierarchy.png)
Contributor Author commented: "Note: removed this image"

You can adjust the camera angles to give a better view of the scene at runtime.
The next steps will be to create and add the ML-Agent components.

## Add an Academy
The Academy object coordinates the ML-Agents in the scene and drives the
decision-making portion of the simulation loop. Every ML-Agent scene needs one
(and only one) Academy instance.

First, add an Academy component to the Academy GameObject created earlier:

1. Select the Academy GameObject to view it in the Inspector window.
2. Click **Add Component**.
3. Select **Academy** in the list of components.

## Implement an Agent

To create the Agent:
@@ -524,7 +501,6 @@ to use Unity ML-Agents: an Academy and one or more Agents.

Keep in mind:

* There can only be one Academy game object in a scene.
* If you are using multiple training areas, make sure all the Agents have the same `Behavior Name`
and `Behavior Parameters`.

15 changes: 5 additions & 10 deletions docs/Learning-Environment-Design.md
@@ -51,7 +51,7 @@ The ML-Agents Academy class orchestrates the agent simulation loop as follows:
an Agent to restart if it finishes before the end of an episode. In this
case, the Academy calls the `AgentReset()` function.

To create a training environment, extend the Academy and Agent classes to
To create a training environment, extend the Agent class to
implement the above methods. The `Agent.CollectObservations()` and
`Agent.AgentAction()` functions are required; the other methods are optional —
whether you need to implement them or not depends on your specific scenario.
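
A minimal sketch of such a subclass (the class name and the helper calls
`AddVectorObs()` and `AddReward()` are illustrative and assume this release's
API):

```csharp
using MLAgents;
using UnityEngine;

// Illustrative skeleton: the two required methods plus an optional reset.
public class ExampleAgent : Agent
{
    public override void CollectObservations()
    {
        // Required: describe the state of the world to the Policy.
        AddVectorObs(transform.position);
    }

    public override void AgentAction(float[] vectorAction)
    {
        // Required: carry out the received action and assign any reward.
        AddReward(-0.01f);   // small per-step penalty, purely illustrative
    }

    public override void AgentReset()
    {
        // Optional: restore the Agent to a valid starting state.
        transform.position = Vector3.zero;
    }
}
```
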
@@ -64,14 +64,13 @@ information.

## Organizing the Unity Scene

To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy and as many Agent subclasses as you need.
To train and use the ML-Agents toolkit in a Unity scene, the scene must contain as many Agent subclasses as you need.
Agent instances should be attached to the GameObject representing that Agent.

### Academy

The Academy object orchestrates Agents and their decision making processes. Only
place a single Academy object in a scene.
The Academy is a singleton which orchestrates Agents and their decision making processes. Only
a single Academy exists at a time.

#### Academy resetting
To alter the environment at the start of each episode, add your method to the Academy's OnEnvironmentReset action.
@@ -81,9 +80,7 @@ public class MySceneBehavior : MonoBehaviour
{
public void Awake()
{
var academy = FindObjectOfType<Academy>();
academy.LazyInitialization();
academy.OnEnvironmentReset += EnvironmentReset;
Academy.Instance.OnEnvironmentReset += EnvironmentReset;
}

void EnvironmentReset()
@@ -144,8 +141,6 @@ training and for testing trained agents. Or, you may be training agents to
operate in a complex game or simulation. In this case, it might be more
efficient and practical to create a purpose-built training scene.

Both training and testing (or normal game) scenes must contain an Academy object
to control the agent decision making process.
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:
7 changes: 4 additions & 3 deletions docs/Limitations.md
@@ -16,12 +16,13 @@ making. See
[Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
for more information.

You can control the frequency of Academy stepping by calling
`Academy.Instance.DisableAutomaticStepping()` and then calling
`Academy.Instance.EnvironmentStep()` manually.
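
A minimal sketch of that pattern (the `ManualStepper` class and its use of
`FixedUpdate` are illustrative, not part of the API):

```csharp
using MLAgents;
using UnityEngine;

// Illustrative only: turn off automatic stepping once, then advance the
// Academy explicitly on a schedule this script controls.
public class ManualStepper : MonoBehaviour
{
    void Start()
    {
        Academy.Instance.DisableAutomaticStepping();
    }

    void FixedUpdate()
    {
        // Step the simulation yourself, for example once per physics tick.
        Academy.Instance.EnvironmentStep();
    }
}
```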

## Python API

### Python version

As of version 0.3, we no longer support Python 2.

### TensorFlow support

Currently, the ML-Agents toolkit uses TensorFlow 1.7.1 only.
6 changes: 2 additions & 4 deletions docs/ML-Agents-Overview.md
@@ -131,17 +131,15 @@ components:

_Simplified block diagram of ML-Agents._

The Learning Environment contains two additional components that help
The Learning Environment contains an additional component that helps
organize the Unity scene:

- **Agents** - which is attached to a Unity GameObject (any character within a
scene) and handles generating its observations, performing the actions it
receives and assigning a reward (positive / negative) when appropriate. Each
Agent is linked to a Policy.
- **Academy** - which orchestrates the observation and decision making process.
The External Communicator lives within the Academy.

Every Learning Environment will always have one global Academy and one Agent for
Every Learning Environment will always have one Agent for
every character in the scene. While each Agent must be linked to a Policy, it is
possible for Agents that have similar observations and actions to have
the same Policy type. In our sample game, we have two teams each with their own medic.
4 changes: 2 additions & 2 deletions docs/Migrating.md
@@ -11,14 +11,14 @@ The versions can be found in
## Migrating from 0.13 to latest

### Important changes
* The Academy class was changed to be sealed and its virtual methods were removed.
* The Academy class was changed to a singleton, and its virtual methods were removed.
* Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now corresponds to 200 steps as printed in the terminal and in Tensorboard.
* Curriculum config files are now YAML formatted and all curricula for a training run are combined into a single file.
* The `--num-runs` command-line option has been removed.

### Steps to Migrate
* If you have a class that inherits from Academy:
* If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy.
* If the class didn't override any of the virtual methods and didn't store any additional data, you can just remove the old script from the scene.
* If the class had additional data, create a new MonoBehaviour and store the data on this instead.
* If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it:
* Move the InitializeAcademy code to MonoBehaviour.Awake
3 changes: 1 addition & 2 deletions docs/Python-API.md
@@ -263,7 +263,6 @@ i = env.reset()
Once a property has been modified in Python, you can access it in C# after the next call to `step` as follows:

```csharp
var academy = FindObjectOfType<Academy>();
var sharedProperties = academy.FloatProperties;
var sharedProperties = Academy.Instance.FloatProperties;
float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f);
```
4 changes: 2 additions & 2 deletions docs/Training-Curriculum-Learning.md
@@ -41,8 +41,8 @@ the same environment.
In order to define the curricula, the first step is to decide which parameters of
the environment will vary. In the case of the Wall Jump environment,
the height of the wall is what varies. We define this as a `Shared Float Property`
that can be accessed in `Academy.FloatProperties`, and by doing so it becomes
adjustable via the Python API.
that can be accessed in `Academy.Instance.FloatProperties`, and by doing
so it becomes adjustable via the Python API.
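
On the C# side, a minimal sketch of reading such a property when an episode
resets (the `wall` field, the `"wall_height"` key, and the default value are
assumptions for illustration):

```csharp
using MLAgents;
using UnityEngine;

// Illustrative only: apply a curriculum-controlled Reset Parameter in the scene.
public class WallJumpAgent : Agent
{
    public Transform wall;   // hypothetical reference to the wall object

    public override void AgentReset()
    {
        // "wall_height" and the 1.0f default are assumed names, not shipped values.
        float wallHeight = Academy.Instance.FloatProperties
            .GetPropertyWithDefault("wall_height", 1.0f);
        wall.localScale = new Vector3(wall.localScale.x, wallHeight, wall.localScale.z);
    }
}
```
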
Rather than adjusting it by hand, we will create a YAML file which
describes the structure of the curricula. Within it, we can specify which
points in the training process our wall height will change, either based on the
2 changes: 1 addition & 1 deletion docs/Training-Generalized-Reinforcement-Learning-Agents.md
@@ -21,7 +21,7 @@ Ball scale of 0.5 | Ball scale of 4
## Introducing Generalization Using Reset Parameters

To enable variations in the environments, we implemented `Reset Parameters`.
`Reset Parameters` are `Academy.FloatProperties` that are used only when
`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when
resetting the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed
2 changes: 1 addition & 1 deletion docs/Training-ML-Agents.md
@@ -2,7 +2,7 @@

The ML-Agents toolkit conducts training using an external Python training
process. During training, this external process communicates with the Academy
object in the Unity scene to generate a block of agent experiences. These
to generate a block of agent experiences. These
experiences become the training set for a neural network used to optimize the
agent's policy (which is essentially a mathematical function mapping
observations to actions). In reinforcement learning, the neural network
Binary file modified docs/images/mlagents-3DBallHierarchy.png
Binary file removed docs/images/mlagents-NewTutHierarchy.png
Binary file modified docs/images/mlagents-Open3DBall.png