Docs ll api #3047

Merged · 6 commits · Dec 9, 2019
1 change: 1 addition & 0 deletions docs/Migrating.md
@@ -3,6 +3,7 @@
## Migrating from master to develop

### Important changes
* The low level Python API has changed. See the [Low Level Python API documentation](Python-API.md) for more information. This should only affect you if you're writing a custom trainer; if you use `mlagents-learn` for training, this should be a transparent change.
* `CustomResetParameters` are now removed.
* `reset()` on the Low-Level Python API no longer takes a `train_mode` argument. To modify the performance/speed of the engine, you must use an `EngineConfigurationChannel`.
* `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md). A short sketch of the new flow follows below.
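
As a hedged illustration of the new flow: the import paths for the side channel classes, the `set_configuration_parameters`/`set_property` method names, and the `"gravity"` property are assumptions based on the 0.12/0.13-era package layout and may differ in your version.

```python
from mlagents.envs.environment import UnityEnvironment
# Assumed import paths for the side channel classes; check your installed version.
from mlagents.envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
from mlagents.envs.side_channel.float_properties_channel import FloatPropertiesChannel

engine_channel = EngineConfigurationChannel()
float_channel = FloatPropertiesChannel()
env = UnityEnvironment(file_name="3DBall", side_channels=[engine_channel, float_channel])

# Previously: env.reset(train_mode=False, config={"gravity": -9.81})
engine_channel.set_configuration_parameters(time_scale=1.0)  # engine speed, replaces train_mode
float_channel.set_property("gravity", -9.81)                  # float property, replaces config
env.reset()                                                   # reset() no longer takes arguments
```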
220 changes: 144 additions & 76 deletions docs/Python-API.md
@@ -1,19 +1,17 @@
# Unity ML-Agents Python Low Level API

The `mlagents` Python package contains two components: a low level API which
allows you to interact directly with a Unity Environment (`mlagents.envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning.

You can use the Python Low Level API to interact directly with your learning
environment, and use it to develop new learning algorithms.
@ervteng (Contributor) commented on Dec 9, 2019:
> period after algorithms, maybe add develop new learning algorithms for them. (not sure about this)
## mlagents.envs

The ML-Agents Toolkit Low Level API is a Python API for controlling the simulation
loop of an environment or game built with Unity. This API is used by the
training algorithms inside the ML-Agents Toolkit, but you can also write your own
Python programs using this API. Go [here](../notebooks/getting-started.ipynb)
@@ -24,25 +22,31 @@
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BatchedStepResult** — contains the data from Agents belonging to the same
"AgentGroup" in the simulation, such as observations and rewards.
- **AgentGroupSpec** — describes the shape of the data inside a BatchedStepResult.
For example, provides the dimensions of the observations of a group.

These classes are all defined in the [base_env](../ml-agents-envs/mlagents/envs/base_env.py)
script.

An Agent Group is a group of Agents identified by a string name that share the same
observations and action types. You can think of an Agent Group as a group of agents
that will share the same policy or behavior. All Agents in a group have the same goal
and reward signals.

To communicate with an Agent in a Unity environment from a Python program, the
Agent in the simulation must have `Behavior Parameters` set to communicate. You
must set the `Behavior Type` to `Default` and give it a `Behavior Name`.

__Note__: The `Behavior Name` corresponds to the Agent Group name in Python.

_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._

## Loading a Unity Environment

Python-side communication happens through `UnityEnvironment` which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
@@ -51,7 +55,7 @@
of your Unity environment is 3DBall.app, in Python, run:

```python
from mlagents.envs.environment import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
env = UnityEnvironment(file_name="3DBall", base_port=5005, seed=1, side_channels=[])
```

- `file_name` is the name of the environment binary (located in the root
@@ -62,6 +66,9 @@
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
- `side_channels` provides a way to exchange data with the Unity simulation that
is not related to the reinforcement learning loop, for example configurations
or properties. More on side channels in the [Modifying the environment from Python](Python-API.md#modifying-the-environment-from-python) section.

If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
@@ -70,59 +77,125 @@
displayed on the screen
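
For instance, a minimal sketch of attaching to the Editor instead of a built binary:

```python
from mlagents.envs.environment import UnityEnvironment

# file_name=None makes Python wait for you to press Play in the Unity Editor.
env = UnityEnvironment(file_name=None, side_channels=[])
```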

### Interacting with a Unity Environment

#### The BaseEnv interface

A `BaseEnv` has the following methods:

- **Reset : `env.reset()`** Sends a signal to reset the environment. Returns None.
- **Step : `env.step()`** Sends a signal to step the environment. Returns None.
Note that a "step" for Python does not correspond to either Unity `Update` nor
`FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will
move forward until an Agent in the simulation needs a input from Python to act.
- **Close : `env.close()`** Sends a shutdown signal to the environment and terminates
the communication.
- **Get Agent Group Names : `env.get_agent_groups()`** Returns a list of agent group ids.
Note that the number of groups can change over time if new agent groups are
created in the simulation.
- **Get Agent Group Spec : `env.get_agent_group_spec(agent_group: str)`** Returns
the `AgentGroupSpec` corresponding to the agent_group given as input. An
`AgentGroupSpec` contains information such as the observation shapes, the action
type (multi-discrete or continuous) and the action shape. Note that the `AgentGroupSpec`
for a specific group is fixed throughout the simulation.
- **Get Batched Step Result for Agent Group : `env.get_step_result(agent_group: str)`**
Returns a `BatchedStepResult` corresponding to the agent_group given as input.
A `BatchedStepResult` contains information about the state of the agents in a group
such as the observations, the rewards, the done flags and the agent identifiers. The
data is in `np.array`s whose first dimension is always the number of agents that
requested a decision in the simulation since the last call to `env.step()`. Note that the
number of agents is not guaranteed to remain constant during the simulation.
- **Set Actions for Agent Group :`env.set_actions(agent_group: str, action: np.array)`**
Sets the actions for a whole agent group. `action` is a 2D `np.array` of `dtype=np.int32`
in the discrete action case and `dtype=np.float32` in the continuous action case.
The first dimension of `action` is the number of agents that requested a decision
since the last call to `env.step()`. The second dimension is the number of discrete actions
in multi-discrete action type and the number of actions in continuous action type.
- **Set Action for Agent : `env.set_action_for_agent(agent_group: str, agent_id: int, action: np.array)`**
Sets the action for a specific Agent in an agent group. `agent_group` is the name of the
group the Agent belongs to and `agent_id` is the integer identifier of the Agent. The
action is a 1D array of `dtype=np.int32` with size equal to the number of discrete actions
for the multi-discrete action type, or of `dtype=np.float32` with size equal to the number
of actions for the continuous action type.


__Note:__ If no action is provided for an agent group between two calls to `env.step()` then
the default action will be all zeros (in either discrete or continuous action space).
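
Putting these calls together, here is a minimal random-action loop. It is only a
sketch: it assumes a single agent group with continuous actions and an
already-built environment binary named `3DBall`.

```python
import numpy as np
from mlagents.envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", side_channels=[])
env.reset()

group_name = env.get_agent_groups()[0]        # first (and here, only) agent group
spec = env.get_agent_group_spec(group_name)   # fixed for the whole simulation

for _ in range(100):
    step_result = env.get_step_result(group_name)
    n = step_result.n_agents()                # agents that requested a decision
    # Continuous actions: shape (number of agents, action_size), dtype float32.
    action = np.random.randn(n, spec.action_size).astype(np.float32)
    env.set_actions(group_name, action)
    env.step()                                # advance until agents need new actions

env.close()
```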
#### BatchedStepResult and StepResult

A `BatchedStepResult` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the
group of agents. The first dimension of each array corresponds to the batch size of
the group (number of agents requesting a decision since the last call to
`env.step()`).
- `reward` is a float vector of length batch size. Corresponds to the
rewards collected by each agent since the last simulation step.
- `done` is an array of booleans of length batch size. Is true if the
associated Agent was terminated during the last simulation step.
- `max_step` is an array of booleans of length batch size. Is true if the
associated Agent reached its maximum number of steps during the last
simulation step.
- `agent_id` is an int vector of length batch size containing the unique
identifiers of the corresponding Agents. This is used to track Agents
across simulation steps.
- `action_mask` is an optional list of two dimensional arrays of booleans.
Only available for the multi-discrete action space type.
Each array corresponds to an action branch. The first dimension of each
array is the batch size and the second contains a mask for each action of
the branch. If true, the action is not available for the agent during
this simulation step.

It also has the following two methods:

- `n_agents()` Returns the number of agents requesting a decision since
the last call to `env.step()`
- `get_agent_step_result(agent_id: int)` Returns a `StepResult`
for the Agent with the `agent_id` unique identifier.

A `StepResult` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the
agent. (Each array has one less dimension than the arrays in `BatchedStepResult`.)
- `reward` is a float. Corresponds to the rewards collected by the agent
since the last simulation step.
- `done` is a bool. Is true if the Agent was terminated during the last
simulation step.
- `max_step` is a bool. Is true if the Agent reached its maximum number of
steps during the last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `action_mask` is an optional list of one dimensional arrays of booleans.
Only available for the multi-discrete action space type.
Each array corresponds to an action branch. Each array contains a mask
for each action of the branch. If true, the action is not available for
the agent during this simulation step.
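
As an illustration (reusing `env` and `group_name` from the loop sketch above),
the batched and per-agent views of the same step can be read like this:

```python
step_result = env.get_step_result(group_name)

print("Agents requesting a decision:", step_result.n_agents())
print("Shape of the first observation batch:", step_result.obs[0].shape)

# Per-agent view of the same data via StepResult.
for agent_id in step_result.agent_id:
    agent_result = step_result.get_agent_step_result(agent_id)
    print(agent_id, agent_result.reward, agent_result.done, agent_result.max_step)
```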

#### AgentGroupSpec

An Agent Group can have either discrete or continuous actions. To check which type
it is, use `spec.is_action_discrete()` or `spec.is_action_continuous()`. If discrete,
the action tensors are expected to be `np.int32`. If continuous, the actions are
expected to be `np.float32`.

An `AgentGroupSpec` has the following fields:

- `observation_shapes` is a List of Tuples of int: each Tuple corresponds
to an observation's dimensions (without the number of agents dimension).
The shape tuples have the same ordering as the observation lists of the
BatchedStepResult and StepResult.
- `action_type` is the type of data of the action. It can be discrete or
continuous. If discrete, the action tensors are expected to be `np.int32`. If
continuous, the actions are expected to be `np.float32`.
- `action_size` is an `int` corresponding to the expected dimension of the action
array.
  - In continuous action space it is the number of floats that constitute the action.
  - In discrete action space (same as multi-discrete) it corresponds to the
  number of branches (the number of independent actions).
- `discrete_action_branches` is a Tuple of int, only for discrete action space. Each int
corresponds to the number of different options for each branch of the action.
For example: in a game with a direction input (no movement, left, right) and a jump input
(no jump, jump), there will be two branches (direction and jump), the first one with 3
options and the second with 2 options (`action_size = 2` and
`discrete_action_branches = (3,2,)`).
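
As a small sketch (reusing the `env` from the earlier loop example), inspecting
the spec of every group shows how the action arrays must be built:

```python
for group_name in env.get_agent_groups():
    spec = env.get_agent_group_spec(group_name)
    print(group_name, "observation shapes:", spec.observation_shapes)
    if spec.is_action_continuous():
        # action_size floats per agent.
        print("continuous actions of size", spec.action_size)
    else:
        # action_size branches, each with its own number of options.
        print("multi-discrete actions with", spec.action_size, "branches:",
              spec.discrete_action_branches)
```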


### Modifying the environment from Python
The Environment can be modified by using side channels to send data to the
@@ -194,8 +267,3 @@
```csharp
var academy = FindObjectOfType<Academy>();
var sharedProperties = academy.FloatProperties;
float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f);
```
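
On the Python side, the counterpart is a `FloatPropertiesChannel` passed in
`side_channels`. This is a hedged sketch: the import path follows the
0.12/0.13-era package layout and may differ in your version.

```python
from mlagents.envs.environment import UnityEnvironment
# Assumed import path; adjust to your installed mlagents version.
from mlagents.envs.side_channel.float_properties_channel import FloatPropertiesChannel

float_channel = FloatPropertiesChannel()
env = UnityEnvironment(file_name="3DBall", side_channels=[float_channel])

# "parameter_1" is read on the C# side with GetPropertyWithDefault (see above).
float_channel.set_property("parameter_1", 2.0)
env.reset()
```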
