Merged
31 commits
cc9926d
[skip ci] WIP : Modify the base_env.py file
vincentpierre Mar 24, 2020
5d3f47e
[skip ci] typo
vincentpierre Mar 24, 2020
c87b621
[skip ci] renamed some methods
vincentpierre Mar 25, 2020
a5544e3
[skip ci] Incorporated changes from our meeting
vincentpierre Apr 1, 2020
04e4695
[skip ci] everything is broken
vincentpierre Apr 1, 2020
a8f02ad
[skip ci] everything is broken
vincentpierre Apr 1, 2020
3230a5d
[skip ci] merge master
vincentpierre Apr 1, 2020
a6ace71
[skip ci] formatting
vincentpierre Apr 1, 2020
678604a
Fixing the gym tests
vincentpierre Apr 2, 2020
2054f23
Fixing bug, C# has an error that needs fixing
vincentpierre Apr 2, 2020
2f3af28
Fixing the test
vincentpierre Apr 2, 2020
6d33f9f
relaxing the threshold of 0.99 to 0.9
vincentpierre Apr 2, 2020
ee08cde
fixing the C# side
vincentpierre Apr 2, 2020
c479452
formating
vincentpierre Apr 2, 2020
348df1b
Fixed the llapi integratio test
vincentpierre Apr 2, 2020
7b9547c
[Increasing steps for testing]
vincentpierre Apr 2, 2020
8e525fe
Fixing the python tests
vincentpierre Apr 2, 2020
7eb1113
Need __contains__ after all
vincentpierre Apr 2, 2020
9e534b5
changing the max_steps in the tests
vincentpierre Apr 2, 2020
c8d9b68
addressing comments
vincentpierre Apr 3, 2020
c00a87d
Making env_manager logic clearer as proposed in the comments
vincentpierre Apr 3, 2020
727751a
Remove duplicated logic and added back in episode length (#3728)
Apr 3, 2020
b52f4b9
removing mentions of multi-agent in gym and changed the docstring in …
vincentpierre Apr 3, 2020
3dc0dea
Edited the Documentation for the changes to the LLAPI (#3733)
vincentpierre Apr 3, 2020
2fe52b3
Merge branch 'master' into develop-refactor-ll
vincentpierre Apr 3, 2020
bc17d40
Resolving silent merge conflicts
vincentpierre Apr 3, 2020
3aad62f
[skip ci] Update docs/Python-API.md
vincentpierre Apr 4, 2020
10a6ff5
[skip ci] Update docs/Python-API.md
vincentpierre Apr 4, 2020
6e59e0f
[skip ci] Update gym-unity/README.md
vincentpierre Apr 4, 2020
28ac808
[skip ci] Update docs/Python-API.md
vincentpierre Apr 4, 2020
f5bb644
Added a sentence asking people to use the llapi instead of gym for mu…
vincentpierre Apr 6, 2020
2 changes: 2 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side channels.
- `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent scenarios, use the [Low Level Python API](Python-API.md).
- The low level Python API has changed. See the [Low Level Python API documentation](Python-API.md) for more information. If you use `mlagents-learn` for training, this should be a transparent change.
- Added ability to start training (initialize model weights) from a previous run ID. (#3710)

### Minor Changes
2 changes: 0 additions & 2 deletions com.unity.ml-agents/Runtime/Agent.cs
@@ -337,8 +337,6 @@ void NotifyAgentDone(DoneReason doneReason)
UpdateRewardStats();
}

// The Agent is done, so we give it a new episode Id
m_EpisodeId = EpisodeIdCounter.GetEpisodeId();
m_Reward = 0f;
m_CumulativeReward = 0f;
m_RequestAction = false;
5 changes: 4 additions & 1 deletion com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs
@@ -293,7 +293,10 @@ public void PutObservations(string behaviorName, AgentInfo info, List<ISensor> s
{
m_OrderedAgentsRequestingDecisions[behaviorName] = new List<int>();
}
m_OrderedAgentsRequestingDecisions[behaviorName].Add(info.episodeId);
if (!info.done)
{
m_OrderedAgentsRequestingDecisions[behaviorName].Add(info.episodeId);
}
if (!m_LastActionsReceived.ContainsKey(behaviorName))

Contributor Author: This is to make sure that the agents that are done will not be expecting a decision.

{
m_LastActionsReceived[behaviorName] = new Dictionary<int, float[]>();
2 changes: 2 additions & 0 deletions docs/Migrating.md
@@ -13,6 +13,8 @@ The versions can be found in
* The `--load` and `--train` command-line flags have been deprecated and replaced with `--resume` and `--inference`.
* Running with the same `--run-id` twice will now throw an error.
* The `play_against_current_self_ratio` self-play trainer hyperparameter has been renamed to `play_against_latest_model_ratio`
* Removed the multi-agent gym option from the gym wrapper. For multi-agent scenarios, use the [Low Level Python API](Python-API.md).
* The low level Python API has changed. See the [Low Level Python API documentation](Python-API.md) for details; a brief before/after sketch of the renamed calls follows this list. If you use `mlagents-learn` for training, this should be a transparent change.
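
For reference, a minimal before/after sketch of the renamed calls (the environment path is a placeholder, and the old names are shown as comments; see [Python-API.md](Python-API.md) for the full description):

```python
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="MyEnv")  # placeholder path to a Unity executable
env.reset()

# Old low level API (removed):
#   group = env.get_agent_groups()[0]
#   spec = env.get_agent_group_spec(group)
#   batched_step = env.get_step_result(group)

# New low level API:
behavior_name = env.get_behavior_names()[0]
spec = env.get_behavior_spec(behavior_name)
decision_steps, terminal_steps = env.get_steps(behavior_name)

env.close()
```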

### Steps to Migrate
* Replace the `--load` flag with `--resume` when calling `mlagents-learn`, and don't use the `--train` flag as training
141 changes: 100 additions & 41 deletions docs/Python-API.md
@@ -22,25 +22,33 @@ The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BatchedStepResult** — contains the data from Agents belonging to the same
"AgentGroup" in the simulation, such as observations and rewards.
- **AgentGroupSpec** — describes the shape of the data inside a BatchedStepResult.
For example, provides the dimensions of the observations of a group.
- **BehaviorName** - is a string that identifies a behavior in the simulation.
- **AgentId** - is an `int` that serves as a unique identifier for Agents in the
simulation.
- **DecisionSteps** — contains the data from Agents belonging to the same
"Behavior" in the simulation, such as observations and rewards. Only Agents
that requested a decision since the last call to `env.step()` are in the
DecisionSteps object.
- **TerminalSteps** — contains the data from Agents belonging to the same
"Behavior" in the simulation, such as observations and rewards. Only Agents
whose episode ended since the last call to `env.step()` are in the
TerminalSteps object.
- **BehaviorSpec** — describes the shape of the observation data inside
DecisionSteps and TerminalSteps as well as the expected action shapes.


These classes are all defined in the [base_env](../ml-agents-envs/mlagents_envs/base_env.py)
script.

An Agent Group is a group of Agents identified by a string name that share the same
observations and action types. You can think about Agent Group as a group of agents
that will share the same policy or behavior. All Agents in a group have the same goal
and reward signals.
An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that share the same
observations and action types (described in their `BehaviorSpec`). You can think about Agent
Behavior as a group of agents that will share the same policy. All Agents with the same
behavior have the same goal and reward signals.

Contributor: Do you mean "observation and action types" ?

Contributor Author: I mean observations types. There can be multiple observations of different shapes.

To communicate with an Agent in a Unity environment from a Python program, the
Agent in the simulation must have `Behavior Parameters` set to communicate. You
must set the `Behavior Type` to `Default` and give it a `Behavior Name`.

__Note__: The `Behavior Name` corresponds to the Agent Group name on Python.

_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
@@ -88,22 +96,30 @@ A `BaseEnv` has the following methods:
move forward until an Agent in the simulation needs an input from Python to act.
- **Close : `env.close()`** Sends a shutdown signal to the environment and terminates
the communication.
- **Get Agent Group Names : `env.get_agent_groups()`** Returns a list of agent group ids.
- **Get Behavior Names : `env.get_behavior_names()`** Returns a list of `BehaviorName`.
Note that the number of groups can change over time in the simulation if new
agent groups are created in the simulation.
- **Get Agent Group Spec : `env.get_agent_group_spec(agent_group: str)`** Returns
the `AgentGroupSpec` corresponding to the agent_group given as input. An
`AgentGroupSpec` contains information such as the observation shapes, the action
type (multi-discrete or continuous) and the action shape. Note that the `AgentGroupSpec`
Agent behaviors are created in the simulation.
- **Get Behavior Spec : `env.get_behavior_spec(behavior_name: str)`** Returns
the `BehaviorSpec` corresponding to the behavior_name given as input. A
`BehaviorSpec` contains information such as the observation shapes, the action
type (multi-discrete or continuous) and the action shape. Note that the `BehaviorSpec`
for a specific group is fixed throughout the simulation.
- **Get Batched Step Result for Agent Group : `env.get_step_result(agent_group: str)`**
Returns a `BatchedStepResult` corresponding to the agent_group given as input.
A `BatchedStepResult` contains information about the state of the agents in a group
such as the observations, the rewards, the done flags and the agent identifiers. The
data is in `np.array` of which the first dimension is always the number of agents which
requested a decision in the simulation since the last call to `env.step()` note that the
number of agents is not guaranteed to remain constant during the simulation.
- **Set Actions for Agent Group :`env.set_actions(agent_group: str, action: np.array)`**
- **Get Steps : `env.get_steps(behavior_name: str)`**
Returns a tuple `DecisionSteps, TerminalSteps` corresponding to the behavior_name
given as input.
The `DecisionSteps` contains information about the state of the agents
**that need an action this step** and have the behavior behavior_name.
The `TerminalSteps` contains information about the state of the agents
**whose episode ended** and have the behavior behavior_name.
Both `DecisionSteps` and `TerminalSteps` contain information such as
the observations, the rewards and the agent identifiers.
`DecisionSteps` also contains action masks for the next action while `TerminalSteps`
contains the reason for termination (did the Agent reach its maximum step and was
interrupted). The data is in `np.array` of which the first dimension is always the
number of agents. Note that the number of agents is not guaranteed to remain constant
during the simulation, and it is not unusual for either `DecisionSteps` or `TerminalSteps`
to contain no Agents at all.

Contributor: or was ?

Contributor Author: Right now reaching max_step is the only way to interrupt an agent, so I meant and. Is that reasonable ?

- **Set Actions :`env.set_actions(behavior_name: str, action: np.array)`**
Sets the actions for a whole Agent behavior. `action` is a 2D `np.array` of `dtype=np.int32`
in the discrete action case and `dtype=np.float32` in the continuous action case.
The first dimension of `action` is the number of agents that requested a decision
@@ -119,9 +135,13 @@ A `BaseEnv` has the following methods:

__Note:__ If no action is provided for a behavior between two calls to `env.step()` then
the default action will be all zeros (in either discrete or continuous action space).
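
Putting these calls together, a minimal sketch of an interaction loop could look like the following. The environment path is a placeholder, and `spec.action_shape` is assumed to hold the action size (an int for continuous actions, a tuple of branch sizes for discrete ones) as described in the `BehaviorSpec` section below:

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="MyEnv")  # placeholder path to a Unity executable
env.reset()
behavior_name = env.get_behavior_names()[0]
spec = env.get_behavior_spec(behavior_name)

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    n_agents = len(decision_steps)  # number of agents that need an action this step
    if n_agents > 0:
        if spec.is_action_continuous():
            # assumed: action_shape is the number of continuous actions
            action = np.random.uniform(-1, 1, (n_agents, spec.action_shape)).astype(np.float32)
        else:
            # assumed: action_shape is a tuple with one entry per discrete branch
            action = np.zeros((n_agents, len(spec.action_shape)), dtype=np.int32)
        env.set_actions(behavior_name, action)
    env.step()  # advance the simulation until agents need new decisions

env.close()
```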
#### BatchedStepResult and StepResult

A `BatchedStepResult` has the following fields :
#### DecisionSteps and DecisionStep

`DecisionSteps` (with `s`) contains information about a whole batch of Agents while
`DecisionStep` (no `s`) only contains information about a single Agent.

A `DecisionSteps` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the group of
agents. The first dimension of the array corresponds to the batch size of
@@ -131,9 +151,6 @@ A `BatchedStepResult` has the following fields :
rewards collected by each agent since the last simulation step.
- `done` is an array of booleans of length batch size. Is true if the
associated Agent was terminated during the last simulation step.
- `max_step` is an array of booleans of length batch size. Is true if the
associated Agent reached its maximum number of steps during the last
simulation step.
- `agent_id` is an int vector of length batch size containing unique
identifiers for the corresponding Agents. This is used to track Agents
across simulation steps.
@@ -146,41 +163,82 @@ A `BatchedStepResult` has the following fields :

It also has the two following methods:

- `n_agents()` Returns the number of agents requesting a decision since
the last call to `env.step()`
- `get_agent_step_result(agent_id: int)` Returns a `StepResult`
- `len(DecisionSteps)` Returns the number of agents requesting a decision since
the last call to `env.step()`.
- `DecisionSteps[agent_id]` Returns a `DecisionStep`
for the Agent with the `agent_id` unique identifier.

A `StepResult` has the following fields:
A `DecisionStep` has the following fields:

- `obs` is a list of numpy arrays observations collected by the group of
agent. (Each array has one less dimension than the arrays in `BatchedStepResult`)
- `obs` is a list of numpy arrays containing the observations collected by the agent.
(Each array has one less dimension than the arrays in `DecisionSteps`)
- `reward` is a float. Corresponds to the rewards collected by the agent
since the last simulation step.
- `done` is a bool. Is true if the Agent was terminated during the last
simulation step.
- `max_step` is a bool. Is true if the Agent reached its maximum number of
steps during the last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `action_mask` is an optional list of one dimensional array of booleans.
Only available in multi-discrete action space type.
Each array corresponds to an action branch. Each array contains a mask
for each action of the branch. If true, the action is not available for
the agent during this simulation step.
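
As a rough illustration, assuming `decision_steps` was obtained from `env.get_steps(behavior_name)` as above:

```python
# Iterate over the agents that requested a decision this step.
for agent_id in decision_steps.agent_id:
    single_step = decision_steps[agent_id]  # a DecisionStep for this one agent
    first_obs = single_step.obs[0]          # first observation array of this agent
    print(agent_id, single_step.reward, first_obs.shape)
```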

#### AgentGroupSpec
#### TerminalSteps and TerminalStep

Similarly to `DecisionSteps` and `DecisionStep`,
`TerminalSteps` (with `s`) contains information about a whole batch of Agents while
`TerminalStep` (no `s`) only contains information about a single Agent.

An Agent group can either have discrete or continuous actions. To check which type
A `TerminalSteps` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the group of
agents. The first dimension of the array corresponds to the batch size of
the group (number of agents whose episode ended since the last call to
`env.step()`).
- `reward` is a float vector of length batch size. Corresponds to the
rewards collected by each agent since the last simulation step.
- `done` is an array of booleans of length batch size. Is true if the
associated Agent was terminated during the last simulation step.
- `agent_id` is an int vector of length batch size containing unique
identifiers for the corresponding Agents. This is used to track Agents
across simulation steps.
- `max_step` is an array of booleans of length batch size. Is true if the
associated Agent reached its maximum number of steps during the last
simulation step.

It also has the two following methods:

- `len(TerminalSteps)` Returns the number of agents whose episode ended since
the last call to `env.step()`.
- `TerminalSteps[agent_id]` Returns a `TerminalStep`
for the Agent with the `agent_id` unique identifier.

A `TerminalStep` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the agent.
(Each array has one less dimension than the arrays in `TerminalSteps`)
- `reward` is a float. Corresponds to the rewards collected by the agent
since the last simulation step.
- `done` is a bool. Is true if the Agent was terminated during the last
simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `max_step` is a bool. Is true if the Agent reached its maximum number of
steps during the last simulation step.
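
For example, a sketch of tracking episode returns with `DecisionSteps` and `TerminalSteps` (the `episode_rewards` dictionary is illustrative bookkeeping, not part of the API):

```python
episode_rewards = {}  # agent_id -> accumulated reward

decision_steps, terminal_steps = env.get_steps(behavior_name)

# Agents still running: accumulate the reward received since the last step.
for agent_id in decision_steps.agent_id:
    episode_rewards[agent_id] = episode_rewards.get(agent_id, 0.0) + decision_steps[agent_id].reward

# Agents whose episode just ended: read the final reward and the termination reason.
for agent_id in terminal_steps.agent_id:
    terminal_step = terminal_steps[agent_id]
    total = episode_rewards.pop(agent_id, 0.0) + terminal_step.reward
    print(f"Agent {agent_id}: return={total}, hit max_step={terminal_step.max_step}")
```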


#### BehaviorSpec

An Agent behavior can either have discrete or continuous actions. To check which type
it is, use `spec.is_action_discrete()` or `spec.is_action_continuous()`. If discrete,
the action tensors are expected to be `np.int32`. If continuous, the actions are
expected to be `np.float32`.

An `AgentGroupSpec` has the following fields :
A `BehaviorSpec` has the following fields:

- `observation_shapes` is a List of Tuples of int : Each Tuple corresponds
to an observation's dimensions (without the number of agents dimension).
The shape tuples have the same ordering as the ordering of the
BatchedStepResult and StepResult.
DecisionSteps, DecisionStep, TerminalSteps and TerminalStep.
- `action_type` is the type of data of the action. It can be discrete or
continuous. If discrete, the action tensors are expected to be `np.int32`. If
continuous, the actions are expected to be `np.float32`.
@@ -198,6 +256,7 @@ An `AgentGroupSpec` has the following fields :
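
As a small sketch of querying a `BehaviorSpec` (the `action_shape` field is an assumption based on the collapsed portion of this section):

```python
spec = env.get_behavior_spec(behavior_name)

print("observation shapes:", spec.observation_shapes)
if spec.is_action_continuous():
    # assumed: action_shape is an int, the number of continuous actions
    print("continuous action size:", spec.action_shape)
elif spec.is_action_discrete():
    # assumed: action_shape is a tuple with one entry per discrete branch
    print("discrete branch sizes:", spec.action_shape)
```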


### Communicating additional information with the Environment

In addition to the means of communicating between Unity and python described above,
we also provide methods for sharing agent-agnostic information. These
additional methods are referred to as side channels. ML-Agents includes two ready-made
15 changes: 6 additions & 9 deletions gym-unity/README.md
@@ -5,7 +5,7 @@ environments is via a wrapper provided by OpenAI called `gym`. For more
information on the gym interface, see [here](https://github.com/openai/gym).

We provide a gym wrapper and instructions for using it with existing machine
learning algorithms which utilize gyms. Both wrappers provide interfaces on top
learning algorithms which utilize gym. Our wrapper provides interfaces on top
of our `UnityEnvironment` class, which is the default way of interfacing with a
Unity environment via Python.

@@ -20,7 +20,7 @@ pip install gym_unity
or by running the following from the `/gym-unity` directory of the repository:

```sh
pip install .
pip install -e .
```

## Using the Gym Wrapper
@@ -31,7 +31,7 @@ from the root of the project repository use:
```python
from gym_unity.envs import UnityEnv

env = UnityEnv(environment_filename, worker_id, use_visual, uint8_visual, multiagent)
env = UnityEnv(environment_filename, worker_id, use_visual, uint8_visual)
```

* `environment_filename` refers to the path to the Unity environment.
@@ -47,9 +47,6 @@ env = UnityEnv(environment_filename, worker_id, use_visual, uint8_visual, multia
(0-255). Many common Gym environments (e.g. Atari) do this. By default they
will be floats (0.0-1.0). Defaults to `False`.

* `multiagent` refers to whether you intent to launch an environment which
contains more than one agent. Defaults to `False`.

* `flatten_branched` will flatten a branched discrete action space into a Gym Discrete.
Otherwise, it will be converted into a MultiDiscrete. Defaults to `False`.

Expand All @@ -61,14 +58,14 @@ The returned environment `env` will function as a gym.

## Limitations

* It is only possible to use an environment with a single Agent.
* It is only possible to use an environment with a **single** Agent.
* By default, the first visual observation is provided as the `observation`, if
present. Otherwise, vector observations are provided. You can receive all visual
observations by using the `allow_multiple_visual_obs=True` option in the gym
parameters. If set to `True`, you will receive a list of `observation` instead
of only the first one.
* The `BatchedStepResult` output from the environment can still be accessed from the
`info` provided by `env.step(action)`.
* The `TerminalSteps` or `DecisionSteps` output from the environment can still be
accessed from the `info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
