Changes to the LL-API - Refactor of “done” logic #3681

vincentpierre · 2020-03-24T22:45:48Z

Proposed change(s)

Proposed changes to the API include:

Renaming StepResult to DecisionStep to reflect the idea that C# is requesting decisions for the Agent and not the other way around
Renaming BatchedStepResult to DecisionSteps
Creating the new TerminalStep and TerminalSteps namedtuples with the fields:
- obs
- reward
- max_step
- agent_id
Note that done is now implicit if the agent is in the TerminalSteps
Removing done and max_step fields from both DecisionStep and DecisionSteps
Changing DecisionSteps.get_agent_step_result() to be a dictionary-like API.
Renaming BaseEnv.get_agent_groups() to BaseEnv.get_behavior_names() for simplicity.
Renaming BaseEnv.get_agent_group_spec() to BaseEnv.get_behavior_spec() for simplicity.
Renaming BaseEnv.get_step_result() to BaseEnv.get_steps(), which returns a Tuple[DecisionSteps, TerminalSteps].
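
To illustrate the shape of the proposal above, here is a minimal sketch in plain Python. It is not the code in `base_env.py`: `StepsSketch` and `get_steps_sketch` are hypothetical stand-ins, observations are plain lists rather than arrays, and only the field names (`obs`, `reward`, `max_step`, `agent_id`) come from the list above.

```python
from collections.abc import Mapping
from typing import List, NamedTuple, Tuple


class DecisionStep(NamedTuple):
    """An agent waiting for a decision. No done or max_step field."""
    obs: List[List[float]]
    reward: float
    agent_id: int


class TerminalStep(NamedTuple):
    """An agent whose episode ended. Being here implies done."""
    obs: List[List[float]]
    reward: float
    max_step: bool
    agent_id: int


class StepsSketch(Mapping):
    """Hypothetical dictionary-like batch, keyed by agent_id."""

    def __init__(self, steps):
        self._by_id = {step.agent_id: step for step in steps}

    def __getitem__(self, agent_id):
        return self._by_id[agent_id]

    def __iter__(self):
        return iter(self._by_id)

    def __len__(self):
        return len(self._by_id)


def get_steps_sketch(behavior_name: str) -> Tuple[StepsSketch, StepsSketch]:
    """Hypothetical stand-in for BaseEnv.get_steps() with hard-coded data."""
    decisions = StepsSketch([
        DecisionStep(obs=[[0.1, 0.2]], reward=0.0, agent_id=0),
        DecisionStep(obs=[[0.3, 0.4]], reward=0.5, agent_id=1),
    ])
    terminals = StepsSketch([
        TerminalStep(obs=[[0.9, 0.9]], reward=1.0, max_step=True, agent_id=2),
    ])
    return decisions, terminals


decision_steps, terminal_steps = get_steps_sketch("MyBehavior")
for agent_id in terminal_steps:
    step = terminal_steps[agent_id]  # dictionary-like access by agent id
    print(agent_id, step.reward, step.max_step)
```

In this sketch an agent appears in exactly one of the two batches for a given step, so callers never need to read a `done` flag.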

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Design Document
Brainstorm document
Jira : MLA-793

Types of change(s)

~~[ ] Bug fix~~
New feature
Code refactor
Breaking change
Documentation update
~~[ ] Other (please describe)~~

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

The gym test is broken and needs to be replaced with a single-agent gym environment.
The documentation needs to be updated once this PR is approved.

ml-agents-envs/mlagents_envs/base_env.py

awjuliani · 2020-04-01T19:08:12Z

ml-agents-envs/mlagents_envs/base_env.py

@@ -3,12 +3,12 @@
 The aim of this API is to expose groups of similar Agents evolving in Unity


It seems like there is still a lot of discussion about "groups" here. If the concept no longer exists in the API, should we still be talking about it?

gym-unity/gym_unity/envs/__init__.py

chriselion

Left a bit of feedback, but didn't get into the details of the Agent processing.

If you want, feel free to comment out the gym yamato test for now, until you get it working with single-agent in the other PR.

vincentpierre · 2020-04-03T19:34:17Z

> Left a bit of feedback, but didn't get into the details of the Agent processing.

Thanks, I incorporated some of these changes. I want this to be informally approved before I make another PR targeting this one with the docs changes.

> If you want, feel free to comment out the gym yamato test for now, until you get it working with single-agent in the other PR.

I think I will wait for #3725 to be merged and then merge master into this one to fix the missing test.

…base_env.py

* Edited the Documentation for the changes to the LLAPI
* Forgot the CHANGELOG
* Fixing a typo raised by #3731
* [skip ci] Update com.unity.ml-agents/CHANGELOG.md
* [skip ci] Update docs/Migrating.md
* [skip ci] Update docs/Python-API.md
* [skip ci] Update docs/Python-API.md

andrewcoh · 2020-04-04T00:35:40Z

docs/Python-API.md

-that will share the same policy or behavior. All Agents in a group have the same goal
-and reward signals.
+An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that share the same
+observations and action types (described in their `BehaviorSpec`). You can think about Agent


Do you mean "observation and action types" ?

I mean observations types. There can be multiple observations of different shapes.
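
A small sketch of what "multiple observations of different shapes" can look like in practice. `BehaviorSpecSketch` and `conforms` are hypothetical names for illustration, not the actual `BehaviorSpec` class, and observations are flattened lists so the example needs no third-party packages.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class BehaviorSpecSketch(NamedTuple):
    """Hypothetical spec: one shape per observation an agent produces."""
    observation_shapes: List[Tuple[int, ...]]


def conforms(spec: BehaviorSpecSketch,
             flat_observations: Sequence[Sequence[float]]) -> bool:
    """Check that an agent supplies one flattened observation per shape,
    each with as many elements as its shape implies."""
    if len(flat_observations) != len(spec.observation_shapes):
        return False
    return all(
        len(obs) == math.prod(shape)
        for obs, shape in zip(flat_observations, spec.observation_shapes)
    )


# One vector observation of size 8 and one 2x2x3 visual-style observation.
spec = BehaviorSpecSketch(observation_shapes=[(8,), (2, 2, 3)])
agent_obs = [[0.0] * 8, [0.0] * 12]
print(conforms(spec, agent_obs))
```

Agents that share a behavior would all pass the same check, which is the sense in which they share "observations types".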

andrewcoh · 2020-04-04T00:39:30Z

docs/Python-API.md

+   Both `DecisionSteps` and `TerminalSteps` contain information such as
+   the observations, the rewards and the agent identifiers.
+   `DecisionSteps` also contains action masks for the next action while `TerminalSteps`
+   contains the reason for termination (did the Agent reach its maximum step and was


Right now reaching max_step is the only way to interrupt an agent, so I meant "and". Is that reasonable?
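
For reference, a minimal sketch of how a caller might read the termination reason. It assumes only that a terminal step carries a boolean `max_step` field, as proposed in this PR; the trimmed-down `TerminalStep` and the `termination_reason` helper are hypothetical.

```python
from typing import NamedTuple


class TerminalStep(NamedTuple):
    """Trimmed-down stand-in with only the fields this example needs."""
    reward: float
    max_step: bool
    agent_id: int


def termination_reason(step: TerminalStep) -> str:
    """Distinguish an interruption (max step reached) from a normal end."""
    if step.max_step:
        return "interrupted"
    return "terminated"


for step in (TerminalStep(1.0, False, 0), TerminalStep(0.0, True, 1)):
    print(step.agent_id, termination_reason(step))
```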

gym-unity/README.md

vincentpierre · 2020-04-06T17:40:17Z

@chriselion @awjuliani @andrewcoh @ervteng
Any requests for change?

awjuliani · 2020-04-06T18:01:11Z

com.unity.ml-agents/CHANGELOG.md

@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - The Jupyter notebooks have been removed from the repository.
 - Introduced the `SideChannelUtils` to register, unregister and access side channels.
 - `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
+ - Removed the multi-agent gym option from the gym wrapper.


Maybe a line here telling people they should use the LL-API if they want multi-agent support for their custom trainers/research.

awjuliani

LGTM.

…lti-agents

vincentpierre added 2 commits March 24, 2020 15:40

[skip ci] WIP : Modify the base_env.py file

cc9926d

[skip ci] typo

5d3f47e

vincentpierre requested review from ervteng, chriselion, awjuliani and andrewcoh March 24, 2020 22:45

vincentpierre self-assigned this Mar 24, 2020

[skip ci] renamed some methods

c87b621

chriselion reviewed Mar 25, 2020