Skip to content

Changes to the LL-API - Refactor of “done” logic #3681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Apr 6, 2020

Conversation

vincentpierre
Copy link
Contributor

@vincentpierre vincentpierre commented Mar 24, 2020

Proposed change(s)

Proposed changes to the API include:

  • Renaming StepResult to DecisionStep to reflect the idea that C# is requesting decisions for the Agent and not the other way around
  • Renaming BatchedStepResult to DecisionSteps
  • Creating the new TerminalStep and TerminalSteps namedtuple with the fields:
    • obs
    • reward
    • max_step
    • agent_id
  • Note that done is now implicit if the agent is in the TerminalSteps
  • Removing done and max_step fields from both DecisionStep and DecisionSteps
  • Changing DecisionSteps.get_agent_step_result() to be a dictionary like API.
  • Renaming BaseEnv.get_agent_groups() to BaseEnv.get_behavior_names() for simplicity.
  • Renaming BaseEnv.get_agent_group_spec() to BaseEnv.get_behavior_spec() for simplicity.
  • Renaming BaseEnv.get_step_result() to BaseEnv.get_steps() that returns a Tuple[DecisionSteps, TerminalSteps].

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Design Document
Brainstorm document
Jira : MLA-793

Types of change(s)

  • [ ] Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • [ ] Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

gym test is broken and needs to be replaced with a single agent gym environment.
Need to change documentation once this PR is approved

@vincentpierre vincentpierre changed the title WIP : Changes to the LL-API - TerminationStepResult WIP : Changes to the LL-API - Refactor of “done” logic Mar 26, 2020
@@ -3,12 +3,12 @@
The aim of this API is to expose groups of similar Agents evolving in Unity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there is still a lot of discussion about "groups" here. If the concept no longer exists in the API, should we still be talking about it?

Copy link
Contributor

@chriselion chriselion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bit of feedback, but didn't get into the details of the Agent processing.

If you want, feel free to comment out the gym yamato test for now, until you get it working with single-agent in the other PR.

@vincentpierre
Copy link
Contributor Author

Left a bit of feedback, but didn't get into the details of the Agent processing.

Thanks, I incorporated some of these changes. I want this to be informally approved before I made another PR targeting this one with the docs changes.

If you want, feel free to comment out the gym yamato test for now, until you get it working with single-agent in the other PR.

I think I will wait on #3725 to be merged and then merge master into this one to fix the missing test.

* Edited the Documentation for the changes to the LLAPI

* Forgot the CHANGELOG

* Fixing a typo raised by #3731

* [skip ci] Update com.unity.ml-agents/CHANGELOG.md

* [skip ci] Update docs/Migrating.md

* [skip ci] Update docs/Python-API.md

* [skip ci] Update docs/Python-API.md
that will share the same policy or behavior. All Agents in a group have the same goal
and reward signals.
An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that share the same
observations and action types (described in their `BehaviorSpec`). You can think about Agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "observation and action types" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean observations types. There can be multiple observations of different shapes.

Both `DecisionSteps` and `TerminalSteps` contain information such as
the observations, the rewards and the agent identifiers.
`DecisionSteps` also contains action masks for the next action while `TerminalSteps`
contains the reason for termination (did the Agent reach its maximum step and was
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or was ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now reaching max_step is the only way to interrupt an agent, so I meant and. Is that reasonable ?

@vincentpierre
Copy link
Contributor Author

@chriselion @awjuliani @andrewcoh @ervteng
Any requests for change?

@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side channels.
- `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a line here telling people they should use the LL-API if they want multi-agent support for their custom trainers/research.

Copy link
Contributor

@awjuliani awjuliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vincentpierre vincentpierre merged commit d4810e2 into master Apr 6, 2020
@vincentpierre vincentpierre deleted the develop-refactor-ll branch April 6, 2020 19:50
@chriselion chriselion changed the title WIP : Changes to the LL-API - Refactor of “done” logic Changes to the LL-API - Refactor of “done” logic Apr 27, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants