Change Agent.Heuristic to take a float[] argument #3765
Conversation
@@ -31,7 +34,9 @@ public void RequestDecision(AgentInfo info, List<ISensor> sensors)
             StepSensors(sensors);
             if (!info.done)
             {
-                m_LastDecision = m_Heuristic.Invoke();
+                // Reset m_LastDecision each time.
+                m_LastDecision = new float[m_numActions];
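For context, a minimal sketch of how the updated call site presumably fits together after this change, assuming `m_Heuristic` is now a delegate that fills the supplied buffer in place; the delegate type, `m_numActions`, and the rest of the method body are inferred, not shown in the diff:

```csharp
// Sketch only; assumes m_Heuristic is an Action<float[]> and m_numActions
// holds the action size. Neither assumption is visible in the diff above.
public void RequestDecision(AgentInfo info, List<ISensor> sensors)
{
    StepSensors(sensors);
    if (!info.done)
    {
        // Reset m_LastDecision each time so stale actions from a previous
        // step cannot leak into this decision.
        m_LastDecision = new float[m_numActions];
        // The heuristic now writes into the provided array instead of
        // returning a freshly sized one.
        m_Heuristic.Invoke(m_LastDecision);
    }
}
```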
I couldn't decide whether or not to reuse the array here. I figure this approach is safer, and this isn't performance-critical enough to worry about the allocation/GC
I think it depends on the number of agents and the size of the action array. If there is a large action space and a QA person is trying to make a demo file to use in training, but they are hitting GC hitches every few seconds, that could (1) make it hard to actually make a good demo, and (2) frustrate the hell out of that person.
Discussed offline - we need to do a deeper review of array sharing and/or copying around this (and the other policies). Logged as https://jira.unity3d.com/browse/MLA-892 for followup.
Overall looks good. I would rather we re-use the array. If you feel strongly about not re-using it, let's talk.
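For reference, the reuse being suggested would look roughly like the following, allocating the buffer once and clearing it on each decision instead of reallocating. This is illustrative only; the field names mirror the diff, the rest is assumed:

```csharp
// Illustrative alternative: allocate once, clear per decision, no per-step GC.
if (m_LastDecision == null || m_LastDecision.Length != m_numActions)
{
    m_LastDecision = new float[m_numActions];
}
else
{
    // Wipe stale values from the previous step instead of allocating again.
    System.Array.Clear(m_LastDecision, 0, m_LastDecision.Length);
}
m_Heuristic.Invoke(m_LastDecision);
```

The tradeoff (simplicity and safety of a fresh array versus avoiding GC hitches with large action spaces and many agents) is what the MLA-892 follow-up tracks.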
docs/Migrating.md
@@ -15,6 +15,7 @@ The versions can be found in
 * The `play_against_current_self_ratio` self-play trainer hyperparameter has been renamed to `play_against_latest_model_ratio`
 * Removed the multi-agent gym option from the gym wrapper. For multi-agent scenarios, use the [Low Level Python API](Python-API.md).
 * The low level Python API has changed. You can look at the document [Low Level Python API documentation](Python-API.md) for more information. If you use `mlagents-learn` for training, this should be a transparent change.
+* The signature of `Agent.Heuristic()` was changes to take the `float[]` actions as a parameter, instead of returning the array. This was done to prevent a common source of error where users would return arrays of the wrong size.
changes --> changed
the `float[]` --> `float[]` (?)
Changed to "The signature of `Agent.Heuristic()` was changed to take a `float[]` as a parameter..." (and in the changelog).
Looks good. We will follow up with the array allocations later.
Proposed change(s)
Changes the signature of `Agent.Heuristic()` to take a `float[]` instead of returning one. This eliminates a common source of user error, where the returned array was the wrong size. A before/after sketch is shown after the links below.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
https://jira.unity3d.com/browse/MLA-768
#3719
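To make the change concrete for users, here is a hedged before/after sketch of a typical `Heuristic` override; the agent's control logic is invented for illustration, and only the signatures reflect this PR:

```csharp
// Before this PR: the override allocated and returned its own array,
// and a wrong-sized array was only caught downstream.
public override float[] Heuristic()
{
    var action = new float[2];
    action[0] = UnityEngine.Input.GetAxis("Horizontal");
    action[1] = UnityEngine.Input.GetAxis("Vertical");
    return action;
}

// After this PR: the caller supplies an array sized to the behavior's
// action space, and the override fills it in place.
public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = UnityEngine.Input.GetAxis("Horizontal");
    actionsOut[1] = UnityEngine.Input.GetAxis("Vertical");
}
```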
Types of change(s)
Checklist
Other comments