Hotfixes for Release 0.15.1 #3698

Merged: 17 commits, Mar 30, 2020
2 changes: 2 additions & 0 deletions .pylintrc
@@ -44,3 +44,5 @@ disable =
 # Appears to be https://github.com/PyCQA/pylint/issues/2981
 W0201,
 
+# Using the global statement
+W0603,
6 changes: 4 additions & 2 deletions Dockerfile
@@ -132,7 +132,9 @@ COPY ml-agents /ml-agents
 WORKDIR /ml-agents
 RUN pip install -e .
 
-# port 5005 is the port used in in Editor training.
-EXPOSE 5005
+# Port 5004 is the port used in in Editor training.
+# Environments will start from port 5005,
+# so allow enough ports for several environments.
+EXPOSE 5004-5050
 
 ENTRYPOINT ["mlagents-learn"]
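
The comment in this hunk is the key to the new port range: 5004 is reserved for in-Editor training (matching `k_EditorTrainingPort` in Academy.cs below), and each built environment binds `base_port + worker_id` starting at 5005. A rough Python sketch of how those ports get consumed when launching several concurrent environments; the `"3DBall"` binary name is just a placeholder:

```python
from mlagents_envs.environment import UnityEnvironment

# Each worker binds base_port + worker_id, so four concurrent
# environments occupy ports 5005-5008 -- hence EXPOSE 5004-5050
# rather than a single port.
envs = [
    UnityEnvironment(file_name="3DBall", base_port=5005, worker_id=i)
    for i in range(4)
]
for env in envs:
    env.reset()
for env in envs:
    env.close()
```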
Unity scene asset (CrawlerStatic example)
@@ -1690,8 +1690,8 @@ MonoBehaviour:
   m_InferenceDevice: 0
   m_BehaviorType: 0
   m_BehaviorName: CrawlerStatic
-  m_TeamID: 0
-  m_useChildSensors: 1
+  TeamId: 0
+  m_UseChildSensors: 1
 --- !u!114 &114230237520033992
 MonoBehaviour:
   m_ObjectHideFlags: 0
@@ -1704,6 +1704,9 @@ MonoBehaviour:
   m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
   m_Name:
   m_EditorClassIdentifier:
+  agentParameters:
+    maxStep: 0
+  hasUpgradedFromAgentParameters: 1
   maxStep: 5000
   target: {fileID: 4749909135913778}
   ground: {fileID: 4856650706546504}
@@ -1759,7 +1762,7 @@ MonoBehaviour:
   m_Name:
   m_EditorClassIdentifier:
   DecisionPeriod: 5
-  RepeatAction: 0
+  TakeActionsBetweenDecisions: 0
   offsetStep: 0
 --- !u!1 &1492926997393242
 GameObject:
@@ -2959,8 +2962,8 @@ Transform:
   m_PrefabAsset: {fileID: 0}
   m_GameObject: {fileID: 1995322274649904}
   m_LocalRotation: {x: 0, y: -0, z: -0, w: 1}
-  m_LocalPosition: {x: -0, y: 0.5, z: 0}
-  m_LocalScale: {x: 0.01, y: 0.01, z: 0.01}
+  m_LocalPosition: {x: -0, y: 1.5, z: 0}
+  m_LocalScale: {x: 0.01, y: 0.03, z: 0.01}
   m_Children: []
   m_Father: {fileID: 4924174722017668}
   m_RootOrder: 1
3 changes: 2 additions & 1 deletion README.md
@@ -44,7 +44,7 @@ developer communities.
 * Train using concurrent Unity environment instances
 
 ## Releases & Documentation
-**Our latest, stable release is 0.15.0. Click
+**Our latest, stable release is 0.15.1. Click
 [here](docs/Readme.md) to
 get started with the latest release of ML-Agents.**
@@ -61,6 +61,7 @@ details of the changes between versions.
 
 | **Version** | **Release Date** | **Source** | **Documentation** | **Download** |
 |:-------:|:------:|:-------------:|:-------:|:------------:|
+| **0.15.0** | March 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.15.0.zip) |
 | **0.14.1** | February 26, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.1.zip) |
 | **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
 | **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |
11 changes: 11 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -5,6 +5,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 
+## [0.15.1-preview] - 2020-03-30
+### Bug Fixes
+- Raise the wall in CrawlerStatic scene to prevent Agent from falling off. (#3650)
+- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
+- Fixed the reported entropy values for continuous actions (#3684)
+- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
+- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
+- Fixed an issue in the gym wrapper that would raise an exception if an Agent called EndEpisode multiple times in the same step. (#3700)
+- Fixed an issue where logging output was not visible; logging levels are now set consistently (#3703).
+
+
 ## [0.15.0-preview] - 2020-03-18
 ### Major Changes
 - `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)
2 changes: 1 addition & 1 deletion com.unity.ml-agents/Runtime/Academy.cs
@@ -64,7 +64,7 @@ public class Academy : IDisposable
         /// Unity package version of com.unity.ml-agents.
         /// This must match the version string in package.json and is checked in a unit test.
         /// </summary>
-        internal const string k_PackageVersion = "0.15.0-preview";
+        internal const string k_PackageVersion = "0.15.1-preview";
 
         const int k_EditorTrainingPort = 5004;
 
3 changes: 2 additions & 1 deletion com.unity.ml-agents/Runtime/Agent.cs
@@ -315,6 +315,7 @@ protected virtual void OnDisable()
 
         void NotifyAgentDone(DoneReason doneReason)
         {
+            m_Info.episodeId = m_EpisodeId;
             m_Info.reward = m_Reward;
             m_Info.done = true;
             m_Info.maxStepReached = doneReason == DoneReason.MaxStepReached;
@@ -376,7 +377,7 @@ public void SetModel(
                 // If everything is the same, don't make any changes.
                 return;
             }
-
+            NotifyAgentDone(DoneReason.Disabled);
             m_PolicyFactory.model = model;
             m_PolicyFactory.inferenceDevice = inferenceDevice;
             m_PolicyFactory.behaviorName = behaviorName;
17 changes: 12 additions & 5 deletions com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs
@@ -458,13 +458,20 @@ UnityRLInitializationOutputProto GetTempUnityRlInitializationOutput()
             {
                 if (m_CurrentUnityRlOutput.AgentInfos.ContainsKey(behaviorName))
                 {
-                    if (output == null)
+                    if (m_CurrentUnityRlOutput.AgentInfos[behaviorName].CalculateSize() > 0)
                     {
-                        output = new UnityRLInitializationOutputProto();
-                    }
+                        // Only send the BrainParameters if there is a non empty list of
+                        // AgentInfos ready to be sent.
+                        // This is to ensure that The Python side will always have a first
+                        // observation when receiving the BrainParameters
+                        if (output == null)
+                        {
+                            output = new UnityRLInitializationOutputProto();
+                        }
 
-                        var brainParameters = m_UnsentBrainKeys[behaviorName];
-                        output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                        var brainParameters = m_UnsentBrainKeys[behaviorName];
+                        output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+                    }
                 }
             }
 
5 changes: 4 additions & 1 deletion com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs
@@ -29,7 +29,10 @@ public HeuristicPolicy(Func<float[]> heuristic)
         public void RequestDecision(AgentInfo info, List<ISensor> sensors)
         {
             StepSensors(sensors);
-            m_LastDecision = m_Heuristic.Invoke();
+            if (!info.done)
+            {
+                m_LastDecision = m_Heuristic.Invoke();
+            }
         }
 
         /// <inheritdoc />
2 changes: 1 addition & 1 deletion com.unity.ml-agents/package.json
@@ -1,7 +1,7 @@
 {
   "name": "com.unity.ml-agents",
   "displayName": "ML Agents",
-  "version": "0.15.0-preview",
+  "version": "0.15.1-preview",
   "unity": "2018.4",
   "description": "Add interactivity to your game with Machine Learning Agents trained using Deep Reinforcement Learning.",
   "dependencies": {
4 changes: 3 additions & 1 deletion docs/Migrating.md
@@ -34,6 +34,7 @@ The versions can be found in
 * The interface for SideChannels was changed:
   * In C#, `OnMessageReceived` now takes a `IncomingMessage` argument, and `QueueMessageToSend` takes an `OutgoingMessage` argument.
   * In python, `on_message_received` now takes a `IncomingMessage` argument, and `queue_message_to_send` takes an `OutgoingMessage` argument.
+* Automatic stepping for Academy is now controlled from the AutomaticSteppingEnabled property.
 
 ### Steps to Migrate
 * Add the `using MLAgents.Sensors;` in addition to `using MLAgents;` on top of your Agent's script.
@@ -45,11 +46,12 @@ The versions can be found in
 * We strongly recommend replacing the following methods with their new equivalent as they will be removed in a later release:
   * `InitializeAgent()` to `Initialize()`
   * `AgentAction()` to `OnActionReceived()`
-  * `AgentReset()` to `OnEpsiodeBegin()`
+  * `AgentReset()` to `OnEpisodeBegin()`
   * `Done()` to `EndEpisode()`
   * `GiveModel()` to `SetModel()`
 * Replace `IFloatProperties` variables with `FloatPropertiesChannel` variables.
 * If you implemented custom `SideChannels`, update the signatures of your methods, and add your data to the `OutgoingMessage` or read it from the `IncomingMessage`.
+* Replace calls to Academy.EnableAutomaticStepping()/DisableAutomaticStepping() with Academy.AutomaticSteppingEnabled = true/false.
 
 ## Migrating from 0.13 to 0.14
 
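
One of the migration steps above replaces `IFloatProperties` with `FloatPropertiesChannel`; on the Python side, that built-in channel is also the simplest way to see the new message-object SideChannel API in use. A minimal sketch under 0.15 (the environment binary path is a placeholder):

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.float_properties_channel import (
    FloatPropertiesChannel,
)

# FloatPropertiesChannel is a built-in SideChannel; its traffic goes
# through the IncomingMessage/OutgoingMessage API described above.
channel = FloatPropertiesChannel()
env = UnityEnvironment(file_name="path/to/env", side_channels=[channel])

channel.set_property("gravity", -9.81)  # queued, delivered on the next step
env.reset()
env.close()
```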
2 changes: 1 addition & 1 deletion gym-unity/gym_unity/__init__.py
@@ -1 +1 @@
-__version__ = "0.15.0"
+__version__ = "0.15.1"
32 changes: 21 additions & 11 deletions gym-unity/gym_unity/envs/__init__.py
@@ -1,4 +1,3 @@
-import logging
 import itertools
 import numpy as np
 from typing import Any, Dict, List, Optional, Tuple, Union
@@ -8,6 +7,7 @@
 
 from mlagents_envs.environment import UnityEnvironment
 from mlagents_envs.base_env import BatchedStepResult
+from mlagents_envs import logging_util
 
 
 class UnityGymException(error.Error):
@@ -18,9 +18,8 @@ class UnityGymException(error.Error):
     pass
 
 
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger("gym_unity")
-
+logger = logging_util.get_logger(__name__)
+logging_util.set_log_level(logging_util.INFO)
 
 GymSingleStepResult = Tuple[np.ndarray, float, bool, Dict]
 GymMultiStepResult = Tuple[List[np.ndarray], List[float], List[bool], Dict]
@@ -364,9 +363,8 @@ def _check_agents(self, n_agents: int) -> None:
 
     def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         n_extra_agents = step_result.n_agents() - self._n_agents
-        if n_extra_agents < 0 or n_extra_agents > self._n_agents:
+        if n_extra_agents < 0:
             # In this case, some Agents did not request a decision when expected
-            # or too many requested a decision
             raise UnityGymException(
                 "The number of agents in the scene does not match the expected number."
             )
@@ -386,6 +384,10 @@ def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         # only cares about the ordering.
         for index, agent_id in enumerate(step_result.agent_id):
             if not self._previous_step_result.contains_agent(agent_id):
+                if step_result.done[index]:
+                    # If the Agent is already done (e.g. it ended its episode twice in one step)
+                    # Don't try to register it here.
+                    continue
                 # Register this agent, and get the reward of the previous agent that
                 # was in its index, so that we can return it to the gym.
                 last_reward = self.agent_mapper.register_new_agent_id(agent_id)
@@ -528,8 +530,12 @@ def mark_agent_done(self, agent_id: int, reward: float) -> None:
         """
         Declare the agent done with the corresponding final reward.
         """
-        gym_index = self._agent_id_to_gym_index.pop(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
+        if agent_id in self._agent_id_to_gym_index:
+            gym_index = self._agent_id_to_gym_index.pop(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+        else:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         """
@@ -581,9 +587,13 @@ def set_initial_agents(self, agent_ids: List[int]) -> None:
         self._gym_id_order = list(agent_ids)
 
     def mark_agent_done(self, agent_id: int, reward: float) -> None:
-        gym_index = self._gym_id_order.index(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
-        self._gym_id_order[gym_index] = -1
+        try:
+            gym_index = self._gym_id_order.index(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+            self._gym_id_order[gym_index] = -1
+        except ValueError:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         original_index = self._gym_id_order.index(-1)
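
This file's logging change is one instance of the consistency fix tracked as #3703: modules obtain loggers through the shared `mlagents_envs.logging_util` helpers instead of calling `logging.basicConfig` themselves, so verbosity is controlled in one place. Roughly how that helper is used downstream (level constants mirror the stdlib's):

```python
from mlagents_envs import logging_util

# One logger per module, all configured through the shared helper.
logger = logging_util.get_logger(__name__)

# Raise or lower verbosity globally (DEBUG, INFO, WARNING, ERROR).
logging_util.set_log_level(logging_util.DEBUG)
logger.debug("now visible")
```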
48 changes: 48 additions & 0 deletions gym-unity/gym_unity/tests/test_gym.py
@@ -129,6 +129,50 @@ def test_sanitize_action_one_agent_done(mock_env):
     assert expected_agent_id == agent_id
 
 
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_new_agent_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=3)
+    mock_step.agent_id = np.array(range(5))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=True)
+
+    received_step_result = create_mock_vector_step_result(num_agents=7)
+    received_step_result.agent_id = np.array(range(7))
+    # agent #3 (id = 2) is Done
+    # so is the "new" agent (id = 5)
+    done = [False] * 7
+    done[2] = True
+    done[5] = True
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([0, 1, 6, 3, 4], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_single_agent_multiple_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=1)
+    mock_step.agent_id = np.array(range(1))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=False)
+
+    received_step_result = create_mock_vector_step_result(num_agents=3)
+    received_step_result.agent_id = np.array(range(3))
+    # original agent (id = 0) is Done
+    # so is the "new" agent (id = 1)
+    done = [True, True, False]
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([2], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
 # Helper methods
 
 
@@ -200,6 +244,10 @@ def test_agent_id_index_mapper(mapper_cls):
     mapper.mark_agent_done(1001, 42.0)
     mapper.mark_agent_done(1004, 1337.0)
 
+    # Make sure we can handle an unknown agent id being marked done.
+    # This can happen when an agent ends an episode on the same step it starts.
+    mapper.mark_agent_done(9999, -1.0)
+
     # Now add new agents, and get the rewards of the agent they replaced.
     old_reward1 = mapper.register_new_agent_id(2001)
     old_reward2 = mapper.register_new_agent_id(2002)
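
The new tests above feed `_sanitize_info` mocked step results directly; end to end, the behavior they lock in is that an ordinary gym loop keeps working even when an agent ends its episode, possibly twice in one step (#3700). A sketch of that loop against the patched wrapper, with the binary path and step count as placeholders:

```python
from gym_unity.envs import UnityEnv

# multiagent=True selects the list-based step API that
# _sanitize_info services in the tests above.
env = UnityEnv("path/to/env_binary", use_visual=False, multiagent=True)
obs = env.reset()
for _ in range(100):
    actions = [env.action_space.sample() for _ in range(env.number_agents)]
    # Done agents are remapped by the AgentIdIndexMapper so the
    # returned lists keep a stable per-agent ordering.
    obs, rewards, dones, info = env.step(actions)
env.close()
```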
2 changes: 1 addition & 1 deletion ml-agents-envs/mlagents_envs/__init__.py
@@ -1 +1 @@
-__version__ = "0.15.0"
+__version__ = "0.15.1"