
Commit 4058e95

Author: Ervin T
Split Policy and Optimizer, common Policy for PPO and SAC (#3345)
1 parent: b4e8ba1

50 files changed: +2932 additions, −3586 deletions
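
The heart of the commit, in skeleton form: one Policy class shared by PPO and SAC, with an algorithm-specific Optimizer split out beside it that owns the losses and the learning rate. The sketch below uses hypothetical class names, not the actual ML-Agents code.

```python
# Schematic sketch of the split (hypothetical names, not ML-Agents code).
class Policy:
    """Maps observations to actions; owns the (recurrent) policy network."""

    def __init__(self, memory_size: int = 128):
        # After the split, the recurrent memory serves only the policy,
        # consistent with the default memory_size halving to 128 below.
        self.memory_size = memory_size

    def evaluate(self, observations):
        raise NotImplementedError  # run the policy network, return actions


class PPOOptimizer:
    """Owns the PPO value head, clipped-surrogate loss, and learning rate."""

    def __init__(self, policy: Policy):
        self.policy = policy

    def update(self, batch):
        raise NotImplementedError  # PPO update against self.policy


class SACOptimizer:
    """Owns the SAC Q-networks, entropy coefficient, and learning rate."""

    def __init__(self, policy: Policy):
        self.policy = policy

    def update(self, batch):
        raise NotImplementedError  # SAC update against self.policy
```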

com.unity.ml-agents/CHANGELOG.md

Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - Agent.CollectObservations now takes a VectorSensor argument. It was also overloaded to optionally take an ActionMasker argument. (#3352, #3389)
 - Beta support for ONNX export was added. If the `tf2onnx` python package is installed, models will be saved to `.onnx` as well as `.nn` format.
   Note that Barracuda 0.6.0 or later is required to import the `.onnx` files properly
+- Multi-GPU training and the `--multi-gpu` option has been removed temporarily. (#3345)

 ### Minor Changes
 - Monitor.cs was moved to Examples. (#3372)

config/sac_trainer_config.yaml

Lines changed: 5 additions & 4 deletions

@@ -8,7 +8,7 @@ default:
     learning_rate: 3.0e-4
     learning_rate_schedule: constant
     max_steps: 5.0e5
-    memory_size: 256
+    memory_size: 128
     normalize: false
     num_update: 1
     train_interval: 1
@@ -214,7 +214,7 @@ Hallway:
     sequence_length: 32
     num_layers: 2
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     init_entcoef: 0.1
     max_steps: 1.0e7
     summary_freq: 10000
@@ -225,10 +225,11 @@ VisualHallway:
     sequence_length: 32
     num_layers: 1
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     gamma: 0.99
     batch_size: 64
     max_steps: 1.0e7
+    summary_freq: 10000
     time_horizon: 64
     use_recurrent: true

@@ -237,7 +238,7 @@ VisualPushBlock:
     sequence_length: 32
     num_layers: 1
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     gamma: 0.99
     buffer_size: 1024
     batch_size: 64

config/trainer_config.yaml

Lines changed: 4 additions & 4 deletions

@@ -9,7 +9,7 @@ default:
     learning_rate: 3.0e-4
     learning_rate_schedule: linear
     max_steps: 5.0e5
-    memory_size: 256
+    memory_size: 128
     normalize: false
     num_epoch: 3
     num_layers: 2
@@ -219,7 +219,7 @@ Hallway:
     sequence_length: 64
     num_layers: 2
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     beta: 1.0e-2
     num_epoch: 3
     buffer_size: 1024
@@ -233,7 +233,7 @@ VisualHallway:
     sequence_length: 64
     num_layers: 1
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     beta: 1.0e-2
     num_epoch: 3
     buffer_size: 1024
@@ -247,7 +247,7 @@ VisualPushBlock:
     sequence_length: 32
     num_layers: 1
     hidden_units: 128
-    memory_size: 256
+    memory_size: 128
     beta: 1.0e-2
     num_epoch: 3
     buffer_size: 1024
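
Both config files make the same substantive edit: every recurrent behavior's memory_size drops from 256 to 128. A minimal validation sketch (assuming PyYAML and the repo-relative path below; the helper itself is not part of ML-Agents) for the divisibility constraint described in the updated docs:

```python
# Minimal sketch, not part of ML-Agents: check that every behavior's
# memory_size is a positive multiple of 2, the constraint the updated
# Training-PPO.md / Training-SAC.md docs describe.
import yaml

with open("config/trainer_config.yaml") as f:
    config = yaml.safe_load(f)

for behavior, settings in config.items():
    memory_size = settings.get("memory_size")
    if memory_size is not None and (memory_size <= 0 or memory_size % 2 != 0):
        raise ValueError(
            f"{behavior}: memory_size={memory_size} must be a positive multiple of 2"
        )
```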

docs/Migrating.md

Lines changed: 1 addition & 0 deletions

@@ -17,6 +17,7 @@ The versions can be found in
 * The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take an input class and write to an output class.
 * The `SetActionMask` method must now be called on the optional `ActionMasker` argument of the `CollectObservations` method. (We now consider an action mask as a type of observation)
 * The method `GetStepCount()` on the Agent class has been replaced with the property getter `StepCount`
+* The `--multi-gpu` option has been removed temporarily.

 ### Steps to Migrate
 * Replace your Agent's implementation of `CollectObservations()` with `CollectObservations(VectorSensor sensor)`. In addition, replace all calls to `AddVectorObs()` with `sensor.AddObservation()` or `sensor.AddOneHotObservation()` on the `VectorSensor` passed as argument.

docs/Training-ML-Agents.md

Lines changed: 0 additions & 1 deletion

@@ -151,7 +151,6 @@ environment, you can set the following command line options when invoking
 [here](https://docs.unity3d.com/Manual/CommandLineArguments.html) for more
 details.
 * `--debug`: Specify this option to enable debug-level logging for some parts of the code.
-* `--multi-gpu`: Setting this flag enables the use of multiple GPU's (if available) during training.
 * `--cpu`: Forces training using CPU only.
 * Engine Configuration :
 * `--width' : The width of the executable window of the environment(s) in pixels
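
With `--multi-gpu` gone, an invocation that previously used it simply drops the flag. A hedged example of launching training programmatically (the config path and run id are placeholders; `--run-id` and `--cpu` are documented mlagents-learn options):

```python
# Hedged example: launch mlagents-learn without the removed --multi-gpu
# flag. Config path and run id are placeholders.
import subprocess

subprocess.run(
    ["mlagents-learn", "config/trainer_config.yaml", "--run-id=ppo_run", "--cpu"],
    check=True,
)
```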

docs/Training-PPO.md

Lines changed: 3 additions & 3 deletions

@@ -218,11 +218,11 @@ Typical Range: `4` - `128`
 ### Memory Size

 `memory_size` corresponds to the size of the array of floating point numbers
-used to store the hidden state of the recurrent neural network. This value must
-be a multiple of 4, and should scale with the amount of information you expect
+used to store the hidden state of the recurrent neural network of the policy. This value must
+be a multiple of 2, and should scale with the amount of information you expect
 the agent will need to remember in order to successfully complete the task.

-Typical Range: `64` - `512`
+Typical Range: `32` - `256`

 ## (Optional) Behavioral Cloning Using Demonstrations
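
The divisibility requirement dropping from 4 to 2 follows from the split: the memory vector now backs only the policy's recurrent network, whose state divides into a hidden half and a cell half. An illustrative sketch of that assumption (the equal-halves split is how the docs motivate the rule; the code is not from ML-Agents):

```python
import numpy as np

# Illustrative only: treat the memory vector as the concatenation of the
# LSTM hidden state and cell state, so each half is memory_size // 2
# floats -- hence memory_size must be a multiple of 2.
memory_size = 128
memory = np.zeros(memory_size, dtype=np.float32)

hidden_state, cell_state = np.split(memory, 2)
assert hidden_state.shape == cell_state.shape == (memory_size // 2,)
```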

docs/Training-SAC.md

Lines changed: 3 additions & 3 deletions

@@ -223,11 +223,11 @@ Typical Range: `4` - `128`
 ### Memory Size

 `memory_size` corresponds to the size of the array of floating point numbers
-used to store the hidden state of the recurrent neural network. This value must
-be a multiple of 4, and should scale with the amount of information you expect
+used to store the hidden state of the recurrent neural network in the policy.
+This value must be a multiple of 2, and should scale with the amount of information you expect
 the agent will need to remember in order to successfully complete the task.

-Typical Range: `64` - `512`
+Typical Range: `32` - `256`

 ### (Optional) Save Replay Buffer

ml-agents/mlagents/trainers/agent_processor.py

Lines changed: 0 additions & 3 deletions

@@ -65,9 +65,6 @@ def add_experiences(
         if take_action_outputs:
             for _entropy in take_action_outputs["entropy"]:
                 self.stats_reporter.add_stat("Policy/Entropy", _entropy)
-            self.stats_reporter.add_stat(
-                "Policy/Learning Rate", take_action_outputs["learning_rate"]
-            )

         terminated_agents: Set[str] = set()
         # Make unique agent_ids that are global across workers
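
The `Policy/Learning Rate` stat is dropped here because, after the split, the learning rate belongs to the optimizer rather than to the policy's action outputs. A hedged sketch (hypothetical names; the commit's actual plumbing may differ) of reporting it from the optimizer's update step instead:

```python
# Hypothetical sketch: after the split, the optimizer knows the current
# learning rate, so it reports the stat itself during update().
class PPOOptimizer:
    def __init__(self, policy, stats_reporter):
        self.policy = policy
        self.stats_reporter = stats_reporter
        self.learning_rate = 3.0e-4  # placeholder initial value

    def update(self, batch):
        ...  # run the PPO update; a schedule may decay self.learning_rate
        self.stats_reporter.add_stat("Policy/Learning Rate", self.learning_rate)
```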

ml-agents/mlagents/trainers/common/__init__.py

Whitespace-only changes.
