
Commit 5e60954

Ervin T authored
[CI] Better hyperparameters for Pyramids-SAC, WalkerStatic-SAC, and Reacher-PPO (#4154)
1 parent ddfe054 commit 5e60954


3 files changed, +15 -15 lines


config/ppo/Reacher.yaml

Lines changed: 3 additions & 3 deletions
@@ -2,10 +2,10 @@ behaviors:
   Reacher:
     trainer_type: ppo
     hyperparameters:
-      batch_size: 2024
-      buffer_size: 20240
+      batch_size: 512
+      buffer_size: 20480
       learning_rate: 0.0003
-      beta: 0.005
+      beta: 0.001
       epsilon: 0.2
       lambd: 0.95
       num_epoch: 3
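The Reacher change cuts `batch_size` from 2024 to 512 while keeping `buffer_size` at roughly 20k experiences. The Python sketch below compares how many gradient steps one PPO update performs before and after the change. It assumes each update makes `num_epoch` passes over `buffer_size // batch_size` minibatches, which is a simplification of the trainer; `PPOBatching` is a hypothetical helper, not part of ML-Agents.

```python
# Hypothetical helper (not an ML-Agents API): summarizes how a PPO buffer is split
# into minibatches, assuming each update makes num_epoch passes over
# buffer_size // batch_size minibatches.
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOBatching:
    batch_size: int
    buffer_size: int
    num_epoch: int

    def minibatches_per_epoch(self) -> int:
        # Only full minibatches are counted in this sketch.
        return self.buffer_size // self.batch_size

    def leftover(self) -> int:
        # Experiences that would not fill a final minibatch.
        return self.buffer_size % self.batch_size

    def gradient_steps_per_update(self) -> int:
        return self.num_epoch * self.minibatches_per_epoch()


old = PPOBatching(batch_size=2024, buffer_size=20240, num_epoch=3)
new = PPOBatching(batch_size=512, buffer_size=20480, num_epoch=3)

for name, cfg in (("old", old), ("new", new)):
    print(name, cfg.minibatches_per_epoch(), cfg.leftover(), cfg.gradient_steps_per_update())
```

Under this assumption, the old settings give 10 minibatches per epoch (30 gradient steps per update) and the new ones give 40 (120 steps), with no leftover experiences in either case. The separate drop in `beta` from 0.005 to 0.001 weakens the entropy regularization, so the policy is pushed less strongly toward random actions.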

config/sac/Pyramids.yaml

Lines changed: 7 additions & 7 deletions
@@ -5,32 +5,32 @@ behaviors:
       learning_rate: 0.0003
       learning_rate_schedule: constant
       batch_size: 128
-      buffer_size: 500000
-      buffer_init_steps: 10000
+      buffer_size: 2000000
+      buffer_init_steps: 1000
       tau: 0.01
       steps_per_update: 10.0
       save_replay_buffer: false
       init_entcoef: 0.01
       reward_signal_steps_per_update: 10.0
     network_settings:
       normalize: false
-      hidden_units: 256
-      num_layers: 2
+      hidden_units: 512
+      num_layers: 3
       vis_encode_type: simple
     reward_signals:
       extrinsic:
-        gamma: 0.99
+        gamma: 0.995
         strength: 2.0
       gail:
         gamma: 0.99
-        strength: 0.02
+        strength: 0.01
         encoding_size: 128
         learning_rate: 0.0003
         use_actions: true
         use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
     keep_checkpoints: 5
-    max_steps: 10000000
+    max_steps: 3000000
     time_horizon: 128
     summary_freq: 30000
     threaded: true
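Two of the Pyramids changes reshape the reward signals: the extrinsic `gamma` rises from 0.99 to 0.995, and the GAIL `strength` halves from 0.02 to 0.01. The sketch below uses the common rule of thumb 1 / (1 - gamma) as an "effective horizon" (a heuristic, not an ML-Agents setting) and the ratio of the two `strength` multipliers as a rough gauge of how much the extrinsic reward outweighs the GAIL reward. The helper names are hypothetical.

```python
# Hypothetical helpers for comparing the Pyramids reward settings before and after the commit.


def effective_horizon(gamma: float) -> float:
    # Rule of thumb: rewards much further than ~1 / (1 - gamma) steps away contribute little.
    return 1.0 / (1.0 - gamma)


def strength_ratio(extrinsic_strength: float, gail_strength: float) -> float:
    # How many times larger the extrinsic reward multiplier is than the GAIL multiplier.
    return extrinsic_strength / gail_strength


settings = {
    "old": {"gamma": 0.99, "extrinsic": 2.0, "gail": 0.02},
    "new": {"gamma": 0.995, "extrinsic": 2.0, "gail": 0.01},
}

for name, s in settings.items():
    print(
        name,
        round(effective_horizon(s["gamma"])),
        round(strength_ratio(s["extrinsic"], s["gail"])),
    )
```

By this heuristic the extrinsic horizon roughly doubles from about 100 to about 200 steps, and the extrinsic multiplier goes from about 100 to about 200 times the GAIL multiplier, so the demonstrations act as a weaker nudge relative to the task reward.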

config/sac/WalkerStatic.yaml

Lines changed: 5 additions & 5 deletions
@@ -4,8 +4,8 @@ behaviors:
     hyperparameters:
       learning_rate: 0.0003
       learning_rate_schedule: constant
-      batch_size: 256
-      buffer_size: 500000
+      batch_size: 1024
+      buffer_size: 2000000
       buffer_init_steps: 0
       tau: 0.005
       steps_per_update: 30.0
@@ -14,15 +14,15 @@ behaviors:
       reward_signal_steps_per_update: 30.0
     network_settings:
       normalize: true
-      hidden_units: 512
-      num_layers: 4
+      hidden_units: 256
+      num_layers: 3
       vis_encode_type: simple
     reward_signals:
       extrinsic:
         gamma: 0.995
         strength: 1.0
     keep_checkpoints: 5
-    max_steps: 20000000
+    max_steps: 15000000
     time_horizon: 1000
     summary_freq: 30000
     threaded: true
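The WalkerStatic change goes the other way on network size: `hidden_units` drops from 512 to 256 and `num_layers` from 4 to 3, while `batch_size` and `buffer_size` both grow fourfold. The sketch below counts the weights and biases in the hidden-to-hidden dense layers of a plain fully connected stack, a quantity that does not depend on the observation size. It assumes each `network_settings` body is such a stack and leaves out the input layer, output heads, and the fact that SAC trains more than one network; `hidden_to_hidden_params` is a hypothetical helper.

```python
# Hypothetical helper: parameters in the hidden-to-hidden part of a fully connected stack.
# The first layer (observation -> hidden) is excluded because it depends on the observation size.


def hidden_to_hidden_params(hidden_units: int, num_layers: int) -> int:
    # num_layers hidden layers are joined by (num_layers - 1) dense layers,
    # each with a hidden_units x hidden_units weight matrix plus hidden_units biases.
    per_layer = hidden_units * hidden_units + hidden_units
    return (num_layers - 1) * per_layer


old = hidden_to_hidden_params(hidden_units=512, num_layers=4)
new = hidden_to_hidden_params(hidden_units=256, num_layers=3)

print(f"old: {old:,} params, new: {new:,} params, ratio: {old / new:.1f}x")
```

Under these assumptions the hidden stack shrinks from 787,968 to 131,584 parameters, roughly a sixfold reduction, even as each SAC minibatch and the replay buffer grow by a factor of four.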
