Commit a9cbd44

[Algorithm] Td3 (#684)
1 parent a7f24f6 commit a9cbd44

7 files changed (+925, -7 lines)

examples/td3/config.yaml

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Environment
env_name: HalfCheetah-v4
env_task: ""
exp_name: "debugging"
env_library: gym
record_video: 0
normalize_rewards_online: 0
normalize_rewards_online_scale: 5
normalize_rewards_online_decay: 0.99
total_frames: 1000000
frames_per_batch: 1000
max_frames_per_traj: 1000
frame_skip: 1
from_pixels: 0
seed: 0

# Collection
init_random_frames: 25000
init_env_steps: 10000
record_interval: 10
record_frames: 10000
async_collection: 1
#collector_devices: [cuda:1,cuda:1,cuda:1,cuda:1]
collector_devices: [cpu] # ,cpu,cpu,cpu]
env_per_collector: 1
num_workers: 1

# Replay Buffer
buffer_size: 1000000

# Optimization
utd_ratio: 1.0
gamma: 0.99
loss: double
loss_function: smooth_l1
lr: 3e-4
weight_decay: 0.0
lr_scheduler: ""
optim_steps_per_batch: 128
batch_size: 256
target_update_polyak: 0.995

# Algorithm
prb: 0 # use prioritized experience replay
policy_update_delay: 2
multi_step: 0
n_steps_return: 1
activation: relu
gSDE: 0

# Logging
logger: wandb

# Extra
batch_transform: 1
buffer_prefetch: 64
norm_stats: 1
device: "cpu"
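The Optimization and Algorithm sections of this config set the TD3-specific knobs. As a rough, framework-free sketch (not TorchRL's implementation, which this diff does not show), the Python below illustrates how `gamma`, `target_update_polyak`, `policy_update_delay`, `optim_steps_per_batch` and `loss_function: smooth_l1` are conventionally used in a TD3 update. The helper names (`td_target`, `smooth_l1`, `polyak_update`, `actor_update_steps`) are hypothetical. Reading `loss: double` as TD3's clipped double-Q target, and treating `target_update_polyak` as the weight kept on the old target, are assumptions.

```python
"""Framework-free sketch of the TD3 quantities set in examples/td3/config.yaml.

Hypothetical helpers for illustration only; this is not TorchRL's API.
"""

# Values copied from the config in this commit.
GAMMA = 0.99                  # gamma
TARGET_UPDATE_POLYAK = 0.995  # target_update_polyak
POLICY_UPDATE_DELAY = 2       # policy_update_delay
OPTIM_STEPS_PER_BATCH = 128   # optim_steps_per_batch


def td_target(reward, done, next_q1, next_q2, gamma=GAMMA):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2').

    Assumption: `loss: double` selects this two-critic (minimum) target.
    """
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)


def smooth_l1(error):
    """Smooth-L1 (Huber) penalty with threshold 1.0, assumed for `loss_function: smooth_l1`."""
    abs_err = abs(error)
    # Quadratic near zero, linear for large errors.
    return 0.5 * error * error if abs_err < 1.0 else abs_err - 0.5


def polyak_update(target_params, online_params, polyak=TARGET_UPDATE_POLYAK):
    """Soft target update: target <- polyak * target + (1 - polyak) * online.

    Assumption: with polyak = 0.995, each target parameter moves 0.5%
    of the way toward the corresponding online parameter per update.
    """
    return [polyak * t + (1.0 - polyak) * o for t, o in zip(target_params, online_params)]


def actor_update_steps(num_steps=OPTIM_STEPS_PER_BATCH, delay=POLICY_UPDATE_DELAY):
    """0-indexed optimisation steps at which the actor and the target networks are updated.

    The critics are updated at every step; the actor and targets only every `delay` steps.
    """
    return [step for step in range(num_steps) if step % delay == 0]


if __name__ == "__main__":
    y = td_target(reward=1.0, done=0.0, next_q1=10.0, next_q2=8.0)
    print(f"TD target: {y:.2f}")                     # 1.0 + 0.99 * min(10.0, 8.0)
    print(f"critic loss: {smooth_l1(y - 9.0):.4f}")  # small error, quadratic branch
    print(f"new target params: {polyak_update([1.0, 2.0], [0.0, 0.0])}")
    print(f"actor updates per batch: {len(actor_update_steps())}")
```

Under these assumptions, the example transition yields a TD target of 8.92, and with `optim_steps_per_batch: 128` and `policy_update_delay: 2` the actor (and the targets) would be updated 64 times per collected batch, while the critics are updated on all 128 steps.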
