Commit eeb13c8

added hyperparameters used for training
1 parent 5c71296 commit eeb13c8


PPO_preTrained/README.md

Lines changed: 176 additions & 0 deletions
## Hyperparameters

Hyperparameters used to obtain the `preTrained` networks are listed below:

### RoboschoolWalker2d-v1

```
####### initialize environment hyperparameters ######

env_name = "RoboschoolWalker2d-v1"

has_continuous_action_space = True

max_ep_len = 1000 # max timesteps in one episode
max_training_timesteps = int(3e6) # break training loop if timesteps > max_training_timesteps

print_freq = max_ep_len * 10 # print avg reward in the interval (in num timesteps)
log_freq = max_ep_len * 2 # log avg reward in the interval (in num timesteps)
save_model_freq = int(1e5) # save model frequency (in num timesteps)

action_std = 0.6 # starting std for action distribution (Multivariate Normal)
action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate)
min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std)
action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps)

#####################################################

## Note: print/log frequencies should be greater than max_ep_len

################ PPO hyperparameters ################

update_timestep = max_ep_len * 4 # update policy every n timesteps
K_epochs = 80 # update policy for K epochs in one PPO update

eps_clip = 0.2 # clip parameter for PPO
gamma = 0.99 # discount factor

lr_actor = 0.0003 # learning rate for actor network
lr_critic = 0.001 # learning rate for critic network

random_seed = 0 # set random seed if required (0 = no random seed)

#####################################################
```
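
For the continuous-action environments, the `action_std_*` values define a linear decay schedule for the exploration noise, exactly as described in the comments above. A minimal sketch of that schedule (the helper name `decay_action_std` is illustrative, not necessarily the repository's API):

```
def decay_action_std(action_std, action_std_decay_rate, min_action_std):
    # Linearly reduce the std of the action distribution, clipped at the minimum.
    action_std = action_std - action_std_decay_rate
    return max(action_std, min_action_std)

# Applied once every action_std_decay_freq timesteps during training:
# if time_step % action_std_decay_freq == 0:
#     action_std = decay_action_std(action_std, action_std_decay_rate, min_action_std)
```

With the values above (0.6 start, 0.05 decay, 0.1 minimum, one decay every 2.5e5 timesteps), `action_std` reaches its minimum after ten decays, i.e. after 2.5e6 of the 3e6 training timesteps.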

### BipedalWalker-v2

```
####### initialize environment hyperparameters ######

env_name = "BipedalWalker-v2"

has_continuous_action_space = True

max_ep_len = 1500 # max timesteps in one episode
max_training_timesteps = int(3e6) # break training loop if timesteps > max_training_timesteps

print_freq = max_ep_len * 4 # print avg reward in the interval (in num timesteps)
log_freq = max_ep_len * 2 # log avg reward in the interval (in num timesteps)
save_model_freq = int(1e5) # save model frequency (in num timesteps)

action_std = 0.6 # starting std for action distribution (Multivariate Normal)
action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate)
min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std)
action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps)

#####################################################

## Note: print/log frequencies should be greater than max_ep_len

################ PPO hyperparameters ################

update_timestep = max_ep_len * 4 # update policy every n timesteps
K_epochs = 80 # update policy for K epochs in one PPO update

eps_clip = 0.2 # clip parameter for PPO
gamma = 0.99 # discount factor

lr_actor = 0.0003 # learning rate for actor network
lr_critic = 0.001 # learning rate for critic network

random_seed = 0 # set random seed if required (0 = no random seed)

#####################################################
```
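
Across both walkers the PPO settings are identical: a batch of `update_timestep` transitions is collected, then the clipped surrogate objective is optimized for `K_epochs` passes over that batch, with the probability ratio clipped to `1 ± eps_clip` and returns discounted by `gamma`. A rough PyTorch-style sketch of one such update, assuming a `policy.evaluate(states, actions)` helper that returns log-probabilities, state values, and entropies (a generic illustration, not the repository's exact code):

```
import torch
import torch.nn.functional as F

def ppo_update(policy, optimizer, states, actions, old_logprobs, returns,
               K_epochs=80, eps_clip=0.2):
    # `returns` are the gamma-discounted rewards-to-go computed from the rollout.
    for _ in range(K_epochs):
        # Re-evaluate the stored state-action pairs under the current policy.
        logprobs, state_values, dist_entropy = policy.evaluate(states, actions)

        # Probability ratio between the new policy and the behavior policy.
        ratios = torch.exp(logprobs - old_logprobs.detach())

        # Advantage estimate: discounted returns minus the critic's value.
        advantages = returns - state_values.detach()

        # Clipped surrogate objective, plus a value loss and an entropy bonus.
        surr1 = ratios * advantages
        surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
        loss = (-torch.min(surr1, surr2)
                + 0.5 * F.mse_loss(state_values, returns)
                - 0.01 * dist_entropy)

        optimizer.zero_grad()
        loss.mean().backward()
        optimizer.step()
```

The separate `lr_actor` and `lr_critic` values can be realized, for example, as two parameter groups in a single `torch.optim.Adam` optimizer.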

### CartPole-v1

```
####### initialize environment hyperparameters ######

env_name = "CartPole-v1"
has_continuous_action_space = False

max_ep_len = 400 # max timesteps in one episode
max_training_timesteps = int(1e5) # break training loop if timesteps > max_training_timesteps

print_freq = max_ep_len * 4 # print avg reward in the interval (in num timesteps)
log_freq = max_ep_len * 2 # log avg reward in the interval (in num timesteps)
save_model_freq = int(2e4) # save model frequency (in num timesteps)

action_std = None

#####################################################

## Note: print/log frequencies should be greater than max_ep_len

################ PPO hyperparameters ################

update_timestep = max_ep_len * 4 # update policy every n timesteps
K_epochs = 40 # update policy for K epochs
eps_clip = 0.2 # clip parameter for PPO
gamma = 0.99 # discount factor

lr_actor = 0.0003 # learning rate for actor network
lr_critic = 0.001 # learning rate for critic network

random_seed = 0 # set random seed if required (0 = no random seed)

#####################################################
```
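
For the discrete-action environments, `has_continuous_action_space` is `False` and `action_std` is simply `None`, since the exploration-noise parameters only apply to the Multivariate Normal used for continuous actions. A hedged sketch of how action sampling typically differs between the two cases (function and argument names are illustrative):

```
import torch
from torch.distributions import Categorical, MultivariateNormal

def sample_action(actor_output, has_continuous_action_space, action_std=None):
    if has_continuous_action_space:
        # actor_output is the action mean; action_std sets a shared diagonal
        # covariance that is decayed over the course of training.
        cov = torch.diag_embed(torch.full_like(actor_output, action_std ** 2))
        dist = MultivariateNormal(actor_output, covariance_matrix=cov)
    else:
        # actor_output is a probability vector (e.g. a softmax over actions);
        # no action_std is involved for the Categorical distribution.
        dist = Categorical(probs=actor_output)
    action = dist.sample()
    return action, dist.log_prob(action)
```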

### LunarLander-v2

```
####### initialize environment hyperparameters ######

env_name = "LunarLander-v2"
has_continuous_action_space = False

max_ep_len = 300 # max timesteps in one episode
max_training_timesteps = int(1e6) # break training loop if timesteps > max_training_timesteps

print_freq = max_ep_len * 8 # print avg reward in the interval (in num timesteps)
log_freq = max_ep_len * 2 # log avg reward in the interval (in num timesteps)
save_model_freq = int(5e4) # save model frequency (in num timesteps)

action_std = None

#####################################################

## Note: print/log frequencies should be greater than max_ep_len

################ PPO hyperparameters ################

update_timestep = max_ep_len * 3 # update policy every n timesteps
K_epochs = 30 # update policy for K epochs
eps_clip = 0.2 # clip parameter for PPO
gamma = 0.99 # discount factor

lr_actor = 0.0003 # learning rate for actor network
lr_critic = 0.001 # learning rate for critic network

random_seed = 0 # set random seed if required (0 = no random seed)

#####################################################
```
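
All of the frequency parameters are counted in environment timesteps rather than episodes, which is why the note asks for print/log frequencies greater than `max_ep_len`. A schematic view of where each hyperparameter enters the training loop (the `env`/`agent` API is assumed, not taken verbatim from the repository):

```
# Schematic only: assumes a Gym-style `env`, a PPO `agent` with
# select_action / update / decay_action_std / save methods (names assumed),
# and the hyperparameters listed above already defined.
time_step = 0
while time_step <= max_training_timesteps:
    state = env.reset()
    for _ in range(max_ep_len):                 # episodes are capped at max_ep_len
        action = agent.select_action(state)
        state, reward, done, _ = env.step(action)
        time_step += 1

        if time_step % update_timestep == 0:    # run one PPO update
            agent.update()
        if has_continuous_action_space and time_step % action_std_decay_freq == 0:
            agent.decay_action_std(action_std_decay_rate, min_action_std)
        if time_step % log_freq == 0:           # append avg reward to the log file
            pass
        if time_step % print_freq == 0:         # print avg reward to the console
            pass
        if time_step % save_model_freq == 0:    # checkpoint the policy weights
            agent.save()
        if done:
            break
```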