* Add R2D1 DQNAgent
  Change grandiosely used functions
  Fix zero-padding & torch contiguous
  Fix zero-padding & Change indices sampling function
  Change hyperparameters
  Remove redundant codes
  Add CNN compatibility to R2D1Agent
  Remove redundant code
  Implement rlpyt forward style
  Add previous_action & previous_reward GRU input structure
  Fix error
  Fix prev_action bug & Use make_one_hot function
  Fix error
  Update descriptions & move leading_dims functions to helper_functions.py
  Move valid_from_done from R2D1Loss to helper_functions.py
  Fix parameters r2d1_iqn loss & agent
  Fix GRUBrain compatible with c51
  Add R2D1C51Loss
  Add r2d1_c51 configs
  Fix priority > 0 assert error
  Change parameters
* Fix descriptions & Rename R2D1 to R2D1DQN
  Change parameters
  Add total_step to wandb log
  Add upndown env & configs
  Fix test score
  Fix test score sum to mean
  Add total step to recurrent dqn_agent
  Fix test log position
  Add framestack argument
  Remove upndown environment
  Fix no_framestack argument
  Add r2d1 resnet configs
  Delete lunarlander iqn & Fix R2D1C51 lunarlander config description
  Fix configs
  Change total_step count startpoint after warmup
  Change test startpoint
  Fix epsilon decay
  Change r2d1 agent epsilon_decay
  Fix several issues commented
* Rebase commit to a302479
* Rebase commit to 990c78a
* Fix several issues commented
* Delete PrioritizedRecurrentReplayBuffer
* Resolved issues commented
* Fix issues commented
* Change R2D1Learner parent class & Add descriptions
* Fix issues commented
* Change descriptions
* Delete unnecessary configs
* Change descriptions due to the length limit
* Fix several issues commented
* Fix contiguous issues
* Add __init__.py in recurrent
* Rebase commit to 815a1ca
* Use no grad tensor to select action
* Modify documentation
* Add torch.no_grad to select action in other class
* Add R2D1DQN ResNet config
* Fix R2D1DQN ResNet config
* Merge recurrent_replay_buffer into replay_buffer
* Add R2D1 on readme
* Remove off-framestack explanation
* Change r2d1 configs' framestack 1 to 4

Co-authored-by: khkim <kh.kim@medipixel.io>
1 parent 815a1ca, commit 07743f6
Showing 32 changed files with 1,651 additions and 78 deletions.
@@ -0,0 +1,52 @@
"""Config for R2D1 on LunarLander-v2.
- Author: Euijin Jeong
- Contact: euijin.jeong@medipixel.io
"""
from rl_algorithms.common.helper_functions import identity


agent = dict(
    type="R2D1Agent",
    hyper_params=dict(
        gamma=0.99,
        tau=5e-3,
        buffer_size=int(1e4),  # openai baselines: int(1e4)
        batch_size=64,  # openai baselines: 32
        update_starts_from=int(1e3),  # openai baselines: int(1e4)
        multiple_update=1,  # multiple learning updates
        train_freq=1,  # in openai baselines, train_freq = 4
        gradient_clip=10.0,  # dueling: 10.0
        n_step=3,
        w_n_step=1.0,
        w_q_reg=0.0,
        per_alpha=0.6,  # openai baselines: 0.6
        per_beta=0.4,
        per_eps=1e-6,
        # R2D1
        sequence_size=32,
        overlap_size=16,
        loss_type=dict(type="R2D1C51Loss"),
        # Epsilon Greedy
        max_epsilon=1.0,
        min_epsilon=0.01,  # openai baselines: 0.01
        epsilon_decay=2e-5,  # openai baselines: 1e-7 / 1e-1
    ),
    learner_cfg=dict(
        type="R2D1Learner",
        backbone=dict(),
        gru=dict(rnn_hidden_size=64, burn_in_step=16,),
        head=dict(
            type="C51DuelingMLP",
            configs=dict(
                hidden_sizes=[128, 64],
                v_min=-300,
                v_max=300,
                atom_size=51,
                output_activation=identity,
                # NoisyNet
                use_noisy_net=False,
            ),
        ),
        optim_cfg=dict(lr_dqn=1e-4, weight_decay=1e-7, adam_eps=1e-8),
    ),
)
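For orientation, the `v_min`, `v_max`, and `atom_size` values in the `C51DuelingMLP` head define the fixed support of a categorical (C51) value distribution. The following is a minimal sketch of that arithmetic in plain Python, not the repository's implementation; `c51_support` and `expected_value` are hypothetical helper names introduced only for illustration.

```python
from typing import List


def c51_support(v_min: float, v_max: float, atom_size: int) -> List[float]:
    """Return evenly spaced atoms z_i = v_min + i * delta_z (hypothetical helper)."""
    if atom_size < 2:
        raise ValueError("atom_size must be at least 2")
    delta_z = (v_max - v_min) / (atom_size - 1)
    return [v_min + i * delta_z for i in range(atom_size)]


def expected_value(probs: List[float], support: List[float]) -> float:
    """Q-value as the expectation of a categorical distribution over the support."""
    return sum(p * z for p, z in zip(probs, support))


# Values taken from the config above.
support = c51_support(v_min=-300, v_max=300, atom_size=51)
delta_z = support[1] - support[0]  # spacing between neighbouring atoms
```

With these settings the 51 atoms run from -300 to 300 in steps of 12, so that step is the granularity at which returns are represented.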
@@ -0,0 +1,65 @@
"""Config for R2D1 on PongNoFrameskip-v4.
- Author: Euijin Jeong
- Contact: euijin.jeong@medipixel.io
"""
from rl_algorithms.common.helper_functions import identity


agent = dict(
    type="R2D1Agent",
    hyper_params=dict(
        gamma=0.99,
        tau=5e-3,
        buffer_size=int(4e3),  # openai baselines: int(1e4)
        batch_size=32,  # openai baselines: 32
        update_starts_from=int(4e3),  # openai baselines: int(1e4)
        multiple_update=1,  # multiple learning updates
        train_freq=4,  # in openai baselines, train_freq = 4
        gradient_clip=10.0,  # dueling: 10.0
        n_step=5,
        w_n_step=1.0,
        w_q_reg=0.0,
        per_alpha=0.6,  # openai baselines: 0.6
        per_beta=0.4,
        per_eps=1e-6,
        # R2D1
        sequence_size=20,
        overlap_size=10,
        loss_type=dict(type="R2D1DQNLoss"),
        # Epsilon Greedy
        max_epsilon=1.0,
        min_epsilon=0.01,  # openai baselines: 0.01
        epsilon_decay=3e-6,  # openai baselines: 1e-7 / 1e-1
        # grad_cam
        grad_cam_layer_list=[
            "backbone.cnn.cnn_0.cnn",
            "backbone.cnn.cnn_1.cnn",
            "backbone.cnn.cnn_2.cnn",
        ],
    ),
    learner_cfg=dict(
        type="R2D1Learner",
        backbone=dict(
            type="CNN",
            configs=dict(
                input_sizes=[4, 32, 64],
                output_sizes=[32, 64, 64],
                kernel_sizes=[8, 4, 3],
                strides=[4, 2, 1],
                paddings=[1, 0, 0],
            ),
        ),
        gru=dict(rnn_hidden_size=512, burn_in_step=10,),
        head=dict(
            type="DuelingMLP",
            configs=dict(
                hidden_sizes=[512], use_noisy_net=False, output_activation=identity,
            ),
        ),
        optim_cfg=dict(
            lr_dqn=1e-4,  # dueling: 6.25e-5, openai baselines: 1e-4
            weight_decay=0.0,  # this makes saturation in cnn weights
            adam_eps=1e-8,  # rainbow: 1.5e-4, openai baselines: 1e-8
        ),
    ),
)
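As a sanity check on the `CNN` backbone settings, the spatial size after each convolution can be worked out with the standard formula `floor((n + 2p - k) / s) + 1`. The sketch below assumes 84x84 input frames, a common Atari preprocessing choice that this config does not state; `conv_out_size` is a hypothetical helper, not a function from the repository.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of one convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Values taken from the config above.
kernel_sizes = [8, 4, 3]
strides = [4, 2, 1]
paddings = [1, 0, 0]
output_sizes = [32, 64, 64]

size = 84  # ASSUMPTION: 84x84 frames; not stated in the config
sizes = []
for k, s, p in zip(kernel_sizes, strides, paddings):
    size = conv_out_size(size, k, s, p)
    sizes.append(size)

# Number of features handed to the next module after flattening.
flat_features = output_sizes[-1] * size * size
print(sizes, flat_features)
```

Under that assumption the feature maps are 20, 9, and 7 pixels on a side, which gives 64 * 7 * 7 = 3136 flattened features. A different frame size would change these numbers but not the method.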
@@ -0,0 +1,69 @@
"""Config for R2D1DQN on PongNoFrameskip-v4.
- Author: Kyunghwan Kim, Euijin Jeong
- Contact: kh.kim@medipixel.io, euijin.jeong@medipixel.io
"""
from rl_algorithms.common.helper_functions import identity


agent = dict(
    type="R2D1Agent",
    hyper_params=dict(
        gamma=0.99,
        tau=5e-3,
        buffer_size=int(4e3),  # openai baselines: int(1e4)
        batch_size=16,  # openai baselines: 32
        update_starts_from=int(4e3),  # openai baselines: int(1e4)
        multiple_update=1,  # multiple learning updates
        train_freq=4,  # in openai baselines, train_freq = 4
        gradient_clip=10.0,  # dueling: 10.0
        n_step=5,
        w_n_step=1.0,
        w_q_reg=0.0,
        per_alpha=0.6,  # openai baselines: 0.6
        per_beta=0.4,
        per_eps=1e-6,
        # R2D1
        sequence_size=20,
        overlap_size=10,
        loss_type=dict(type="R2D1DQNLoss"),
        # Epsilon Greedy
        max_epsilon=1.0,
        min_epsilon=0.01,  # openai baselines: 0.01
        epsilon_decay=3e-6,  # openai baselines: 1e-7 / 1e-1
        # grad_cam
        grad_cam_layer_list=[
            "backbone.layer1.0.conv2",
            "backbone.layer2.0.shortcut.0",
            "backbone.layer3.0.shortcut.0",
            "backbone.layer4.0.shortcut.0",
            "backbone.conv_out",
        ],
    ),
    learner_cfg=dict(
        type="R2D1Learner",
        backbone=dict(
            type="ResNet",
            configs=dict(
                use_bottleneck=False,
                num_blocks=[1, 1, 1, 1],
                block_output_sizes=[32, 32, 64, 64],
                block_strides=[1, 2, 2, 2],
                first_input_size=4,
                first_output_size=32,
                expansion=1,
                channel_compression=4,  # compression ratio
            ),
        ),
        gru=dict(rnn_hidden_size=512, burn_in_step=10,),
        head=dict(
            type="DuelingMLP",
            configs=dict(
                hidden_sizes=[512], use_noisy_net=False, output_activation=identity,
            ),
        ),
        optim_cfg=dict(
            lr_dqn=1e-4,  # dueling: 6.25e-5, openai baselines: 1e-4
            weight_decay=0.0,  # this makes saturation in cnn weights
            adam_eps=1e-8,  # rainbow: 1.5e-4, openai baselines: 1e-8
        ),
    ),
)
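The `sequence_size`, `overlap_size`, and `burn_in_step` settings suggest a recurrent replay scheme that cuts an episode into overlapping training sequences and uses the first steps of each sequence only to warm up the GRU hidden state. The sketch below illustrates that idea in plain Python, assuming consecutive sequences start `sequence_size - overlap_size` steps apart and ignoring the zero-padding of short sequences mentioned in the commit message. It is not the repository's buffer code, and `sequence_windows` and `split_burn_in` are hypothetical names.

```python
from typing import List, Tuple


def sequence_windows(
    episode_len: int, sequence_size: int, overlap_size: int
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of full-length, overlapping windows."""
    if not 0 <= overlap_size < sequence_size:
        raise ValueError("overlap_size must be in [0, sequence_size)")
    stride = sequence_size - overlap_size
    last_start = episode_len - sequence_size
    # An episode shorter than sequence_size yields no window in this sketch.
    return [(s, s + sequence_size) for s in range(0, last_start + 1, stride)]


def split_burn_in(window: Tuple[int, int], burn_in_step: int) -> Tuple[range, range]:
    """Split one window into burn-in steps and steps that contribute to the loss."""
    start, end = window
    return range(start, start + burn_in_step), range(start + burn_in_step, end)


# Values taken from the config above; the episode length of 50 is made up.
windows = sequence_windows(episode_len=50, sequence_size=20, overlap_size=10)
burn_in, train = split_burn_in(windows[0], burn_in_step=10)
```

For a 50-step episode this yields four windows starting at steps 0, 10, 20, and 30. With `burn_in_step=10`, half of each 20-step window would serve as warm-up under this reading of the parameters.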