@@ -15,7 +15,7 @@ So you can run this example on your computer (it may take only 1~2 minutes)
- [x] Distributional (C51) [[7]](#reference)
- [x] Rainbow [[8]](#reference)

- ## PG
+ ## PG (Policy Gradient)
- [x] REINFORCE [[9]](#reference)
- [x] Actor Critic [[10]](#reference)
- [x] Advantage Actor Critic
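For reference, here is a minimal sketch of the REINFORCE update listed in the PG section above. It is not this repository's code: it assumes the classic 4-tuple `gym` step API, and the network size, learning rate, and `gamma` are illustrative choices.

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(                      # small MLP: observation -> action logits
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99                                 # discount factor (assumed)

for episode in range(500):
    obs, done = env.reset(), False
    log_probs, rewards = [], []
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())  # classic 4-tuple gym API
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    returns, G = [], 0.0                     # discounted returns G_t, computed backwards
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.as_tensor(returns, dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize to reduce variance

    loss = -(torch.stack(log_probs) * returns).sum()  # -sum_t log pi(a_t|s_t) * G_t
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Actor Critic and Advantage Actor Critic keep the same gradient but replace the normalized Monte Carlo return with a learned value baseline, which lowers its variance.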
@@ -33,12 +33,14 @@ So you can run this example on your computer (it may take only 1~2 minutes)
- [ ] IMPALA [[23]](#reference)
- [ ] R2D2 [[16]](#reference)

- ## Will
- - [ ] RND [[17]](#refercence)
- - [ ] ICM [[22]](#refercence)
+ ## Distributional DQN
- [ ] QRDQN [[18]](#reference)
- [ ] IQN [[19]](#reference)

+ ## Exploration
+ - [ ] RND [[17]](#reference)
+ - [ ] ICM [[22]](#reference)
+

## Reference
[1] [Playing Atari with Deep Reinforcement Learning](http://arxiv.org/abs/1312.5602)
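The new Distributional DQN section queues up QRDQN [18]; the sketch below shows the quantile Huber loss at its core. The function name, tensor shapes, and the `kappa` default are assumptions for illustration, not this repository's API.

```python
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred, target, kappa=1.0):
    """QR-DQN loss sketch. pred: [batch, N] quantile estimates of the return;
    target: [batch, N'] detached Bellman targets."""
    n = pred.size(1)
    taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n  # quantile midpoints (2i+1)/2N
    u = target.unsqueeze(1) - pred.unsqueeze(2)              # pairwise TD errors, [batch, N, N']
    huber = F.huber_loss(pred.unsqueeze(2).expand_as(u),
                         target.unsqueeze(1).expand_as(u),
                         reduction="none", delta=kappa)
    weight = torch.abs(taus.view(1, -1, 1) - (u.detach() < 0).float())  # |tau - 1{u<0}|
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()
```

In a full agent, `target` would be the detached Bellman backup `r + gamma * z'` computed from a target network.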
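Likewise for the new Exploration section, a minimal sketch of the RND [17] novelty bonus: a predictor network learns to match the features of a fixed, randomly initialized target network, and its prediction error on a state is the intrinsic reward. All sizes and names here are placeholders.

```python
import torch
import torch.nn as nn

obs_dim, feat_dim = 8, 64                        # placeholder dimensions
target = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
for p in target.parameters():                    # the target net stays fixed and random
    p.requires_grad_(False)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs_batch):
    """Trains the predictor on visited states and returns its per-state error."""
    err = (predictor(obs_batch) - target(obs_batch)).pow(2).mean(dim=1)
    opt.zero_grad()
    err.mean().backward()                        # rarely seen states keep a high error
    opt.step()
    return err.detach()                          # add this bonus to the extrinsic reward

bonus = intrinsic_reward(torch.randn(32, obs_dim))  # batch of just-visited states
```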
@@ -62,7 +64,7 @@ So you can run this example on your computer (it may take only 1~2 minutes)
[19] [Implicit Quantile Networks for Distributional Reinforcement Learning](https://arxiv.org/pdf/1806.06923.pdf)
[20] [A Natural Policy Gradient](https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)
[21] [Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/pdf/1611.01224.pdf)
- [22] [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363.pdf)
+ [22] [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363.pdf)
[23] [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf)