forked from lazyprogrammer/machine_learning_examples
extra_reading.txt
How do I sample from a discrete (categorical) distribution in log space?
https://stats.stackexchange.com/questions/64081/how-do-i-sample-from-a-discrete-categorical-distribution-in-log-space

A2C (Advantage Actor-Critic)
https://openai.com/blog/baselines-acktr-a2c/

DDPG (Deep Deterministic Policy Gradient)
"Continuous control with deep reinforcement learning"
https://arxiv.org/abs/1509.02971

Deterministic Policy Gradient Algorithms
http://proceedings.mlr.press/v32/silver14.pdf

ES (Evolution Strategies)
"Evolution Strategies as a Scalable Alternative to Reinforcement Learning"
https://arxiv.org/abs/1703.03864

Trust Region Evolution Strategies
https://www.microsoft.com/en-us/research/uploads/prod/2018/11/trust-region-evolution-strategies.pdf

Addressing Function Approximation Error in Actor-Critic Methods
https://arxiv.org/abs/1802.09477
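
Sketch for the first entry above (sampling from a categorical distribution in log space). One common approach is the Gumbel-max trick: add independent Gumbel(0, 1) noise to each log-weight and take the argmax, which avoids exponentiating very negative values. The function name sample_from_log_probs and its rng parameter are illustrative assumptions, not taken from the linked page or from this repository.

```python
import math
import random


def sample_from_log_probs(log_probs, rng=random):
    """Return index i with probability proportional to exp(log_probs[i]).

    Gumbel-max trick: argmax over (log-weight + Gumbel(0, 1) noise).
    The log-weights do not need to be normalized.
    """
    best_index, best_value = None, -math.inf
    for i, lp in enumerate(log_probs):
        if lp == -math.inf:
            continue  # a zero-probability category can never be chosen
        u = rng.random()
        while u == 0.0:  # random() is in [0, 1); redraw to avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        if lp + gumbel > best_value:
            best_index, best_value = i, lp + gumbel
    if best_index is None:
        raise ValueError("all categories have zero probability")
    return best_index


if __name__ == "__main__":
    # Unnormalized log-weights so negative that exp() would underflow to 0.
    log_w = [math.log(p) - 1000.0 for p in (0.1, 0.2, 0.7)]
    rng = random.Random(0)
    draws = [sample_from_log_probs(log_w, rng) for _ in range(10)]
    print(draws)
```

Because only the ordering of the perturbed values matters, adding a constant to every log-weight leaves the sampling distribution unchanged, so no log-sum-exp normalization is needed.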