extra_reading.txt
The Marginal Value of Adaptive Gradient Methods in Machine Learning
https://arxiv.org/abs/1705.08292
Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning
https://arxiv.org/abs/1609.08326
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
https://arxiv.org/abs/1604.03584
Large Scale Distributed Deep Networks
https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
https://arxiv.org/abs/1502.03167
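A minimal NumPy sketch of the batch-norm forward pass from the Ioffe & Szegedy paper (training-time statistics only; function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def batchnorm_forward(X, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then apply the
    # learned scale (gamma) and shift (beta) parameters.
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mu) / np.sqrt(var + eps)
    return gamma * X_hat + beta
```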
Xavier (Glorot) Normal Initializer
http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
He Normal Initializer
http://arxiv.org/abs/1502.01852
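A sketch of the two initializers above, assuming a dense weight matrix of shape (fan_in, fan_out); the function names are mine, not from either paper:

```python
import numpy as np

def glorot_normal(fan_in, fan_out):
    # Glorot/Xavier: variance 2 / (fan_in + fan_out),
    # derived for tanh/sigmoid-style activations.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.randn(fan_in, fan_out) * std

def he_normal(fan_in, fan_out):
    # He et al.: variance 2 / fan_in, derived for ReLU units.
    std = np.sqrt(2.0 / fan_in)
    return np.random.randn(fan_in, fan_out) * std
```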
For understanding Nesterov Momentum:
Advances in Optimizing Recurrent Networks by Yoshua Bengio et al., Section 3.5
http://arxiv.org/pdf/1212.0901v2.pdf
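The Section 3.5 formulation of Nesterov momentum can be sketched as follows (a single update step; lr and mu values are illustrative defaults):

```python
import numpy as np

def nesterov_step(w, v, grad_fn, lr=0.01, mu=0.9):
    # Nesterov momentum: evaluate the gradient at the "lookahead"
    # point w + mu*v, then update the velocity and the parameters.
    g = grad_fn(w + mu * v)
    v = mu * v - lr * g
    return w + v, v
```

On a simple quadratic, iterating this step drives the parameters toward the minimum; the lookahead gradient is what distinguishes it from classical momentum.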
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
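A sketch of the technique from the dropout paper, in the common "inverted dropout" form (scaling at train time rather than test time, which is a popular variant rather than the paper's exact presentation):

```python
import numpy as np

def dropout_forward(X, p_keep=0.8, train=True):
    # Inverted dropout: randomly zero units at train time and rescale
    # the survivors by 1/p_keep so the expected activation matches
    # test time, where the layer is the identity.
    if not train:
        return X
    mask = (np.random.rand(*X.shape) < p_keep) / p_keep
    return X * mask
```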