Links to the original paper, published on arXiv (arXiv:1609.04747): [1], [2]
Link to a blog post explaining the paper: [3]
Implemented the following Gradient Descent Optimization Algorithms from scratch (a toy sketch of the first two update rules appears after the list):
- Vanilla Batch/Stochastic Gradient Descent [4]
- Momentum [5]
- NAG: Nesterov Accelerated Gradient [6]
- AdaGrad: Adaptive Gradient Algorithm [7]
- AdaDelta: Adaptive Learning Rate Method [8]
- RMSProp [9]
- AdaMax: Infinity Norm based Adaptive Moment Estimation [12]
- Nadam: Nesterov-accelerated Adaptive Moment Estimation [13]
- AMSGrad [14]
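
As a rough illustration of what these from-scratch implementations do, below is a minimal sketch of the first two update rules (vanilla gradient descent and momentum) on a toy quadratic objective. All names here (`grad_fn`, `sgd_step`, `momentum_step`) and the hyperparameter values are illustrative assumptions, not this repo's actual API.

```python
# Toy objective f(w) = ||w||^2, whose gradient is 2w; both optimizers
# should drive w toward the minimum at the origin.
import numpy as np

def grad_fn(w):
    # Gradient of the toy objective f(w) = ||w||^2.
    return 2.0 * w

def sgd_step(w, lr=0.1):
    # Vanilla update: w <- w - lr * grad.
    return w - lr * grad_fn(w)

def momentum_step(w, v, lr=0.1, gamma=0.9):
    # Momentum update: v <- gamma * v + lr * grad;  w <- w - v.
    v = gamma * v + lr * grad_fn(w)
    return w - v, v

w_sgd = np.array([3.0, -2.0])
w_mom = np.array([3.0, -2.0])
v = np.zeros_like(w_mom)
for _ in range(50):
    w_sgd = sgd_step(w_sgd)
    w_mom, v = momentum_step(w_mom, v)
print("SGD:     ", w_sgd)   # both converge toward the minimum at 0
print("Momentum:", w_mom)
```

The adaptive methods in the list (AdaGrad, AdaDelta, RMSProp, AdaMax, Nadam, AMSGrad) follow the same loop structure but additionally keep running statistics of past gradients to scale each parameter's step size, as described in the paper [1].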