This blog article builds on the survey paper published by Sebastian Ruder in 2017, An overview of gradient descent optimization algorithms, which is also available on his website. In addition, towards the end a special focus is placed on the AdaBelief paper published in 2020.
This blog is a reinterpretation of his work; some newer optimization algorithms that have become quite popular have been added to give a complete overview of optimization algorithms to date.
Furthermore, each optimization method has been examined through its original research paper. The shortcomings of each method are noted and then addressed in subsequent sections that introduce newer optimization techniques.
To build a connected understanding of all the optimizers, and to avoid the differing notations used across the original papers, all mathematical equations have been made consistent from start to end, following the same notation for every optimization algorithm.
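For instance, the vanilla gradient descent update, which every later optimizer builds upon, can be written in this shared notation (the symbols here, $\theta$ for the parameters, $\eta$ for the learning rate and $J(\theta)$ for the objective, are assumed for illustration and may differ slightly from the exact symbols used later in the blog):

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} J(\theta_t)$$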
Comparisons among the various optimizers are also shown through diagrams and animations in the blog.
The emphasis of this article is on explaining the key concepts, the shortcomings of earlier approaches, and the ideas introduced to overcome those shortcomings, by evaluating each paper alongside the survey paper, and on presenting the concepts intuitively with an original explanation that connects all the work done so far. For a working code implementation, code for the Adam optimizer is provided at the end.
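As a preview of that implementation, here is a minimal, self-contained sketch of the standard Adam update rule; the function name `adam_update` and the toy quadratic objective are illustrative assumptions, not the blog's final code:

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: returns updated parameters and moment estimates.

    theta : current parameters (np.ndarray)
    grad  : gradient of the objective at theta
    m, v  : running estimates of the first and second moments
    t     : 1-based step count, used for bias correction
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m = v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_update(theta, grad, m, v, t, lr=0.1)
print(theta)  # converges close to the minimum at 0
```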
Note: In case of issues viewing the diagrams and visualisations, please refer to the same notebook on Google Colaboratory, as the images are embedded in the original Colab notebook: colab notebook link.