This repository houses a collection of Jupyter notebooks designed to complement Andrej Karpathy's neural networks walkthrough. Spanning a diverse range of topics, these notebooks serve as resources for exploring both the foundational concepts and the practical applications of neural networks.
The notebooks cover foundational concepts such as gradients, backpropagation, and neural network training, and extend to practical applications: training a variety of character-level language models, from basic bigram models to advanced transformers.
The notebooks in the MicroGrad section cover foundational concepts such as gradient flow, backpropagation, and neural network training.
All related notebooks are under the directory MicroGrad_Walkthrough.
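To make the gradient-flow and backpropagation ideas concrete, below is a minimal, self-contained sketch of scalar reverse-mode autodiff in the spirit of micrograd. The `Value` class, its methods, and the example expression are illustrative assumptions, not code copied from the notebooks.

```python
# A minimal sketch of scalar reverse-mode autodiff in the spirit of micrograd.
# The class name `Value` and its methods are illustrative, not the repository's exact code.

class Value:
    """A scalar that remembers the operations producing it so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to push this node's grad to its parents
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad         # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topologically order the graph, then apply the chain rule in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# usage: gradients of a tiny expression
a, b = Value(2.0), Value(-3.0)
c = a * b + a          # c = a*b + a = -4.0
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = -2.0, dc/db = a = 2.0
```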
To understand the intricacies and challenges of building neural networks, I experimented with training a range of character-level language model architectures, from basic bigrams to advanced transformers, by following Andrej Karpathy's walkthrough.
These character-level models are trained on a dataset of names to generate new, cool-sounding names. Working through the walkthrough also gave me a deeper understanding of the PyTorch framework and how it implements these foundational concepts under the hood.
All related notebooks are under the directory Makemore_Walkthrough.
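As a taste of the simplest model in the walkthrough, below is a minimal sketch of a count-based character bigram model that samples new names. The file name `names.txt` (one name per line) and the random seed are assumptions for illustration.

```python
# A minimal sketch of a count-based character bigram name model.
# Assumes a "names.txt" file with one name per line.
import torch

words = open("names.txt").read().splitlines()
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0                      # "." marks both the start and the end of a name
itos = {i: ch for ch, i in stoi.items()}

# count how often each character follows each other character
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(seq, seq[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# normalize counts into next-character probabilities (with add-one smoothing)
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# sample a few new names by walking the bigram chain
g = torch.Generator().manual_seed(2147483647)
for _ in range(5):
    ix, out = 0, []
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print("".join(out))
```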
- Multi-Layer Perceptron (MLP) - Bengio et al., 2003
- Recurrent Neural Network (RNN) - Mikolov et al., 2010
- Long Short-Term Memory (LSTM) - Graves et al., 2014
- Gated Recurrent Unit (GRU) - Cho et al., 2014
- Kaiming Initialization - He et al., 2015
- Batch Normalization - Ioffe et al., 2015
- WaveNet: A Generative Model for Raw Audio - van den Oord et al., 2016
- Transformers - Vaswani et al., 2017 (see the attention sketch after this list)
- Problems with Batch Normalization in Practice - Wu et al., 2021
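Of the architectures above, the transformer's self-attention is the largest conceptual jump from the earlier recurrent models. The sketch below shows a single head of causal (masked) self-attention; the batch, sequence, and embedding sizes are illustrative assumptions, not the exact values used in the notebooks.

```python
# A minimal sketch of one head of causal (masked) self-attention,
# the core operation of the transformer (Vaswani et al., 2017).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 4, 8, 32        # batch size, sequence length, embedding size (assumed)
head_size = 16

x = torch.randn(B, T, C)                           # token embeddings
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)               # each (B, T, head_size)

# attention scores: how much each position attends to every earlier position
wei = q @ k.transpose(-2, -1) * head_size**-0.5    # (B, T, T), scaled dot product
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))    # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)

out = wei @ v                                      # (B, T, head_size) weighted sum of values
print(out.shape)                                   # torch.Size([4, 8, 16])
```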