Arithmetic Pretrained Transformer is a mechanistic interpretability (MechInterp) project investigating how transformers learn arithmetic.
This project is heavily inspired by Neel Nanda's "Progress Measures for Grokking via Mechanistic Interpretability". The finding that transformers can generalize very late in training by actually implementing an algorithm was fascinating to me. For now, I wanted to focus on simple arithmetic tasks (starting with addition and multiplication, hopefully moving on to subtraction and division), hence the name. Much of the original transformer implementation comes from following along with Andrej Karpathy's wonderful NanoGPT tutorial.
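To make the task concrete, here is a minimal sketch of how addition problems might be encoded as character-level token sequences for a small transformer. The function names and the `a+b=c` string format are illustrative assumptions, not the project's actual data pipeline.

```python
# Hypothetical sketch: encode addition problems as character-token sequences.
# The exact format used by this project may differ.
def encode_addition(a: int, b: int) -> list[str]:
    """Turn `a + b = c` into a flat list of character tokens."""
    c = a + b
    return list(f"{a}+{b}={c}")

def make_dataset(max_operand: int) -> list[list[str]]:
    """Enumerate every addition problem with operands in [0, max_operand]."""
    return [
        encode_addition(a, b)
        for a in range(max_operand + 1)
        for b in range(max_operand + 1)
    ]

examples = make_dataset(99)  # 100 * 100 = 10,000 problems
```

With a fixed, small operand range like this, the model sees every problem during training, so generalization shows up as learning the underlying algorithm rather than memorizing held-out pairs, which is the setting where grokking is typically studied.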
