This project implements the Transformer Encoder from the paper "Attention Is All You Need" (Vaswani et al., 2017) from scratch, using only NumPy and no deep learning frameworks.
It closely follows the paper's structure:
- Scaled Dot-Product Attention
- Multi-Head Attention
- Positional Encoding
- FeedForward Networks
- Layer Normalization + Residual Connections
- Stacked Encoder Blocks
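The first component above, scaled dot-product attention, computes softmax(QKᵀ/√d_k)·V. A minimal single-head NumPy sketch is shown below; the function name and shapes are illustrative, not necessarily those used in `attention.py`:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V, weights

# Toy self-attention check: 3 positions, model dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(Q, Q, Q)
```

Multi-head attention repeats this with learned per-head projections of Q, K, and V, then concatenates the heads.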
- `attention.py`: Scaled Dot-Product Attention and Multi-Head Attention
- `positional_encoding.py`: Sinusoidal positional encoding
- `feedforward.py`: Two-layer Feed-Forward network
- `encoder_block.py`: One Transformer Encoder Block
- `transformer_encoder.py`: Full Transformer Encoder (stack of blocks)
- `glove_loader.py`: Load pre-trained GloVe embeddings
- `train_toy_example.py`: Train Transformer + simple classifier on a toy synthetic task
- `test_sentence.py`: Pass real-world sentences through the Transformer Encoder for prediction
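As a taste of the sinusoidal encoding implemented in `positional_encoding.py`, here is a minimal sketch of the paper's formula, PE(pos, 2i) = sin(pos/10000^(2i/d_model)) and PE(pos, 2i+1) = cos(...). The function name is an assumption, not necessarily the one in the file, and it assumes an even `d_model`:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Each position gets sin/cos pairs at geometrically spaced frequencies.
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # even feature indices (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(10, 50)  # matches GloVe 50d used below
```

The encoding is added to the token embeddings so the model can use word order despite attention being permutation-invariant.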
pip install numpy matplotlib
- Download GloVe 6B embeddings
- Extract and place `glove.6B.50d.txt` in your project directory.
- Train Transformer on Toy Task
python train_toy_example.py
This trains the Transformer + Classifier on a synthetic dataset.
- Test on Real Sentence
python test_sentence.py
This processes a real English sentence and predicts a class.
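The GloVe download step above assumes the plain-text format: one token per line, followed by its float components separated by spaces. A minimal hedged parser for sanity-checking the file (independent of the project's actual `glove_loader.py`):

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    """Parse GloVe text format: `token v1 v2 ... vd` on each line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings
```

Tokens missing from the vocabulary need a fallback (e.g. a zero or random vector); how the project handles that is left to `glove_loader.py`.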
- Scaled Dot-Product Attention
- Multi-Head Attention
- Sinusoidal Positional Encoding
- FeedForward Networks
- Residual + Layer Normalization
- Stacking of Encoder Blocks
- Manual Training Loop (NumPy)
- Integration of GloVe real embeddings
- Real sentence inference testing
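The "Residual + Layer Normalization" item above follows the paper's post-norm form, LayerNorm(x + Sublayer(x)). A small NumPy sketch of that wiring (function names are illustrative):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize over the feature dimension, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_sublayer(x, sublayer, gamma, beta):
    """Post-norm residual connection: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x), gamma, beta)

# Example: wrap a stand-in sublayer around a (3, 8) activation
x = np.arange(24, dtype=float).reshape(3, 8)
y = residual_sublayer(x, lambda t: 0.5 * t, np.ones(8), np.zeros(8))
```

Each encoder block applies this wrapper twice: once around multi-head attention and once around the feed-forward network.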
- Migrate project to PyTorch
- Fine-tune full Transformer Encoder on real datasets (e.g., IMDB)
- Visualize attention maps
- Build full Transformer (add Decoder block)
Built as an exercise in learning deeply from research papers.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). "Attention Is All You Need". Advances in Neural Information Processing Systems (NeurIPS).
This project implements the Transformer Encoder architecture described in the paper above for educational and research purposes.