
Transformer Encoder from Scratch (Paper2Code)

This project is a from-scratch implementation of the Transformer Encoder from "Attention Is All You Need" (Vaswani et al., 2017), written in NumPy only, with no deep learning frameworks.

It closely follows the paper's structure:

  • Scaled Dot-Product Attention (sketched after this list)
  • Multi-Head Attention
  • Positional Encoding
  • Feed-Forward Networks
  • Layer Normalization + Residual Connections
  • Stacked Encoder Blocks
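
At the core is the scaled dot-product attention from the paper, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The sketch below shows that computation in plain NumPy; the function and variable names are illustrative and may differ from the actual code in attention.py:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V, weights

# Tiny smoke test with random inputs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
assert out.shape == (4, 8) and np.allclose(w.sum(axis=-1), 1.0)
```

Multi-head attention runs this same routine once per head on learned projections of Q, K, and V, then concatenates the per-head outputs.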

Project Structure

  • attention.py : Scaled Dot-Product Attention and Multi-Head Attention
  • positional_encoding.py : Sinusoidal positional encoding (see the sketch after this list)
  • feedforward.py : Two-layer feed-forward network
  • encoder_block.py : One Transformer Encoder Block
  • transformer_encoder.py : Full Transformer Encoder (stack of blocks)
  • glove_loader.py : Load pre-trained GloVe embeddings
  • train_toy_example.py : Train Transformer + simple classifier on toy synthetic task
  • test_sentence.py : Pass real-world sentences through the Transformer Encoder for prediction
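
As an example of how compact these modules are, the sinusoidal positional encoding from the paper fits in a few lines of NumPy. This is a self-contained sketch; positional_encoding.py may organize it differently:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    # Assumes d_model is even.
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model // 2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

# Added to the token embeddings before the first encoder block.
pe = sinusoidal_positional_encoding(seq_len=10, d_model=50)
print(pe.shape)  # (10, 50)
```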

How to Run

Install Requirements

pip install numpy matplotlib

(Optional) Download GloVe Embeddings

  • Download the GloVe 6B embeddings from https://nlp.stanford.edu/projects/glove/
  • Extract and place glove.6B.50d.txt in your project directory (a minimal loader sketch follows).
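
Each line of a GloVe text file is a word followed by its space-separated vector components, so a minimal loader looks like the sketch below. The function name load_glove and the default path are illustrative assumptions; see glove_loader.py for the project's actual interface.

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    # Illustrative sketch: name and signature are assumptions, not glove_loader.py.
    # Each line: a word followed by its space-separated vector components.
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings

vectors = load_glove()
print(vectors["the"].shape)  # (50,)
```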

Run Examples

  1. Train Transformer on Toy Task

python train_toy_example.py

This trains the Transformer + Classifier on a synthetic dataset.
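
The training loop computes gradients by hand rather than with autograd. As an illustration of the pattern, the sketch below trains only a linear classifier head with softmax cross-entropy; the shapes, names, and the random stand-in for pooled encoder outputs are assumptions, not the exact code in train_toy_example.py:

```python
import numpy as np

# Illustrative sketch: the real script trains more parameters than this head.
rng = np.random.default_rng(0)
d_model, n_classes, n_samples, lr = 50, 2, 64, 0.1

# Stand-in for mean-pooled Transformer encoder outputs and toy labels.
X = rng.normal(size=(n_samples, d_model))
y = rng.integers(0, n_classes, size=n_samples)

W = np.zeros((d_model, n_classes))
b = np.zeros(n_classes)

for step in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n_samples), y]).mean()
    # Gradient of mean cross-entropy w.r.t. logits: (probs - one_hot) / N.
    grad = probs.copy()
    grad[np.arange(n_samples), y] -= 1.0
    grad /= n_samples
    W -= lr * (X.T @ grad)       # backprop through the linear layer
    b -= lr * grad.sum(axis=0)

print(f"final loss: {loss:.3f}")
```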

  2. Test on Real Sentence

python test_sentence.py

This processes a real English sentence and predicts a class.

What's Implemented

  • Scaled Dot-Product Attention
  • Multi-Head Attention
  • Sinusoidal Positional Encoding
  • Feed-Forward Networks
  • Residual + Layer Normalization (sketched after this list)
  • Stacking of Encoder Blocks
  • Manual Training Loop (NumPy)
  • Integration of GloVe real embeddings
  • Real sentence inference testing
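
To show how the feed-forward sublayer, residual connection, and layer normalization compose, here is a minimal NumPy sketch of one post-norm sublayer, LayerNorm(x + FFN(x)), as described in the paper. All names and the toy initialization are illustrative, not the exact code in encoder_block.py:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Normalize each position over the feature dimension, then scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def feed_forward(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2 (two linear layers with a ReLU)
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

def ffn_sublayer(x, ffn_params, ln_params):
    # Post-norm residual connection, as in the paper: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + feed_forward(x, *ffn_params), *ln_params)

# Toy shapes: d_model matches the 50-d GloVe vectors, d_ff is the hidden width.
d_model, d_ff, seq_len = 50, 200, 10
rng = np.random.default_rng(0)
ffn_params = (rng.normal(0.0, 0.02, (d_model, d_ff)), np.zeros(d_ff),
              rng.normal(0.0, 0.02, (d_ff, d_model)), np.zeros(d_model))
ln_params = (np.ones(d_model), np.zeros(d_model))

x = rng.normal(size=(seq_len, d_model))
print(ffn_sublayer(x, ffn_params, ln_params).shape)  # (10, 50)
```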

What's Next

  • Migrate project to PyTorch
  • Fine-tune full Transformer Encoder on real datasets (e.g., IMDB)
  • Visualize attention maps
  • Build full Transformer (add Decoder block)

Built to learn deeply from research papers.


References

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017).
    "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

This project implements the Transformer Encoder architecture described in the paper above for educational and research purposes.
