A foundational character-level Bigram Language Model built from scratch using PyTorch.
This project is designed to be a clear and simple implementation, perfect for understanding the basics of language modeling before moving on to more complex architectures like Transformers.
The model is trained on the text of "The Wonderful Wizard of Oz" and can generate new, albeit simplistic, text in a similar style.
- Character-Level Tokenization: The model learns from individual characters, not words.
- Simple Bigram Architecture: Predicts the next character based only on the previous one, implemented with a single nn.Embedding layer.
- Configuration Driven: All hyperparameters, file paths, and settings are managed in a single config.yaml file.
- Clean Code Structure: The project is organized into logical modules for data handling, model definition, training, and inference.
- Reproducible: The training script handles everything from data processing to saving the final model and character mappings.
The repository is organized to separate source code from data and model outputs, making it clean and easy to navigate.
.
├── config.yaml # All hyperparameters and paths
├── data/
│ └── wizard_of_oz.txt # Your training dataset
├── model_output/ # All generated files are saved here
│ ├── bigram.pth # The trained model weights
│ └── mappings.json # Character-to-integer mappings
├── data_utils.py # Handles data loading, encoding, and batching
├── model.py # The PyTorch nn.Module definition
├── train.py # Script to train the model
├── inference.py # Script to generate text from a trained model
└── README.md # This file
Follow these steps to get the project running on your local machine.
git clone https://github.com/shivendra-dev54/bigram-model
cd bigram-model

It's good practice to create a virtual environment to manage project dependencies.
# For Windows
python -m venv venv
venv\Scripts\activate
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

This project requires PyTorch and PyYAML.
pip install torch pyyaml

Running the project involves three main steps: preparing the data, training the model, and running inference.
Place your training data, a single plain text file (e.g., wizard_of_oz.txt), inside the data/ directory. Make sure the data_path in config.yaml points to this file.
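The character-level encoding that data_utils.py performs can be sketched as follows (a simplified illustration with a made-up snippet of text; the project's actual helper names may differ):

```python
# Build character-level mappings from the raw text (simplified sketch)
text = "Dorothy lived in the midst of the great Kansas prairies"
chars = sorted(set(text))                     # vocabulary = every distinct character
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> char

def encode(s: str) -> list[int]:
    """Turn a string into a list of integer character IDs."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Turn a list of integer IDs back into a string."""
    return "".join(itos[i] for i in ids)
```

These same mappings are what get saved to mappings.json, so inference can decode the model's integer outputs back into characters.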
You can modify the hyperparameters in config.yaml to experiment with different settings.
# In config.yaml
batch_size: 32
block_size: 128
max_iters: 5000
learning_rate: 3e-4
# ... and other settings

Run the training script from the root directory. This will process your data, train the model, and save the model weights (bigram.pth) and character mappings (mappings.json) to the model_output/ directory.
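Presumably train.py reads these values with PyYAML; a minimal sketch of that step (not the project's actual code, shown here with an inline string instead of the file):

```python
import yaml  # PyYAML, installed above

raw = """
batch_size: 32
block_size: 128
max_iters: 5000
learning_rate: 3e-4
"""
config = yaml.safe_load(raw)

# Note: PyYAML loads bare scientific notation like 3e-4 as a string
# (its float resolver requires a decimal point), so an explicit cast
# is a safe habit before handing the value to an optimizer.
lr = float(config["learning_rate"])
```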
python train.py

You should see output indicating the training progress:
Using device: cuda
✅ Mappings saved to ./model_output/mappings.json
Vocabulary size: 81
🚀 Starting training...
step 0: train loss 4.6821, val loss 4.6759
step 500: train loss 2.8913, val loss 2.9104
...
step 4999: train loss 2.4812, val loss 2.5031
✅ Model saved to ./model_output/bigram.pth
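The training loop behind that log can be sketched in a few lines (a self-contained toy version, not the project's train.py: it uses a tiny inline corpus, no batching or train/val split, and an exaggerated learning rate so the loss drop is visible quickly):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy corpus standing in for wizard_of_oz.txt
text = "the wonderful wizard of oz " * 4
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

vocab_size = len(chars)
model = nn.Embedding(vocab_size, vocab_size)  # the whole bigram model: a (V, V) logit table
opt = torch.optim.AdamW(model.parameters(), lr=0.1)

x, y = data[:-1], data[1:]  # every character is trained to predict its successor
losses = []
for step in range(200):
    logits = model(x)                  # (T, V) next-character logits
    loss = F.cross_entropy(logits, y)  # compare against the true next characters
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

The real script adds batching over random block_size windows, a held-out validation split, and saving the weights and mappings at the end.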
Once the model is trained, you can generate new text by running the inference script.
python inference.py

This script will load the saved model and mappings from model_output/ and print the generated text to the console:
✍️ Generating text...
--- GENERATED OUTPUT ---
The oot t a's s,
"Id as,
he s wef, l.
"
Th o
"
"I dyo an.
Tine sas, thet lof thavane, shor, se th l l,
Asorof t he ood athe s."
Dorot we s aind t anororot sse s asor asorof t s t hind wind," sanot t s s, the we a t aind t aind," se,
s,
"I d, thet sorot lof t we, sor the s, thet aind, s," se, sorof t we s, shind t wef l, the wef wef,
------------------------
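The sampling loop that produces output like this can be sketched as follows (a hypothetical simplification of inference.py; here the embedding table is freshly initialized rather than loaded from bigram.pth, and the start index is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
V = 81                   # vocabulary size from the training log
table = nn.Embedding(V, V)  # stands in for the trained bigram table

idx = torch.zeros(1, dtype=torch.long)  # assumed start index
generated = [int(idx)]
for _ in range(100):
    logits = table(idx)                     # (1, V) logits for the next character
    probs = torch.softmax(logits, dim=-1)   # turn logits into a distribution
    idx = torch.multinomial(probs, num_samples=1).squeeze(-1)  # sample, don't argmax
    generated.append(int(idx))
```

Sampling with torch.multinomial rather than always taking the argmax is what gives the varied (if incoherent) output above; greedy decoding from a bigram table would quickly fall into a repeating loop.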
A Bigram Model is one of the simplest types of language models. Its core assumption is that the probability of the next character in a sequence depends only on the immediately preceding character.
It completely ignores any context before that single character. For example, when predicting the character that follows the second "l" in "hello", the model uses only that "l", ignoring the preceding "h", "e", and "l".
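As a toy illustration of this assumption, the conditional distribution over next characters can be obtained just by counting pairs (a sketch with made-up text, unrelated to the project's code):

```python
from collections import Counter, defaultdict

text = "hello hello"
pairs = Counter(zip(text, text[1:]))  # count each (previous, next) character pair
following = defaultdict(Counter)
for (prev, nxt), n in pairs.items():
    following[prev][nxt] += n

# After "l", the model only knows "l": here it has seen 'l'->'l' and 'l'->'o'
dist = following["l"]
total = sum(dist.values())
probs = {ch: n / total for ch, n in dist.items()}
```

In this text, "l" is followed by "l" and by "o" equally often, so both get probability 0.5; the neural version below learns the same kind of table from gradients instead of counts.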
In this implementation, the nn.Embedding layer acts as a direct lookup table. For a given vocabulary size V, the embedding table is of size (V, V). When you input the index of a character, the model simply looks up that row in the table. This row contains V numbers (logits), representing the model's confidence for every possible character in the vocabulary being the next one.
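That lookup can be demonstrated directly (a minimal illustration of the shapes involved, not the project's model.py; the vocabulary size 81 is taken from the training log and the input index is arbitrary):

```python
import torch
import torch.nn as nn

V = 81                       # matches "Vocabulary size: 81" in the training log
table = nn.Embedding(V, V)   # one row of V logits per character in the vocabulary

idx = torch.tensor([5])                  # index of the current character (arbitrary)
logits = table(idx)                      # simply returns row 5 of the table
probs = torch.softmax(logits, dim=-1)    # confidence over every possible next character
```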
While this approach is too simple for generating coherent, long-form text, it's a fundamental concept and a great starting point for understanding how models learn sequential patterns.
This project is licensed under the MIT License. See the LICENSE file for details.