Diffusion models have rapidly become a cornerstone of modern generative AI, known for their ability to produce stunningly high-fidelity results. This project provides a complete from-scratch PyTorch implementation exploring the core mechanics of these powerful models. It implements the foundational Denoising Diffusion Probabilistic Model (DDPM) and its faster, deterministic counterpart, the Denoising Diffusion Implicit Model (DDIM). Developed for the M.S. course Generative Models, this repository breaks down the complex theory into clean, modular code for generating 16x16 pixel art sprites.
- Denoising Diffusion Probabilistic Model (DDPM): Full implementation from scratch.
- Denoising Diffusion Implicit Models (DDIM): Includes a faster, deterministic DDIM sampling loop.
- U-Net Noise Predictor: A U-Net architecture designed to predict the noise added at any timestep.
- Modular Code: All logic is organized into a clean, importable `src/` package.
- Evaluation: Built-in script to calculate the Fréchet Inception Distance (FID) score.
- Generative Modeling: Learning a data distribution $p(x)$ to generate new samples.
- Diffusion Models: A class of models that work by systematically destroying data structure (forward process) and then learning to reverse the process (reverse process).
- Forward (Noising) Process: A Markov process that gradually adds Gaussian noise to an image $\mathbf{x}_0$ over $T$ timesteps, producing a sequence of noisy images $\mathbf{x}_1, ..., \mathbf{x}_T$.
- Reverse (Denoising) Process: A learned Markov process $p_{\theta}(\mathbf{x}_{t-1} | \mathbf{x}_t)$ that denoises an image from $\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})$ back to a clean image $\mathbf{x}_0$.
- U-Net Architecture: Using skip connections to preserve high-resolution features, making it ideal for image-to-image tasks like noise prediction.
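This project trains a model, $\epsilon_{\theta}(\mathbf{x}_t, t)$, to predict the noise that was added to a clean image $\mathbf{x}_0$ to produce the noisy image $\mathbf{x}_t$.

The forward process, $q(\mathbf{x}_t | \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I})$, adds a small amount of Gaussian noise at each step according to a variance schedule $\beta_1, ..., \beta_T$.

A key property of this process is that we can sample $\mathbf{x}_t$ at any timestep $t$ directly from $\mathbf{x}_0$, without iterating through the intermediate steps. With $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$:

$$q(\mathbf{x}_t | \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I})$$

This means we can generate a training pair $(\mathbf{x}_t, \epsilon)$ in a single step by drawing $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ and computing:

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$$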
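The single-step construction of a training pair can be sketched as follows. This is a minimal NumPy illustration, not the repository's `DiffusionScheduler`; the linear schedule and its endpoints (`1e-4` to `0.02`) are assumed DDPM-paper defaults rather than values read from `src/config.py`, and `q_sample` is a hypothetical name.

```python
import numpy as np

# Assumed hyperparameters (DDPM-paper defaults, not read from src/config.py).
T = 1000
BETA_START, BETA_END = 1e-4, 0.02

betas = np.linspace(BETA_START, BETA_END, T)  # beta_1 .. beta_T, stored 0-indexed
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)               # alpha_bar_t = product of alpha_s for s <= t


def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) for a batch; returns the pair (x_t, eps)."""
    eps = rng.standard_normal(x0.shape)
    # Reshape per-sample coefficients so they broadcast over [B, C, H, W].
    a_bar = alpha_bars[t].reshape(-1, 1, 1, 1)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 3, 16, 16))  # stand-in for a batch of sprites
t = rng.integers(0, T, size=4)                    # one 0-indexed timestep per image
x_t, eps = q_sample(x0, t, rng)
```

Because `alpha_bars` decreases monotonically toward zero, images drawn at large `t` are almost pure noise, while images at small `t` stay close to `x0`.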
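The goal of the model is to learn the reverse process $p_{\theta}(\mathbf{x}_{t-1} | \mathbf{x}_t)$. Following the DDPM formulation, the network does not output the denoised image directly; it predicts the noise $\epsilon$ contained in $\mathbf{x}_t$.
The training loss is a simple Mean Squared Error (MSE) between the predicted noise and the actual noise:

$$L = \mathbb{E}_{t, \mathbf{x}_0, \epsilon}\left[ \| \epsilon - \epsilon_{\theta}(\mathbf{x}_t, t) \|^2 \right]$$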
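As a rough illustration of that loss, the sketch below computes one Monte Carlo estimate of it in NumPy. The schedule values are assumed, `noise_prediction_loss` is a hypothetical helper, and `zero_predictor` is a stand-in for the U-Net so the example runs without a trained model.

```python
import numpy as np

T = 1000
# Assumed linear schedule (DDPM-paper defaults, not taken from src/config.py).
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))


def noise_prediction_loss(predict_noise, x0, rng):
    """One batch estimate of the MSE between true and predicted noise."""
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)           # random timestep per image
    eps = rng.standard_normal(x0.shape)          # the noise the model must recover
    a_bar = alpha_bars[t].reshape(-1, 1, 1, 1)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    eps_pred = predict_noise(x_t, t)
    return np.mean((eps - eps_pred) ** 2)        # mean over all elements


def zero_predictor(x_t, t):
    # Hypothetical stand-in for the U-Net: always predicts "no noise".
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 3, 16, 16))
loss = noise_prediction_loss(zero_predictor, x0, rng)
```

A predictor that always outputs zero scores a loss close to 1, the variance of the injected noise, which gives a useful baseline when reading a loss curve.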
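Once the model $\epsilon_{\theta}$ is trained, we can generate new images by starting from pure noise $\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})$ and denoising step by step for $t = T, ..., 1$.
The original DDPM paper derives the following equation for sampling $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$:

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_{\theta}(\mathbf{x}_t, t) \right) + \sigma_t \mathbf{z}$$

where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, $\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})$ (with $\mathbf{z} = 0$ on the final step), and $\sigma_t$ is the standard deviation of the noise re-injected at step $t$ (the paper uses $\sigma_t^2 = \beta_t$ as one option).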
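A minimal NumPy sketch of this ancestral sampling loop is shown below. It assumes a linear schedule and the $\sigma_t^2 = \beta_t$ variance choice; `ddpm_step` and `sample_ddpm` are hypothetical names, and `predict_noise` stands in for the trained U-Net.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, stored 0-indexed
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def ddpm_step(x_t, t, eps_pred, rng):
    """One reverse step x_t -> x_{t-1} for a 0-indexed timestep t."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                   # final step: no noise is added
    sigma_t = np.sqrt(betas[t])       # assumes the sigma_t^2 = beta_t choice
    return mean + sigma_t * rng.standard_normal(x_t.shape)


def sample_ddpm(predict_noise, shape, rng):
    """Run all T reverse steps starting from x_T ~ N(0, I)."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        x = ddpm_step(x, t, predict_noise(x, t), rng)
    return x
```

Every one of the `T` steps calls the noise predictor once, which is why DDPM sampling is slow compared with a shortened DDIM schedule.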
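DDIM provides a more general sampling process that is deterministic when $\eta = 0$. Because the update does not rely on the Markov chain of the forward process, it can also skip timesteps, so sampling can use far fewer steps than training (for example 50 instead of 1000).
The DDIM update rule is:

$$\mathbf{x}_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{\mathbf{x}}_0 + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\, \epsilon_{\theta}(\mathbf{x}_t, t) + \sigma_t \mathbf{z}$$

where $\bar{\alpha}_t$ is the cumulative product of $1 - \beta_s$ up to step $t$, $\hat{\mathbf{x}}_0 = (\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_{\theta}(\mathbf{x}_t, t)) / \sqrt{\bar{\alpha}_t}$ is the model's current estimate of the clean image, $\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})$, and $\sigma_t = \eta \sqrt{(1 - \bar{\alpha}_{t-1}) / (1 - \bar{\alpha}_t)} \sqrt{1 - \bar{\alpha}_t / \bar{\alpha}_{t-1}}$.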
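The DDIM update and its shortened timestep schedule can be sketched in NumPy as below. This is an illustration under assumptions: the schedule is an assumed linear one, the evenly spaced timestep subsequence is one common choice (the repository may select steps differently), and `ddim_step` and `sample_ddim` are hypothetical names. The `n_steps` and `eta` arguments play the role of the `--n-ddim-steps` and `--eta` flags of `scripts/sample.py`.

```python
import numpy as np

T = 1000
# Assumed linear schedule (DDPM-paper defaults, not taken from src/config.py).
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))


def ddim_step(x_t, eps_pred, a_bar_t, a_bar_prev, eta, rng):
    """One DDIM update from noise level a_bar_t to a_bar_prev."""
    # Current estimate of the clean image.
    x0_hat = (x_t - np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_bar_t)
    sigma = (eta * np.sqrt((1.0 - a_bar_prev) / (1.0 - a_bar_t))
             * np.sqrt(1.0 - a_bar_t / a_bar_prev))
    direction = np.sqrt(1.0 - a_bar_prev - sigma**2) * eps_pred
    # With eta = 0 no random noise is drawn, so the step is deterministic.
    noise = sigma * rng.standard_normal(x_t.shape) if eta > 0 else 0.0
    return np.sqrt(a_bar_prev) * x0_hat + direction + noise


def sample_ddim(predict_noise, x_T, n_steps, eta, rng):
    # Evenly spaced, decreasing subsequence of the T training timesteps.
    steps = np.linspace(T - 1, 0, n_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(steps):
        a_bar_t = alpha_bars[t]
        # After the last step the target is the clean image (alpha_bar = 1).
        a_bar_prev = alpha_bars[steps[i + 1]] if i + 1 < n_steps else 1.0
        x = ddim_step(x, predict_noise(x, t), a_bar_t, a_bar_prev, eta, rng)
    return x
```

With `eta=0.0`, two runs from the same `x_T` produce identical images regardless of the random generator, which is what makes DDIM sampling reproducible from a fixed latent.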
The noise predictor $\epsilon_{\theta}$ is a U-Net:
- Input: A noised image $\mathbf{x}_t$ (shape `[B, 3, 16, 16]`) and its timestep $t$.
- Output: The predicted noise $\epsilon$ (shape `[B, 3, 16, 16]`).
- Architecture: It consists of a down-sampling path (encoder) and an up-sampling path (decoder) with skip connections. The timestep $t$ and context labels $c$ are embedded and injected into the model at various resolutions. This implementation uses `ResidualConvBlock`s and fixes a critical inefficiency from the original notebook where a shortcut layer was re-initialized on every forward pass.
pytorch-diffusion-sprites/
├── .gitignore              # Ignores data, logs, outputs, and pycache
├── LICENSE                 # MIT License file
├── README.md               # You are here!
├── requirements.txt        # Project dependencies
├── notebooks/
│   └── run.ipynb           # Jupyter notebook to run the full pipeline
├── scripts/
│   ├── download_data.sh    # Script to download the .npy dataset
│   ├── train.py            # Main training script
│   ├── sample.py           # Script to generate sample images
│   └── evaluate.py         # Script to generate images and run FID evaluation
└── src/
    ├── __init__.py         # Makes 'src' a Python package
    ├── config.py           # All hyperparameters and file paths
    ├── data_loader.py      # CustomDataset and get_dataloaders function
    ├── model.py            # U-Net model architecture (Unet, ResidualConvBlock, etc.)
    ├── diffusion.py        # DiffusionScheduler class (holds DDPM/DDIM logic)
    └── utils.py            # Utility functions (logging, plotting, saving images)
- Clone the Repository:
      git clone https://github.com/msmrexe/pytorch-diffusion-sprites.git
      cd pytorch-diffusion-sprites
- Install Requirements:
      pip install -r requirements.txt
- Download the Data: Run the download script. This will create a `data/` folder and place the `.npy` files inside.
      bash scripts/download_data.sh
- Train the Model: Run the training script. The model will be trained according to the settings in `src/config.py`. The best model (based on validation loss) will be saved to `outputs/models/ddpm_sprite_best.pth`. A loss plot will be saved to `outputs/loss_plot.png`.
      python scripts/train.py
- Generate Samples: After training, you can generate a grid of sample images. This will save a file to `outputs/samples/`.
  - Using DDPM (1000 steps, stochastic):
        python scripts/sample.py --n-samples 16 --method ddpm
  - Using DDIM (50 steps, deterministic):
        python scripts/sample.py --n-samples 16 --method ddim --n-ddim-steps 50 --eta 0.0
- Evaluate the Model (FID Score): This script will write 3000 real images and 3000 generated images to `outputs/eval/`, and then compute the FID score between the two sets.
      python scripts/evaluate.py --n-samples 3000 --method ddim --n-ddim-steps 100
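For readers curious about what the FID number measures, the sketch below computes the Fréchet distance between Gaussians fitted to two sets of feature vectors. It is an illustration only: the real metric uses Inception-v3 features extracted from the images, the feature arrays here are random placeholders, `frechet_distance` is a hypothetical helper, and how `scripts/evaluate.py` computes the score is not shown in this README.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two [N, D] feature arrays."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr((cov_r cov_f)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_f, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


# Placeholder features standing in for Inception activations (assumption).
rng = np.random.default_rng(0)
real_feats = rng.standard_normal((2000, 8))
fake_feats = real_feats + 3.0          # same spread, shifted mean
distance = frechet_distance(real_feats, fake_feats)
```

Lower is better: identical feature statistics give a distance of zero, and the score grows as the means or covariances of the two sets drift apart.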
Feel free to connect or reach out if you have any questions!
- Maryam Rezaee
- GitHub: @msmrexe
- Email: ms.maryamrezaee@gmail.com
This project is licensed under the MIT License. See the LICENSE file for full details.