This project explores different generative models for image synthesis, including Convolutional Neural Networks (CNNs), Encoder-only Transformers, Generative Adversarial Networks (GANs), and Denoising Diffusion Probabilistic Models (DDPMs). We implement and experiment with these architectures, analyzing their effectiveness for image generation and inpainting tasks.
├── project/
│ ├── data/
│ ├── diffusion.py
│ ├── main.py
│ ├── requirements.txt
│ ├── run_in_cloud.ipynb
│ ├── trainer.py
│ ├── unet.py
│ ├── utils.py
├── README.md
To set up the environment locally, follow these steps:
- Install Python dependencies:
pip install torch einops clean-fid
- Run the main training script:
python main.py
CNNs are widely used for image-based generative tasks, most commonly within encoder-decoder architectures such as U-Net.
U-Net employs skip connections between the encoder and decoder layers to preserve spatial details during reconstruction. In this task, we use U-Net for inpainting, where missing pixels are filled based on surrounding image features.
- Input: Partially masked image with a binary mask indicating missing pixels.
- Output: Reconstructed image with missing pixels filled.
- Loss Functions:
- MSE Loss ensures that predicted pixels match the original ones (see the sketch after this list).
- Adversarial Loss (when used with a discriminator) improves realism.
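For illustration, a masked reconstruction loss of this kind takes only a few lines of PyTorch. The `unet` call, the tensor shapes, and the channel-concatenation of the mask below are assumptions made for the sketch, not the project's actual interface:

```python
import torch
import torch.nn.functional as F

def inpainting_mse_loss(unet, image, mask):
    """Masked MSE between the U-Net prediction and the ground-truth pixels.

    image: (B, C, H, W) ground-truth images
    mask:  (B, 1, H, W) binary mask, 1 where pixels are missing
    """
    masked_input = image * (1 - mask)                    # zero out the missing region
    pred = unet(torch.cat([masked_input, mask], dim=1))  # condition the network on the mask (assumed input format)
    # Penalize only the region the network had to fill in
    return F.mse_loss(pred * mask, image * mask)
```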
Transformers process entire sequences in parallel, making them effective for structured image representations.
We implement an encoder-only Transformer for part-of-speech tagging and analyze how these models differ from decoder-only variants.
- Encoder-only Models (e.g., BERT) generate contextual embeddings for all input tokens simultaneously.
- Decoder-only Models (e.g., GPT) use autoregressive generation, predicting tokens sequentially.
- Encoder-only: Classification, segmentation, token-wise prediction (see the tagger sketch below).
- Decoder-only: Text/image generation, machine translation.
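A minimal encoder-only tagger can be sketched with PyTorch's built-in `nn.TransformerEncoder`. The vocabulary size, model dimensions, and tag count below are placeholder values, and positional encodings are omitted for brevity; this is not the course starter code:

```python
import torch
import torch.nn as nn

class EncoderTagger(nn.Module):
    """Encoder-only Transformer that emits one tag prediction per input token."""

    def __init__(self, vocab_size=10_000, d_model=256, n_heads=4, n_layers=2, n_tags=17):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_tags)

    def forward(self, tokens):                  # tokens: (B, T) integer ids
        h = self.encoder(self.embed(tokens))    # contextual embeddings for every position at once
        return self.head(h)                     # (B, T, n_tags) token-wise logits

# Toy usage: a batch of 2 "sentences" with 8 tokens each
logits = EncoderTagger()(torch.randint(0, 10_000, (2, 8)))
```

Because every position attends to the full sequence in a single pass, the model produces all tag logits simultaneously rather than autoregressively.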
GANs use an adversarial setup where a generator learns to create realistic samples while a discriminator tries to distinguish generated images from real ones.
For inpainting tasks, we use:
- Generator (U-Net-based): Predicts missing pixels given an input mask.
- Discriminator: Distinguishes inpainted images from real ones.
- Generator Loss:
L_G = E_(x,m)[∥m ⊙ (y − x')∥^2] − λ · E_x[log D(x')]
- Discriminator Loss:
L_D = E_x[log D(x)] + E_(x,m)[log(1 − D(x'))]
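The two objectives above translate fairly directly into PyTorch. The sketch below assumes the discriminator returns raw logits and uses binary cross-entropy for the log terms; `generator`, `discriminator`, and the mask-concatenation input format are illustrative, not the repository's actual classes:

```python
import torch
import torch.nn.functional as F

def gan_inpainting_losses(generator, discriminator, image, mask, lam=0.01):
    """Generator: masked reconstruction + adversarial term. Discriminator: real vs. inpainted."""
    masked = image * (1 - mask)
    fake = generator(torch.cat([masked, mask], dim=1))        # inpainted image x'

    d_real = discriminator(image)                             # D(x), raw logits
    d_fake = discriminator(fake)                              # D(x')

    # L_G = E[||m ⊙ (y - x')||^2] - λ E[log D(x')]  (the -log term is BCE with target 1)
    g_loss = F.mse_loss(fake * mask, image * mask) \
             + lam * F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # L_D = E[log D(x)] + E[log(1 - D(x'))], written as a BCE minimization
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
             + F.binary_cross_entropy_with_logits(d_fake.detach(), torch.zeros_like(d_fake))
    return g_loss, d_loss
```

Detaching the fake image in the discriminator term keeps discriminator updates from back-propagating into the generator.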
Diffusion models generate images by gradually denoising random noise through a learned reverse process.
The Diffusion class implements forward and reverse diffusion using:
- Cosine noise schedule to control variance.
- U-Net architecture for the denoising function.
- Reparameterization trick for efficient sampling.
x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon
x_hat_0 = (x_t - sqrt(1 - alpha_bar_t) * epsilon) / sqrt(alpha_bar_t)
(Figure: diffusion reverse process)
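These equations are the forward noising step and the x_0 estimate recovered from a predicted noise epsilon. A minimal sketch, assuming a precomputed `alpha_bar` tensor of cumulative products indexed by timestep (an illustrative name, not necessarily the attribute used in diffusion.py):

```python
import torch

def q_sample(x0, t, alpha_bar):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * epsilon."""
    a = alpha_bar[t].view(-1, 1, 1, 1)        # broadcast per-sample a_bar_t over (B, C, H, W)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return xt, noise

def predict_x0(xt, t, eps, alpha_bar):
    """Invert the forward step: x_hat_0 = (x_t - sqrt(1 - a_bar_t) * eps) / sqrt(a_bar_t)."""
    a = alpha_bar[t].view(-1, 1, 1, 1)
    return (xt - (1 - a).sqrt() * eps) / a.sqrt()
```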
Training minimizes the L1 loss between predicted and actual noise:
loss = F.l1_loss(pred_noise, noise)

During sampling, the model iteratively refines noisy images to generate realistic outputs.
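Putting the pieces together, a training step and a simplified DDPM-style sampling loop might look like the following. `model`, the schedule tensors `alpha`/`alpha_bar`, the fixed variance choice, and the omitted device handling are placeholder assumptions rather than the exact implementation in diffusion.py and trainer.py:

```python
import torch
import torch.nn.functional as F

def train_step(model, x0, alpha_bar, T=1000):
    """One optimization step: noise a clean batch, predict the noise, take an L1 penalty."""
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward-diffuse to a random timestep
    pred_noise = model(xt, t)
    return F.l1_loss(pred_noise, noise)

@torch.no_grad()
def sample(model, shape, alpha, alpha_bar, T=1000):
    """Reverse process: start from pure noise and denoise one timestep at a time."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        tt = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, tt)
        a, ab = alpha[t], alpha_bar[t]
        mean = (x - (1 - a) / (1 - ab).sqrt() * eps) / a.sqrt()
        # Add noise with variance beta_t = 1 - alpha_t at every step except the last
        x = mean + (1 - a).sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```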
- Train the diffusion model:
python main.py --train
- Evaluate FID score:
python main.py (...) --fid
- CNNs (U-Net) effectively reconstruct missing image regions.
- Transformers capture contextual dependencies in structured tasks.
- GANs produce sharper inpainted images but can be unstable.
- Diffusion models generate high-quality images with iterative refinement.
This project is part of 10-623 Generative AI at Carnegie Mellon University, with datasets and starter code provided by the course instructors.







