Table of Contents
Task Checklist
Task 1
- 1.1 Define Forward SDE
- 1.1 Define Backward SDE
- 1.1 Define VPSDE
- 1.1 Define VESDE
- 1.2 Implement MLP Network
- 1.2 Implement DSM Loss
- 1.2 Implement Training Loop
- 1.3 Implement Discretization
- 1.3 Implement Sampling Loop
- 1.4 Evaluate Implementation
Task 2
- 2.1 Implement DDIM Variance Scheduling
- 2.2 Implement CFG
- 2.3 Implement Image Inpainting
Optional Tasks
- Add any additional tasks that you did here.
- Implement EMA Training
- Implement ISM Loss
- Implement ODE Sampling
- Implement Schrödinger Bridge
- Implement MCG Inpainting
Install the required packages listed in requirements.txt:
pip install -r requirements.txt
.
├── image_diffusion (Task 2)
│ ├── dataset.py <--- Ready-to-use AFHQ dataset code
│ ├── train.py <--- DDPM training code
│ ├── sampling.py <--- Image sampling code
│ ├── ddpm.py <--- DDPM high-level wrapper code
│ ├── module.py <--- Basic modules of a noise prediction network
│ ├── network.py <--- Noise prediction network
│ ├── scheduler.py <--- (TODO) Define variance schedulers
│ └── fid
│ ├── measure_fid.py <--- script measuring FID score
│ └── afhq_inception.ckpt <--- pre-trained classifier for FID
└── sde_todo (Task 1)
├── HelloScore.ipynb <--- Main code
├── dataset.py <--- Define dataset (Swiss-roll, moon, gaussians, etc.)
├── eval.py <--- Evaluation code
├── loss.py <--- (TODO) Define Training Objective
├── network.py <--- (TODO) Define Network Architecture
├── sampling.py <--- (TODO) Define Discretization and Sampling
├── sde.py <--- (TODO) Define SDE Processes
└── train.py <--- (TODO) Define Training Loop
Implementing diffusion models is typically very simple once you understand the theory. So, to learn the most from this tutorial, it's highly recommended that you check out the details in the related papers and understand the equations BEFORE you start the tutorial. You can check out the resources in this order:
- [blog] Charlie's "Brownian Motion and SDE"
- [paper] Score-Based Generative Modeling through Stochastic Differential Equations
- [blog] Lilian Weng's "What are Diffusion Models?"
- [paper] Denoising Diffusion Probabilistic Models
- [slide] Summary of DDPM and DDIM
The first part of this tutorial will introduce diffusion models through the lens of stochastic differential equations (SDEs). Prior to the Yang Song et al. (2021) paper, diffusion models were often understood in terms of Markov processes with tractable transition kernels. Understanding SDEs can also help you develop more efficient variance schedules or add more flexibility to your diffusion models.
We know that a stochastic differential equation has the following form:
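$$\mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t,$$

where $f$ is the drift, $g$ is the diffusion coefficient, and $W_t$ is a standard Wiener process (this is the Itô form used in Yang Song et al. (2021)). For example, the Ornstein-Uhlenbeck (OU) process used below corresponds to $f(x, t) = -\theta x$ and $g(t) = \sigma$ for constants $\theta, \sigma > 0$.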
- Running the OU process forward long enough always results in a unit Gaussian.
- We can derive the equation for the reverse (inverse) OU process.
From these facts, we can directly sample from the unknown distribution by
- Sample from unit Gaussian
- Run the reverse process on samples from step 1.
Yang Song et al. (2021) derived the likelihood training scheme
for learning the reverse process. In summary, the reverse process for any SDE given above is
of the form
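$$\mathrm{d}X_t = \left[f(X_t, t) - g(t)^2\,\nabla_{x}\log p_t(X_t)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t,$$

where $\bar{W}_t$ is a Wiener process running backward in time and $\nabla_{x}\log p_t$ is the score of the marginal density, which we will approximate with a neural network.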
TODO:
- Derive the expressions for the mean and std of the OU process at time $t$ given $X_0 = x_0$, i.e. find $\mathbb{E}[X_t \mid X_0]$ and $\mathrm{Var}[X_t \mid X_0]$. You will need this for task 1.1(a).
hint: We know that, for the parameterization $\mathrm{d}X_t = -\theta X_t\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$ (adjust the constants to the one used in the notebook), the solution to the OU process is given as

$$X_t = X_0\,e^{-\theta t} + \sigma \int_0^t e^{-\theta (t - s)}\,\mathrm{d}W_s,$$

and you can use the fact (the Itô isometry) that

$$\mathbb{E}\left[\left(\int_0^t h(s)\,\mathrm{d}W_s\right)^{2}\right] = \int_0^t h(s)^2\,\mathrm{d}s.$$
A typical diffusion pipeline is divided into three components: the forward/reverse processes (SDEs), the training objective (score network and loss), and the sampling procedure.
In this task, we will look into each component one by one and implement them sequentially.
Our first goal is to set up the forward and reverse processes. In the forward process, the final distribution should be the prior distribution, which is the standard normal distribution. Following the formulation of the OU process introduced in the previous section, complete the TODOs in sde.py and check that the final distribution approaches a unit Gaussian.
TODO:
- implement the forward process using the given marginal probability $p_{0t}(X_t \mid X_0)$ in sde.py (a minimal sketch follows this list)
- implement the reverse process for a general SDE in sde.py
- (optional) Play around with the terminal time (T) and the number of time steps (N) and observe their effect
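Here is a minimal sketch of sampling the OU forward marginal in closed form (the class and method names are hypothetical; match them to the interfaces expected by sde.py):

```python
import torch

class OUProcess:
    """Minimal OU forward process dX_t = -theta * X_t dt + sigma * dW_t.
    With theta = 1 and sigma = sqrt(2), the stationary distribution is a unit Gaussian.
    (Hypothetical sketch; the parameterization in sde.py may differ.)"""

    def __init__(self, theta=1.0, sigma=2.0 ** 0.5):
        self.theta, self.sigma = theta, sigma

    def marginal_prob(self, x0, t):
        # Closed-form E[X_t | X_0] and Std[X_t | X_0] of the OU process.
        t = torch.as_tensor(t, dtype=x0.dtype, device=x0.device)
        mean = x0 * torch.exp(-self.theta * t)
        var = self.sigma ** 2 / (2 * self.theta) * (1 - torch.exp(-2 * self.theta * t))
        return mean, var.sqrt()

    def sample_forward(self, x0, t):
        # Sample X_t | X_0 directly from the marginal, without simulating the SDE.
        mean, std = self.marginal_prob(x0, t)
        return mean + std * torch.randn_like(x0)
```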
It's mentioned by Yang Song et al. (2021) that DDPM and SMLD are discretizations of SDEs (the VPSDE and the VESDE, respectively). Implement these in sde.py and check their mean and std.
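For reference, these two SDEs are defined in Song et al. (2021) as

$$\text{VPSDE:}\quad \mathrm{d}\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\mathbf{w}, \qquad \text{VESDE:}\quad \mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\left[\sigma^2(t)\right]}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w},$$

where $\beta(t)$ and $\sigma(t)$ are the noise schedules.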
hint: Although you can simulate the diffusion process through discretization, sampling with the explicit equation of the marginal probability $p_{0t}(X_t \mid X_0)$ is much more efficient.
You should also obtain the following graphs for VPSDE and VESDE respectively
TODO:
- implement VPSDE in sde.py
- implement VESDE in sde.py
- plot the mean and variance of VPSDE and VESDE vs. time.
What can you say about the differences between OU, VPSDE, and VESDE?
The typical training objective of diffusion models uses the Denoising Score Matching (DSM) loss:
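$$\mathcal{L}_{\mathrm{DSM}} = \mathbb{E}_{t}\,\mathbb{E}_{\mathbf{x}_0}\,\mathbb{E}_{\mathbf{x}_t \mid \mathbf{x}_0}\left[\lambda(t)\,\left\| s_\theta(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t}\log p_{0t}(\mathbf{x}_t \mid \mathbf{x}_0)\right\|_2^2\right],$$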
where $s_\theta(\mathbf{x}_t, t)$ is the score network, $p_{0t}(\mathbf{x}_t \mid \mathbf{x}_0)$ is the perturbation kernel (marginal) of the forward SDE, and $\lambda(t)$ is a positive weighting function.
(Important) You need to derive a different DSM objective for each SDE since their marginal densities are different. You first need to obtain the closed form of $p_{0t}(\mathbf{x}_t \mid \mathbf{x}_0)$ and its score.
However, there are other training objectives with different trade-offs (SSM, EDM, etc.). We highly recommend checking out A Variational Perspective on Diffusion-based Generative Models and Score Matching and Elucidating the Design Space of Diffusion-Based Generative Models for a more in-depth analysis of recent training objectives.
TODO:
- implement your own network in network.py
(we recommend implementing positional encoding and residual connections; a minimal sketch follows this list)
- implement DSMLoss in loss.py
- implement the training loop in train_utils.py
- (optional) implement ISMLoss in loss.py (hint: you will need to use torch.autograd.grad)
- (optional) implement SSMLoss in loss.py
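For reference, here is a minimal sketch of a time-conditioned score network with sinusoidal positional encoding and residual connections (the layer sizes and layout are arbitrary choices, not a required architecture):

```python
import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Sinusoidal positional encoding of the diffusion time t."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):                            # t: [B] or [B, 1]
        t = t.view(-1, 1)
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
        angles = t * freqs                           # [B, half]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)  # [B, dim]

class ScoreMLP(nn.Module):
    """Small residual MLP s_theta(x, t) for 2D toy data (hypothetical layout)."""
    def __init__(self, data_dim=2, hidden=128, t_dim=64):
        super().__init__()
        self.t_embed = TimeEmbedding(t_dim)
        self.inp = nn.Linear(data_dim + t_dim, hidden)
        self.block1 = nn.Sequential(nn.SiLU(), nn.Linear(hidden, hidden))
        self.block2 = nn.Sequential(nn.SiLU(), nn.Linear(hidden, hidden))
        self.out = nn.Linear(hidden, data_dim)

    def forward(self, x, t):
        h = self.inp(torch.cat([x, self.t_embed(t)], dim=-1))
        h = h + self.block1(h)                       # residual connections
        h = h + self.block2(h)
        return self.out(h)
```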
Finally, we can now use the trained score prediction network to sample from the Swiss-roll dataset. Unlike the forward process, there is no analytical form of the marginal probability, so we have to run the simulation process. Your final samples should be close to the target distribution within 10000 training steps. For this task, you are free to use ANY variation of the diffusion processes mentioned above.
TODO:
- implement the predict_fn in sde.py (see the predictor-step sketch below)
- complete the code in sampling.py
- (optional) train with EMA
- (optional) implement the correct_fn (for VPSDE, VESDE) in sde.py
- (optional) implement the ODE discretization and check out the differences
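For reference, here is a minimal sketch of one reverse-time Euler-Maruyama predictor step, assuming an `sde` object exposing the forward drift and diffusion (the method names `f` and `g` and the overall interface are hypothetical; match them to your sde.py):

```python
import torch

def em_predictor_step(x, t, dt, sde, score_model):
    """One reverse-time Euler-Maruyama step:
    x_{t-dt} = x - [f(x,t) - g(t)^2 * score(x,t)] * dt + g(t) * sqrt(dt) * z."""
    f = sde.f(x, t)                    # drift of the forward SDE (hypothetical name)
    g = sde.g(t)                       # diffusion coefficient (hypothetical name)
    score = score_model(x, t)          # learned approximation of grad_x log p_t
    drift = f - (g ** 2) * score       # reverse-time drift
    z = torch.randn_like(x)
    return x - drift * dt + g * (dt ** 0.5) * z
```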
To evaluate your performance, we compute the Chamfer distance (CD) and earth mover's distance (EMD) between the target and generated point clouds. Your method should perform on par with or better than the following metrics. For this task, you can use ANY variation, even ones that were NOT mentioned.
| target distribution | CD |
|---|---|
| swiss-roll | 0.1975 |
One restriction of typical diffusion processes is that they require the prior to be easy to sample from (Gaussian, uniform, etc.). The Schrödinger Bridge removes this limitation by making the forward process learnable as well, allowing a diffusion to be defined between two unknown distributions.
In this task, we will play with diffusion models to generate 2D images. We first look into some background of DDPM and then dive into DDPM at the code level.
From the perspective of SDEs, SGM and DDPM are the same model with only different parameterizations. Just as SGM has forward and reverse processes, the forward process (also called the diffusion process) of DDPM is fixed to a Markov chain that gradually adds Gaussian noise to the data:
$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right).$$
Thanks to a nice property of Gaussian distributions, one can sample $\mathbf{x}_t$ at an arbitrary timestep directly from $\mathbf{x}_0$:
$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1 - \bar{\alpha}_t)\mathbf{I}\right),$$
where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
Given the diffusion process, we want to model the reverse process $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ that gradually denoises white Gaussian noise $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ back into data. To learn this reverse process, we set an objective function that minimizes the KL divergence between the forward-process posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$ and $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$.
Refer to the original paper or our PPT material for more details.
As a parameterization of DDPM, the authors fix the reverse-step variance and express the mean $\boldsymbol{\mu}_\theta(\mathbf{x}_t, t)$ in terms of a noise prediction network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$. In short, the simplified objective function of DDPM is defined as follows:
$$L_{\text{simple}} = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\left[\left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon},\ t\right)\right\|^2\right],$$
where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $t$ is sampled uniformly from $\{1, \dots, T\}$. Once we train the noise prediction network $\boldsymbol{\epsilon}_\theta$, we can generate images by iteratively applying the reverse step starting from pure Gaussian noise.
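For reference, here is a minimal sketch of computing $L_{\text{simple}}$ for a noise prediction network (the names are hypothetical, and the provided train.py already implements the actual training loop):

```python
import torch
import torch.nn.functional as F

def ddpm_simple_loss(eps_model, x0, alphas_cumprod):
    """L_simple for DDPM (sketch): predict the noise added at a random step t.
    Assumes eps_model(x_t, t) and alphas_cumprod (cumprod of 1 - beta_t)."""
    B = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # closed-form forward sample
    return F.mse_loss(eps_model(x_t, t), eps)
```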
DDIM proposed a way to speed up sampling using the same pre-trained DDPM. The reverse step of DDIM is:
$$\mathbf{x}_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\left(\frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{\bar{\alpha}_t}}\right) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) + \sigma_t\,\boldsymbol{\epsilon}.$$
Note that setting $\sigma_t = 0$ makes the reverse step deterministic, which allows high-quality sampling with far fewer steps.
Please refer to the DDIM paper for more details.
In this task, we will complete scheduler.py. After implementing the schedulers, train a model with python train.py. It will sample images and save a checkpoint every args.log_interval steps. After training a model, sample and save images by:
python sampling.py --ckpt_path ${CKPT_PATH} --save_dir ${SAVE_DIR_PATH}
We recommend starting the training as soon as possible since it takes about half a day. Also, the DDPM scheduler is really slow; we recommend implementing the DDIM scheduler first and setting the number of inference timesteps to 20~50, which is enough to get high-quality images with much less sampling time.
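For reference, here is a minimal sketch of a single DDIM reverse step (the function and argument names are hypothetical; `alphas_cumprod` denotes the cumulative product of $1 - \beta_t$ from the DDPM schedule):

```python
import torch

def ddim_step(x_t, eps_pred, t, t_prev, alphas_cumprod, eta=0.0):
    """One DDIM reverse step from timestep t to t_prev (sketch)."""
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    # Predicted x_0 from the current noise prediction.
    x0_pred = (x_t - (1 - a_t).sqrt() * eps_pred) / a_t.sqrt()
    # Variance term; eta = 0 gives the deterministic DDIM sampler.
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
    dir_xt = (1 - a_prev - sigma ** 2).sqrt() * eps_pred
    noise = sigma * torch.randn_like(x_t)
    return a_prev.sqrt() * x0_pred + dir_xt + noise
```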
As an evaluation, measure FID score using the pre-trained classifier network we provide:
python dataset.py # to construct the eval directory.
python fid/measure_fid.py /path/to/eval/dir /path/to/sample/dir
Success condition: Achieve an FID score lower than 30.
Now, we will implement a classifier-free guidance (CFG) diffusion model. It jointly trains an unconditional diffusion model and a conditional diffusion model by randomly dropping out the conditioning term. The algorithm is below:
You need to train another diffusion model for classifier-free guidance by slightly modifying the network architecture so that it can take class labels as input. The network design is your choice. Our implementation used nn.Embedding for class label embeddings and simply adds them to the time embeddings. We set the conditioning dropout rate to 0.1 during training and guidance_scale to 7.5.
Note that the provided code treats class label 0 as the null (unconditional) class.
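For reference, here is a minimal sketch of how the guided noise prediction can be combined at sampling time (the model signature is hypothetical; adapt it to your network's interface):

```python
import torch

def cfg_noise_prediction(model, x_t, t, label, guidance_scale=7.5):
    """Classifier-free guidance at sampling time: blend the conditional and
    unconditional noise predictions. Assumes model(x, t, class_label) takes
    integer class labels and that label 0 is the null (unconditional) class."""
    null_label = torch.zeros_like(label)
    eps_uncond = model(x_t, t, class_label=null_label)
    eps_cond = model(x_t, t, class_label=label)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```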
Generate 200 images per category, 600 in total. Measure FID with the same validation set used in 2.1.
Success condition: Achieve an FID score lower than 30.
For more details, refer to the paper.
DDPMs have zero-shot capabilities for various downstream tasks beyond unconditional generation. Among them, we will focus on the image inpainting task only.
Note that there is no base code for image inpainting.
Make a rectangle hole with a
Report FID scores with 500 result images and the same validation set used in Task 2.1. You will get a full credit if the FID is lower than 30.
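There are several ways to approach this; one common baseline (not prescribed by this handout) is to run the ordinary reverse process and, at every step, overwrite the known region with a correspondingly noised copy of the original image. A minimal sketch with hypothetical scheduler methods:

```python
import torch

def inpaint_reverse_step(scheduler, eps_model, x_t, t, x_known, mask):
    """One reverse step for inpainting via the 'replace the known region'
    baseline. mask == 1 marks known pixels; x_known is the clean image.
    scheduler.step and scheduler.add_noise are hypothetical method names."""
    eps = eps_model(x_t, t)
    x_prev = scheduler.step(eps, t, x_t)                 # ordinary reverse step
    # Re-noise the known region to the noise level of step t-1 and paste it in.
    noise = torch.randn_like(x_known)
    x_known_prev = scheduler.add_noise(x_known, noise, t - 1)
    return mask * x_known_prev + (1 - mask) * x_prev
```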
A recent paper, Improving Diffusion Models for Inverse Problems using Manifold Constraints, also known as MCG, proposed a way to improve the solving of various inverse problems, such as image inpainting, using DDPMs. At a high level, in the reverse process it takes an additional gradient step toward the subspace of the latent space that satisfies a given partial observation. Refer to the original paper for more details and implement MCG-based image inpainting code.
Compare image inpainting results between MCG and the baseline.
- [paper] Score-Based Generative Modeling through Stochastic Differential Equations
- [paper] Improved Techniques for Training Score-Based Generative Models
- [paper] Denoising Diffusion Probabilistic Models
- [paper] Diffusion Models Beat GANs on Image Synthesis
- [paper] Classifier-Free Diffusion Guidance
- [paper] Denoising Diffusion Implicit Models
- [paper] Elucidating the Design Space of Diffusion-Based Generative Models
- [paper] A Variational Perspective on Diffusion-Based Generative Models and Score Matching
- [paper] Trans-Dimensional Generative Modeling via Jump Diffusion Models
- [paper] Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory
- [blog] What are Diffusion Models?
- [blog] Generative Modeling by Estimating Gradients of the Data Distribution
- [lecture] Charlie's Playlist on Diffusion Processes
- [slide] Juil's presentation slide of DDIM
- [slide] Charlie's presentation of Schrödinger Bridge.