KAIST CS492(D): Diffusion Models and Their Applications (Fall 2024)
Programming Assignment 1
Instructor: Minhyuk Sung (mhsung [at] kaist.ac.kr)
TA: Seungwoo Yoo (dreamy1534 [at] kaist.ac.kr)
Credit: Juil Koo (63days [at] kaist.ac.kr) & Nguyen Minh Hieu (hieuristics [at] kaist.ac.kr)
In this programming assignment, you will implement the Denoising Diffusion Probabilistic Model (DDPM), a fundamental building block that empowers today's diffusion-based generative modeling. While DDPM provides the technical foundation for popular generative frameworks like Stable Diffusion, its implementation is surprisingly straightforward, making it an excellent starting point for gaining hands-on experience in building diffusion models. We will begin with a relatively simple example: modeling the distribution of 2D points on a spiral (known as the "Swiss Roll"). Following that, we will develop an image generator using the AFHQ dataset to explore how DDPM and diffusion models seamlessly adapt to changes in data format and dimensionality with minimal code changes.
Create a conda environment named ddpm and install PyTorch:
conda create --name ddpm python=3.10
conda activate ddpm
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
Install the required packages listed in requirements.txt:
pip install -r requirements.txt
NOTE: We have removed the dependency on chamferdist due to issues during installation.
.
├── 2d_plot_diffusion_todo (Task 1)
│ ├── ddpm_tutorial.ipynb <--- Main code
│ ├── dataset.py <--- Define dataset (Swiss-roll, moon, gaussians, etc.)
│ ├── network.py <--- (TODO) Implement a noise prediction network
│ └── ddpm.py <--- (TODO) Define a DDPM pipeline
│
└── image_diffusion_todo (Task 2)
    ├── dataset.py <--- Ready-to-use AFHQ dataset code
    ├── model.py <--- Diffusion model including its backbone and scheduler
    ├── module.py <--- Basic modules of a noise prediction network
    ├── network.py <--- Definition of the U-Net architecture
    ├── sampling.py <--- Image sampling code
    ├── scheduler.py <--- (TODO) Implement the forward/reverse step of DDPM
    ├── train.py <--- DDPM training code
    └── fid
        ├── measure_fid.py <--- Script measuring FID score
        └── afhq_inception.ckpt <--- Pre-trained classifier for FID
Implementing diffusion models is simple once you understand the theory. To learn the most from this tutorial, we highly recommend reading the related papers and understanding the equations BEFORE you start the assignment. You can check out the resources in this order:
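Denoising Diffusion Probabilistic Model (DDPM) is a latent-variable generative model built on a Markov chain. In the Markov chain, let us define a forward process that gradually adds noise to the data sampled from a data distribution $\mathbf{x}_0 \sim q(\mathbf{x}_0)$, so that the data distribution is converted into a simple prior such as a unit Gaussian:
$$ q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) := \mathcal{N}\left(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I}\right), $$
where a variance schedule $\beta_1, \dots, \beta_T$ controls the step sizes.
Thanks to a nice property of a Gaussian distribution, one can directly sample $\mathbf{x}_t$ at an arbitrary timestep $t$ from real data $\mathbf{x}_0$ in closed form:
$$ q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1-\bar{\alpha}_t)\mathbf{I}\right), $$
where $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$.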
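As a rough illustration of the closed-form forward sampling, the sketch below builds a variance schedule and draws a noisy sample in one shot for scalar data. The linear schedule with `beta_1 = 1e-4`, `beta_T = 0.02`, and 1000 steps follows the defaults in the DDPM paper and is an assumption here, not necessarily the values used in this assignment. The helper names (`linear_beta_schedule`, `cumulative_alphas`, `q_sample_scalar`) are hypothetical; the assignment code works on PyTorch tensors instead.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_1=1e-4, beta_T=0.02):
    """Linearly spaced variances (endpoint values assumed from the DDPM paper)."""
    if num_steps == 1:
        return [beta_1]
    step = (beta_T - beta_1) / (num_steps - 1)
    return [beta_1 + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample_scalar(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is a 0-based index into alpha_bars."""
    eps = rng.gauss(0.0, 1.0)
    a_bar = alpha_bars[t]
    x_t = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
    return x_t, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
# At the last step alpha_bar is tiny, so x_t is almost pure Gaussian noise.
x_t, eps = q_sample_scalar(x0=1.5, t=999, alpha_bars=alpha_bars, rng=random.Random(0))
```

Returning the sampled noise alongside `x_t` is a convenience for training, where the same noise is the regression target.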
Refer to our slide or blog for more details.
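If we can reverse the forward process, i.e. sample $\mathbf{x}_{t-1} \sim q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ iteratively until $t = 0$, we can generate $\mathbf{x}_0$ close to the data distribution starting from noise $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The reverse process is also modeled as a Markov chain with learned Gaussian transitions:
$$ p_\theta(\mathbf{x}_{0:T}) := p(\mathbf{x}_T) \prod_{t=1}^{T} p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t), $$
where $p(\mathbf{x}_T) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) := \mathcal{N}\left(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\right)$.
To learn this reverse process, we set an objective function that minimizes KL divergence between $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ and $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_{t-1}; \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0), \tilde{\beta}_t \mathbf{I}\right)$, which is also Gaussian when conditioned on $\mathbf{x}_0$:
$$ \mathcal{L} = \mathbb{E}_q\left[ \sum_{t>1} D_{\mathrm{KL}}\left( q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \right) \right]. $$
As a parameterization of DDPM, the authors set $\boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t) = \sigma_t^2 \mathbf{I}$ to untrained time-dependent constants, so the objective can be rewritten as
$$ \mathcal{L} = \mathbb{E}_q\left[ \sum_{t>1} \frac{1}{2\sigma_t^2} \left\| \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) - \boldsymbol{\mu}_\theta(\mathbf{x}_t, t) \right\|^2 \right] + C. $$
The authors empirically found that predicting the noise $\boldsymbol{\epsilon}$ injected into the data with a noise prediction network $\boldsymbol{\epsilon}_\theta$ works better than learning the mean function $\boldsymbol{\mu}_\theta$ directly.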
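In short, the simplified objective function of DDPM is defined as follows:
$$ \mathcal{L}_{\text{simple}} := \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t(\mathbf{x}_0, t), t) \right\|^2 \right], $$
where $\mathbf{x}_t(\mathbf{x}_0, t) = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}$ and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.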
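To make the simplified objective concrete, here is a minimal scalar sketch that estimates the noise-matching loss by Monte Carlo. It assumes the linear schedule from the DDPM paper (1000 steps, betas from 1e-4 to 0.02) and a toy dataset where every point equals `TARGET`, so that the exact noise can be recovered analytically. The names `simple_loss`, `oracle_eps`, and `zero_eps` are hypothetical and are not part of the assignment code, which operates on PyTorch tensors.

```python
import math
import random

# Assumed schedule: 1000 linear betas from 1e-4 to 0.02 (DDPM paper defaults).
T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alpha_bars, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def simple_loss(eps_model, data, rng):
    """Monte Carlo estimate of E[(eps - eps_model(x_t, t))^2] for scalar data."""
    total = 0.0
    for x0 in data:
        t = rng.randrange(T)            # uniformly sampled timestep (0-based)
        eps = rng.gauss(0.0, 1.0)       # the noise the model should predict
        a_bar = alpha_bars[t]
        x_t = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
        total += (eps - eps_model(x_t, t)) ** 2
    return total / len(data)


TARGET = 2.0
data = [TARGET] * 2000  # toy dataset: a point mass at TARGET


def oracle_eps(x_t, t):
    """Exact noise for point-mass data, obtained by inverting the forward formula."""
    return (x_t - math.sqrt(alpha_bars[t]) * TARGET) / math.sqrt(1.0 - alpha_bars[t])


def zero_eps(x_t, t):
    """Baseline that always predicts zero noise."""
    return 0.0


oracle_loss = simple_loss(oracle_eps, data, random.Random(0))
zero_loss = simple_loss(zero_eps, data, random.Random(0))
```

The oracle reaches a loss of zero up to floating-point error, while the zero predictor sits near 1 because the injected noise has unit variance. A trained network should land between these two extremes.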
Refer to the original paper for more details.
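Once we train the noise prediction network $\boldsymbol{\epsilon}_\theta$, we can generate samples by gradually denoising white noise $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ with the DDPM sampling algorithm.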
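The sampling procedure can be sketched for scalar data as follows. This is an illustration under stated assumptions, not the reference solution: it uses the linear schedule from the DDPM paper, picks the variance choice sigma_t^2 = beta_t (one of the two options discussed in the paper), and replaces the trained network with an analytic oracle for a point-mass dataset at `TARGET`. The names `p_sample_scalar`, `p_sample_loop_scalar`, and `oracle_eps` are hypothetical.

```python
import math
import random

# Assumed schedule: 1000 linear betas from 1e-4 to 0.02 (DDPM paper defaults).
T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars, running = [], 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def p_sample_scalar(eps_model, x_t, t, rng):
    """One reverse step x_t -> x_{t-1}; t is a 0-based index."""
    eps_hat = eps_model(x_t, t)
    mean = (x_t - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
    if t == 0:
        return mean                      # no noise is added at the final step
    sigma = math.sqrt(betas[t])          # assumed choice: sigma_t^2 = beta_t
    return mean + sigma * rng.gauss(0.0, 1.0)


def p_sample_loop_scalar(eps_model, rng):
    """Full reverse process: start from white noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        x = p_sample_scalar(eps_model, x, t, rng)
    return x


TARGET = 2.0


def oracle_eps(x_t, t):
    """Stand-in for a trained network: exact noise when all data equals TARGET."""
    return (x_t - math.sqrt(alpha_bars[t]) * TARGET) / math.sqrt(1.0 - alpha_bars[t])


sample = p_sample_loop_scalar(oracle_eps, random.Random(0))
```

With the oracle predictor, the last step maps back to `TARGET` up to floating-point error, which makes this a convenient sanity check before plugging in a learned network.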
A typical diffusion pipeline is divided into three components:
In this task, we will look into each component one by one in a toy experiment and implement them sequentially.
After finishing the implementation, you will be able to train DDPM and evaluate its performance in ddpm_tutorial.ipynb under the 2d_plot_diffusion_todo directory.
❗️❗️❗️ You are only allowed to edit the part marked by TODO. ❗️❗️❗️
You first need to implement a noise prediction network in network.py.
The network should consist of TimeLinear layers whose feature dimensions are a sequence of [dim_in, dim_hids[0], ..., dim_hids[-1], dim_out]. Every TimeLinear layer except the last one should be followed by a ReLU activation.
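The layer layout described above can be derived from the dimension list with a small helper. The sketch below only plans the layers; it does not construct TimeLinear modules, since their constructor signature is defined in the assignment code. The function name `layer_plan` and the example sizes are hypothetical.

```python
def layer_plan(dim_in, dim_hids, dim_out):
    """Return (in_features, out_features, followed_by_relu) for each layer."""
    dims = [dim_in, *dim_hids, dim_out]
    pairs = list(zip(dims[:-1], dims[1:]))       # consecutive (in, out) sizes
    last = len(pairs) - 1
    # Every layer except the last one is followed by a ReLU.
    return [(d_in, d_out, i < last) for i, (d_in, d_out) in enumerate(pairs)]


# Example sizes chosen for illustration: 2D points in, 2D noise out.
plan = layer_plan(2, [128, 128, 128], 2)
for d_in, d_out, use_relu in plan:
    print(f"TimeLinear {d_in} -> {d_out}" + (" + ReLU" if use_relu else ""))
```

Pairing `dims[:-1]` with `dims[1:]` keeps the construction correct for any number of hidden layers, including an empty `dim_hids`.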
Now you should construct the forward and reverse processes of DDPM in ddpm.py.
q_sample() is a forward function that maps $\mathbf{x}_0$ to $\mathbf{x}_t$. p_sample() is a one-step reverse transition from $\mathbf{x}_t$ to $\mathbf{x}_{t-1}$, and p_sample_loop() is the full reverse process corresponding to the DDPM sampling algorithm.
In ddpm.py, the compute_loss() function should return the simplified noise matching loss from the DDPM paper.
Once you finish the implementation above, open and run ddpm_tutorial.ipynb via Jupyter Notebook. It will automatically train a diffusion model and measure the Chamfer Distance between 2D particles sampled by the diffusion model and 2D particles sampled from the target distribution.
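For intuition about the metric, here is a small brute-force sketch of a Chamfer Distance between two 2D point sets. It assumes the symmetric form that sums the two directional means of squared nearest-neighbor distances. The notebook may use a different normalization (for example sums instead of means, or unsquared distances), so the values are not comparable with the scores it reports. The function name `chamfer_distance` is hypothetical.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance with squared Euclidean distances (assumed form)."""

    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    def one_sided(src, dst):
        # Mean squared distance from each source point to its nearest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


generated = [(0.0, 0.0), (1.0, 0.0)]
target = [(0.0, 0.0)]
cd = chamfer_distance(generated, target)
```

The brute-force search costs O(N * M) distance evaluations, which is fine for small toy sets but slow for large ones.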
Take screenshots of:
- the training loss curve
- the Chamfer Distance reported after executing the Jupyter Notebook
- the visualization of the sampled particles
Below are the examples of (1) and (3).
Once you have successfully finished Task 1, implement the methods add_noise and step of the class DDPMScheduler defined in image_diffusion_todo/scheduler.py. You also need to implement the method get_loss of the class DiffusionModule defined in image_diffusion_todo/model.py. Refer to your implementation of the methods q_sample, p_sample, and compute_loss from the 2D experiment.
In this task, we will generate images by training a DDPM on the AFHQ dataset.
To train your model, simply execute the command: python train.py.
❗️❗️❗️ You are NOT allowed to modify any given hyperparameters. ❗️❗️❗️
Training samples images and saves a checkpoint every args.log_interval. After training a model, sample and save images with:
python sampling.py --ckpt_path ${CKPT_PATH} --save_dir ${SAVE_DIR_PATH}
We recommend starting the training as soon as possible, since it takes roughly 14 hours.
As an evaluation, measure FID score using the pre-trained classifier network we provide:
python dataset.py # to construct the eval directory.
python fid/measure_fid.py @GT_IMG_DIR @GEN_IMG_DIR
Do NOT forget to execute dataset.py before measuring the FID score. Otherwise, the output will be incorrect due to the discrepancy between the image resolutions.
For instance:
Use the validation set of the AFHQ dataset (e.g., data/afhq/eval) as @GT_IMG_DIR. The script will automatically search for and load the images. The path @GEN_IMG_DIR should be the same as the save directory you provided when running the script sampling.py.
Take a screenshot of the FID score and include at least 8 sampled images.
Submission Item List
- Code without model checkpoints
Task 1
- Loss curve screenshot
- Chamfer distance result of DDPM sampling
- Visualization of DDPM sampling
Task 2
- FID score result
- At least 8 images generated by your DDPM model
In a single document, write your name and student ID, and include submission items listed above. Refer to more detailed instructions written in each task section about what to submit.
Name the document {NAME}_{STUDENT_ID}.pdf and submit both your code and the document as a ZIP file named {NAME}_{STUDENT_ID}.zip.
When creating your zip file, exclude data (e.g., files in the AFHQ dataset) and any model checkpoints, including the provided pre-trained classifier checkpoint.
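One possible way to build such an archive is sketched below with Python's standard zipfile module. The excluded directory names and checkpoint suffixes are assumptions for illustration (the provided classifier uses .ckpt, and the dataset is assumed to live under a directory named data); adjust them to match your own files. The function name `build_submission_zip` is hypothetical.

```python
import zipfile
from pathlib import Path

# Assumed patterns: adjust to match where your data and checkpoints live.
EXCLUDED_SUFFIXES = {".ckpt", ".pt", ".pth"}
EXCLUDED_DIRS = {"data", "__pycache__", ".git"}


def build_submission_zip(project_dir, zip_path):
    """Zip project_dir while skipping datasets and model checkpoints."""
    project_dir = Path(project_dir)
    zip_path = Path(zip_path)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            rel = path.relative_to(project_dir)
            # Skip directories and the archive itself if it lives inside the project.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            # Skip checkpoints and anything under an excluded directory.
            if path.suffix in EXCLUDED_SUFFIXES or EXCLUDED_DIRS & set(rel.parts):
                continue
            zf.write(path, rel.as_posix())
            added.append(rel.as_posix())
    return added
```

The function returns the list of archived paths, so you can check that nothing large slipped in before uploading.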
Submit the zip file on GradeScope.
You will receive a zero score if:
- you do not submit,
- your code is not executable in the Python environment we provided, or
- you modify any code outside of the sections marked with TODO or use hyperparameters different from the fixed ones provided.
Plagiarism in any form will also result in a zero score and will be reported to the university.
Your score will incur a 10% deduction for each missing item in the submission item list.
Otherwise, you will receive up to 20 points from this assignment that count toward your final grade.
- Task 1
- 10 points: Achieve CD lower than 20 from DDPM sampling.
- 5 points: Achieve CD greater than or equal to 20 and less than 40 from DDPM sampling.
- 0 points: otherwise.
- Task 2
- 10 points: Achieve FID less than 20.
- 5 points: Achieve FID between 20 and 40.
- 0 points: otherwise.
If you are interested in this topic, we encourage you to check out the materials below.