KAIST CS492(D): Diffusion Models and Their Applications (Fall 2024)
Programming Assignment
Instructor: Minhyuk Sung (mhsung [at] kaist.ac.kr)
TA: Jaihoon Kim (jh27kim [at] kaist.ac.kr)
Score Distillation Sampling (SDS) is a technique used with generative models, particularly diffusion models. It leverages a pretrained diffusion model to guide the generation or editing of target samples. It does so by distilling the model's score into the optimization of the target sample. The score is the gradient of the log-density, which points toward regions where samples are more likely under the target distribution. Distillation sampling is particularly useful when a pretrained diffusion model cannot directly generate the target samples (e.g., 3D objects). In this programming assignment, we begin with a simple application: 2D image generation using SDS and its variants. Unlike the reverse process of a diffusion model, distillation sampling parameterizes the target content (e.g., images) and optimizes the parameters with respect to a predefined loss function. Next, we edit the given source images to align with target prompts using Posterior Distillation Sampling (PDS).
conda create -n cs492d python=3.8
conda activate cs492d
pip install -r requirements.txt
Install the PyTorch build that matches your CUDA version (see the PyTorch Previous Versions page).
For an environment with CUDA 12.1:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
.
├── data
│ ├── imgs <--- Source images for editing tasks
│ └── prompt_img_pairs.json <--- Metadata of text prompts and source images
├── eval.py <--- CLIP score evaluation code
├── guidance
│ └── sd.py <--- (TODO) Implement SDS and PDS
├── main.py <--- Entry point
└── utils.py <--- Utility functions
Distillation sampling parameterizes the target content (e.g., images) and optimizes the parameters using the gradient of the distillation loss function.
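The optimization loop itself is ordinary gradient descent; only the gradient is special. The NumPy sketch below assumes a hypothetical callable `loss_grad(theta, step)` that returns an estimate of the distillation-loss gradient. In SDS and PDS this estimate comes from a frozen diffusion model rather than from backpropagating through it. The toy gradient in the usage lines is only a stand-in and is not part of the starter code.

```python
import numpy as np


def distillation_sampling(theta0, loss_grad, lr=0.1, steps=100):
    """Optimize parameters theta by plain gradient descent on a distillation loss.

    loss_grad(theta, step) is a hypothetical callable returning an estimate of
    the gradient of the distillation loss with respect to theta.
    """
    theta = np.array(theta0, dtype=float)
    for step in range(steps):
        theta = theta - lr * loss_grad(theta, step)  # one descent step
    return theta


# Toy usage: a stand-in gradient that pulls theta toward a fixed target.
target = np.array([1.0, 2.0])
theta = distillation_sampling(np.zeros(2), lambda th, step: th - target)
```

In the assignment, theta would be the latent being optimized, and a PyTorch optimizer would typically replace the hand-written update.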
We provide text prompts and source images for generating and editing images. Change the $HOME prefix in the image paths to your own directory. For inference, use data/prompt_img_pairs.json, which contains the test prompts and source images. For each task, implement the loss function in guidance/sd.py. Use a fixed guidance scale for each task: 25 for SDS and 7.5 for PDS.
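The guidance scale enters through classifier-free guidance (CFG). Below is a minimal NumPy sketch of the usual CFG combination. It assumes the unconditional and text-conditional noise predictions are arrays of the same shape. The function and dictionary names are hypothetical and do not come from guidance/sd.py.

```python
import numpy as np

# Fixed guidance scales stated in this handout (VSD also uses 7.5).
GUIDANCE_SCALES = {"sds": 25.0, "pds": 7.5, "vsd": 7.5}


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions with CFG.

    guidance_scale = 1 returns the conditional prediction; larger values
    extrapolate further away from the unconditional prediction.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


eps_uncond = np.zeros(4)
eps_cond = np.full(4, 0.1)
eps_sds = classifier_free_guidance(eps_uncond, eps_cond, GUIDANCE_SCALES["sds"])
```

The large SDS scale (25) strongly amplifies the text-conditional direction. The editing setting (PDS) uses the more conventional 7.5.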
In this task, you will generate images using SDS. First, initialize the latent that will be optimized; decode_latents() in guidance/sd.py converts the optimized latent back into an image.
To generate images using the SDS loss, run the following command:
python main.py --prompt "{$PROMPT}" --loss_type sds --guidance_scale 25
Refer to data/prompt_img_pairs.json for the prompt.
Implement get_sds_loss() in guidance/sd.py. The function receives the latent image and guidance_scale, the Classifier-Free Guidance (CFG) weight, and should return the computed SDS loss.
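The NumPy sketch below shows what one SDS gradient estimate involves. It rests on these assumptions:

- `eps_model(x_t, t, cond)` is a hypothetical stand-in for the Stable Diffusion U-Net.
- `alphas_cumprod` is the scheduler's cumulative product of alphas.
- The weighting w(t) = 1 - alpha_bar_t and the timestep range are common choices, not values given in this handout.

It is not the required signature of get_sds_loss().

```python
import numpy as np


def sds_grad(x0, eps_model, alphas_cumprod, guidance_scale, rng, t_range=(0.02, 0.98)):
    """One Monte Carlo estimate of the SDS gradient with respect to latent x0.

    eps_model(x_t, t, cond) is a hypothetical noise predictor; cond=True means
    text-conditional and cond=False means unconditional.
    """
    T = len(alphas_cumprod)
    t = int(rng.integers(int(t_range[0] * T), int(t_range[1] * T)))  # random timestep
    eps = rng.standard_normal(x0.shape)                               # injected noise
    abar = alphas_cumprod[t]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps              # sample q(x_t | x_0)

    eps_uncond = eps_model(x_t, t, cond=False)
    eps_cond = eps_model(x_t, t, cond=True)
    eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)   # CFG

    w = 1.0 - abar  # weighting w(t); 1 - alpha_bar_t is one common choice (assumption)
    # No backpropagation through the noise predictor: the U-Net Jacobian is dropped.
    return w * (eps_hat - eps)
```

To hand such a gradient to `loss.backward()` in PyTorch, one common trick is a surrogate loss such as `0.5 * ((latents - (latents - grad).detach()) ** 2).sum()`. Its gradient with respect to `latents` is exactly `grad`.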
The goal of PDS is to edit a source image so that it aligns with a target (edit) prompt while preserving the identity of the source.
The stochastic latents can be obtained by reformulating the reverse step:
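The sketch below follows the PDS paper's formulation and uses standard DDPM notation, which is assumed here:

- alpha_t and beta_t = 1 - alpha_t are the noise-schedule terms.
- alpha_bar_t is the running product of the alphas.
- sigma_t is the DDPM posterior standard deviation.

The reverse step and the stochastic latent z_t it implies can then be written as follows. Here x_{t-1} is drawn from the posterior q(x_{t-1} | x_t, x_0), not from the model.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon_t, \\
x_{t-1} &= \mu_\phi(x_t, y; \epsilon_\phi) + \sigma_t z_t,
\qquad
\mu_\phi(x_t, y; \epsilon_\phi) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\phi(x_t, y, t)\right), \\
z_t(x_0, y; \epsilon_\phi) &= \frac{x_{t-1} - \mu_\phi(x_t, y; \epsilon_\phi)}{\sigma_t},
\qquad
x_{t-1} = \tilde\mu_t(x_t, x_0) + \sigma_t\,\epsilon_{t-1}, \\
\tilde\mu_t(x_t, x_0) &= \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,
\qquad
\sigma_t^2 = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t .
\end{aligned}
```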
PDS computes the following loss to match the stochastic latents of the source and the target.
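Here z_t(x_0, y) denotes the stochastic latent of a clean latent x_0 under prompt y. The source and target share the same noise samples epsilon_t and epsilon_{t-1}. Under these assumptions, the PDS objective and the gradient used in practice can be sketched as follows. The notation follows the PDS paper, psi(t) is a timestep weighting, and, as in SDS, the U-Net Jacobian is omitted from the gradient.

```latex
\begin{aligned}
\mathcal{L}_{\text{PDS}} &= \mathbb{E}_{t,\epsilon_t,\epsilon_{t-1}}\left[\left\| z_t(x_0^{\text{tgt}}, y^{\text{tgt}}; \epsilon_\phi) - z_t(x_0^{\text{src}}, y^{\text{src}}; \epsilon_\phi) \right\|_2^2\right], \\
\nabla_{x_0^{\text{tgt}}}\mathcal{L}_{\text{PDS}} &\approx \mathbb{E}_{t,\epsilon_t,\epsilon_{t-1}}\left[\psi(t)\left(z_t^{\text{tgt}} - z_t^{\text{src}}\right)\right].
\end{aligned}
```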
To edit images using the PDS loss, run the following command:
python main.py --prompt "{$PROMPT}" --loss_type pds --guidance_scale 7.5 --edit_prompt "{$EDIT_PROMPT}" --src_img_path {SRC_IMG_PATH}
Refer to data/prompt_img_pairs.json for prompt, edit_prompt, and src_img_path. Implement get_pds_loss() in guidance/sd.py.
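The NumPy sketch below computes one PDS gradient estimate. It makes these assumptions:

- `eps_model(x_t, t, cond)` is a hypothetical noise predictor, with `cond=None` meaning unconditional.
- The schedule is given as a `betas` array.
- The weighting psi(t) is set to 1.

The key detail is that the source and target share the same noises epsilon_t and epsilon_{t-1}, so their stochastic latents differ only through the images and prompts. It is not the required signature of get_pds_loss().

```python
import numpy as np


def stochastic_latent(x0, cond, t, eps_t, eps_tm1, eps_model, betas, guidance_scale):
    """z_t(x0, y): the noise that maps the model's reverse-step mean onto a posterior sample."""
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    a_t, b_t, abar_t, abar_tm1 = alphas[t], betas[t], abar[t], abar[t - 1]

    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps_t  # q(x_t | x_0), shared eps_t

    # DDPM posterior q(x_{t-1} | x_t, x_0): mean, std, and a sample with shared eps_{t-1}.
    post_mean = (np.sqrt(abar_tm1) * b_t / (1.0 - abar_t)) * x0 \
        + (np.sqrt(a_t) * (1.0 - abar_tm1) / (1.0 - abar_t)) * x_t
    sigma_t = np.sqrt((1.0 - abar_tm1) / (1.0 - abar_t) * b_t)
    x_tm1 = post_mean + sigma_t * eps_tm1

    # Model mean mu_phi(x_t, y) from the CFG noise prediction.
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, cond)
    eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    mu = (x_t - b_t / np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(a_t)

    return (x_tm1 - mu) / sigma_t


def pds_grad(x0_tgt, x0_src, y_tgt, y_src, eps_model, betas, guidance_scale, rng):
    """One estimate of the PDS gradient w.r.t. x0_tgt, with weighting psi(t) = 1."""
    T = len(betas)
    t = int(rng.integers(max(1, int(0.02 * T)), int(0.98 * T)))  # t >= 1 so index t-1 exists
    eps_t = rng.standard_normal(x0_tgt.shape)    # shared by source and target
    eps_tm1 = rng.standard_normal(x0_tgt.shape)  # shared by source and target
    z_tgt = stochastic_latent(x0_tgt, y_tgt, t, eps_t, eps_tm1, eps_model, betas, guidance_scale)
    z_src = stochastic_latent(x0_src, y_src, t, eps_t, eps_tm1, eps_model, betas, guidance_scale)
    return z_tgt - z_src
```

The shared noise is a deliberate design choice. If the target image and prompt equal the source ones, the gradient is exactly zero, so the edit only moves parts of the image that the prompt change affects.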
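Variational Score Distillation (VSD), proposed in ProlificDreamer, aims to improve the sampling quality of SDS by using LoRA to mimic the noise prediction of a pretrained diffusion model. Given the pretrained diffusion model and a LoRA module, VSD updates the sample with the difference between their noise predictions. Meanwhile, the LoRA module is fine-tuned on the current samples with the standard denoising loss.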
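The two VSD updates for the 2D case can be sketched as follows. The symbols are our own choices:

- epsilon_phi is the frozen pretrained model, used with CFG.
- epsilon_lora is the LoRA-adapted model.
- omega(t) is a timestep weighting.
- sg(.) is stop-gradient.

ProlificDreamer also conditions the LoRA on camera parameters, which are dropped here because there is no camera in 2D image generation.

```latex
\begin{aligned}
\nabla_{x_0}\mathcal{L}_{\text{VSD}} &\approx \mathbb{E}_{t,\epsilon}\left[\omega(t)\left(\epsilon_\phi(x_t, y, t) - \epsilon_{\text{lora}}(x_t, y, t)\right)\right], \\
\min_{\text{lora}}\; &\mathbb{E}_{t,\epsilon}\left[\left\|\epsilon_{\text{lora}}(x_t, y, t) - \epsilon\right\|_2^2\right],
\qquad
x_t = \sqrt{\bar\alpha_t}\,\operatorname{sg}(x_0) + \sqrt{1-\bar\alpha_t}\,\epsilon .
\end{aligned}
```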
Generate images using the same text prompts provided in Task 1. For VSD, use a guidance_scale of 7.5.
For evaluation, we will measure the CLIP score of the generated images. CLIP (Contrastive Language-Image Pre-training) is a model that embeds images and texts into a shared embedding space. The CLIP Score measures the similarity between an image and a text description, with higher scores indicating a closer match.
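At its core, this similarity is a cosine similarity between an image embedding and a text embedding. The sketch below uses plain vectors as stand-ins for the outputs of CLIP's image and text encoders. It assumes eval.py reports the raw cosine similarity, likely averaged over prompts; the 0.26/0.28 thresholds are consistent with that. Some CLIP-score variants rescale this value, so check eval.py for the exact definition.

```python
import numpy as np


def clip_score(image_emb, text_emb):
    """Cosine similarity between one image embedding and one text embedding."""
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    image_emb = image_emb / np.linalg.norm(image_emb)  # L2-normalize both embeddings
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(np.dot(image_emb, text_emb))
```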
Place the generated/edited images in a single directory and ensure that the generated/edited images are named using their prompts with spaces replaced by underscores (e.g., a_boat_in_a_frozen_river.png).
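The naming rule can be expressed as a small helper. The function name `output_path` is hypothetical. The file name reproduces the prompt exactly, including its capitalization, and only spaces become underscores.

```python
from pathlib import Path


def output_path(out_dir, prompt, ext=".png"):
    """Build the expected file path: the prompt with spaces replaced by underscores."""
    return Path(out_dir) / (prompt.replace(" ", "_") + ext)
```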
Then run the following command to measure the CLIP score, which creates an eval.json file:
python eval.py --fdir1 {$FDIR}
Submission Item List
- Code
- PDF file
Task 1
- CLIP score evaluation: eval.json (output of eval.py)
- Output results of generated images using the provided prompts
Task 2
- CLIP score evaluation: eval.json (output of eval.py)
- Output results of edited images using the provided prompts and images
Task 3 (Optional)
- CLIP score evaluation: eval.json (output of eval.py)
- Output results of generated images using the provided prompts
Submit a zip file named {NAME}_{STUDENT_ID}.zip containing the implemented code and the generated/edited images.
Organize the generated and edited images as shown below and submit the zip file on GradeScope.
./outputs/
├── pds
│ ├── a_boat_in_a_frozen_river.png
│ ├── A_cabin_surrounded_by_snowy_forests.png
│ ├── A_cat_sitting_on_grass.png
│ ├── A_church_beside_a_waterfall.png
│ ├── A_futuristic_car_wiht_neon_signs_on_the_road.png
│ ├── A_hotdog_on_the_table.png
│ ├── An_ancient_villa_close_to_the_pool.png
│ ├── A_red_sportscar_driving_on_a_desert_road.png
│ ├── a_squirrel_sitting_on_a_table.png
│ ├── A_toy_lego_castle_close_to_the_pool.png
│ └── eval.json
└── sds
...
├── A_villa_close_to_the_pool.png
└── eval.json
You will receive a zero score if:
- you do not submit,
- your code is not executable in the Python environment we provided, or
- you modify any code outside of the sections marked with TODO, or use different values for hyperparameters that are supposed to be fixed as given.
Your score will incur a 10% deduction for each missing item in the submission item list.
Task 1 and Task 2 are worth 10 points each, while Task 3 (Optional) is worth 5 points.
| CLIP Score | Points for Tasks 1 and 2 (Optional Task 3) |
|---|---|
| 0.28 and above | 10 (5) |
| 0.26 and above | 5 (2.5) |
| below 0.26 | 0 (0) |
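The table reads as a simple threshold rule. The sketch below assumes a score exactly on a threshold counts toward the higher tier and that optional-task points are half of the regular ones, as the parentheses indicate. The function name is hypothetical.

```python
def clip_points(clip_score, optional=False):
    """Map a CLIP score to points per the grading table (optional-task points are halved)."""
    if clip_score >= 0.28:
        points = 10.0
    elif clip_score >= 0.26:
        points = 5.0
    else:
        points = 0.0
    return points / 2 if optional else points
```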
This assignment is heavily based on DreamFusion and PDS. You may refer to the repository while working on the tasks. However, it is strictly forbidden to simply copy, reformat, or refactor the necessary code blocks when making your submission. You must implement the functionality on your own with a clear understanding of how your code works. As noted on the course website, we will detect such cases with a specialized tool, and plagiarism in any form will result in a zero score.
If you are interested in this topic, we encourage you to check out the materials below.