Personalized diffusion models have shown remarkable success in Text-to-Image (T2I) generation by enabling the injection of user-defined concepts into diverse contexts. However, balancing concept fidelity with contextual alignment remains a challenging open problem. In this work, we propose an RL-based approach that leverages the diverse outputs of T2I models to address this issue. Our method eliminates the need for human-annotated scores by generating a synthetic paired dataset for DPO-like training using external quality metrics. These better–worse pairs are specifically constructed to improve both concept fidelity and prompt adherence. Moreover, our approach supports flexible adjustment of the trade-off between image fidelity and textual alignment. Through multi-step training, our approach outperforms a naive baseline in convergence speed and output quality. We conduct extensive qualitative and quantitative analysis, demonstrating the effectiveness of our method across various architectures and fine-tuning techniques.
DreamBoothDPO leverages synthetic preference pairs and CLIP-based metrics to automate personalized generation, dynamically optimizing the trade-off between concept accuracy and prompt alignment through iterative multi-stage training.
- [27/05/2025] 🔥🔥🔥 DreamBoothDPO release. The paper is available on arXiv.
You need the following hardware and Python version to run our method.
- Linux
- NVIDIA GPU + CUDA CuDNN
- Conda 24.1.0+ or Python 3.11+
- Clone this repo:
git clone https://github.com/ControlGenAI/DreamBoothDPO.git
cd DreamBoothDPO
- Create Conda environment:
conda create -n dbdpo python=3.11
conda activate dbdpo
- Install the dependencies in your environment:
pip install -r requirements.txt
# 0.0 Download and extract COCO annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
# 0.1 Collect prompts from COCO.
python gen_prompts_from_coco.py <args...>
# 0.2 Merge COCO prompts with ChatGPT prompts.
python data/merge.py <args...>
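Conceptually, prompt collection amounts to reading the image captions from the downloaded COCO annotation file and keeping a manageable subset of them as generation prompts. Below is a minimal sketch of that idea using the standard COCO captions JSON layout; the filtering rule and output file name are illustrative, and the actual logic lives in `gen_prompts_from_coco.py`.

```python
import json
import random

# Standard COCO captions layout: {"annotations": [{"image_id": ..., "caption": ...}, ...]}
with open("annotations/captions_train2014.json") as f:
    coco = json.load(f)

captions = [ann["caption"].strip() for ann in coco["annotations"]]

# Illustrative filtering: keep reasonably short captions and sample a fixed number.
random.seed(0)
short = [c for c in captions if len(c.split()) <= 15]
prompts = random.sample(short, k=min(1000, len(short)))

# Hypothetical output file; the real script defines its own arguments and format.
with open("coco_prompts.txt", "w") as f:
    f.write("\n".join(prompts))
```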
# 1.1 Train the base personalized generation model (e.g., DreamBooth).
bash scripts/train_dreambooth.sh
# 1.2 Generate images for validation prompts with the base model.
bash scripts/generate_val.sh
# 1.3 Get CLIP scores to find the best checkpoint on the Pareto frontier for the base model.
bash scripts/evaluate_exp.sh
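The evaluation step scores each checkpoint along two axes: CLIP image similarity between generated images and the concept's reference images (CLIP-I), and CLIP similarity between a generated image and its prompt (CLIP-T). A checkpoint lies on the Pareto frontier if no other checkpoint beats it on both axes at once. The sketch below illustrates the idea with the Hugging Face CLIP API; the exact metrics and aggregation used by `scripts/evaluate_exp.sh` may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(gen_image: Image.Image, ref_image: Image.Image, prompt: str):
    """Return (CLIP-I, CLIP-T): image-image and image-text cosine similarities."""
    img_inputs = processor(images=[gen_image, ref_image], return_tensors="pt")
    img_emb = model.get_image_features(**img_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    txt_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    txt_emb = model.get_text_features(**txt_inputs)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

    clip_i = (img_emb[0] @ img_emb[1]).item()
    clip_t = (img_emb[0] @ txt_emb[0]).item()
    return clip_i, clip_t

def pareto_front(points):
    """Keep (clip_i, clip_t) points that no other point dominates on both axes."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]
```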
# 2.1 Get a subsample of the prompts.
python data/subset.py <args...>
# 2.2 Generate images for the collected prompts.
bash scripts/generate_prompts*.sh
# 2.3 Get CLIP scores for generated samples.
bash scripts/evaluate_exp.sh
# 2.4 Collect pairs of generated samples based on score differences and angles.
bash scripts/collect_pairs.sh
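With the two CLIP metrics above, every generated sample is a point in the (CLIP-I, CLIP-T) plane. One plausible reading of "score differences and angles" is that a (better, worse) pair is kept only when the better sample improves on the worse one by a sufficient margin and the direction of that improvement falls inside a chosen angular range, which is what lets the fidelity/alignment trade-off be steered. The snippet below is a hypothetical illustration of such a rule, not the exact criteria implemented in `scripts/collect_pairs.sh`.

```python
import math
from itertools import permutations

def collect_pairs(samples, min_diff=0.05, angle_range=(20.0, 70.0)):
    """samples: list of (sample_id, clip_i, clip_t) for one prompt.

    Returns (winner_id, loser_id) pairs. Thresholds are illustrative placeholders:
    `min_diff` is the minimum norm of the score difference, and `angle_range`
    (in degrees, measured from the CLIP-I axis) bounds the improvement direction.
    """
    pairs = []
    for (id_w, i_w, t_w), (id_l, i_l, t_l) in permutations(samples, 2):
        di, dt = i_w - i_l, t_w - t_l
        if di < 0 or dt < 0:                 # the winner must not lose on either metric
            continue
        if math.hypot(di, dt) < min_diff:    # require a clear quality gap
            continue
        # 0 deg = pure fidelity gain, 90 deg = pure prompt-alignment gain.
        angle = math.degrees(math.atan2(dt, di))
        if angle_range[0] <= angle <= angle_range[1]:
            pairs.append((id_w, id_l))
    return pairs
```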
# 3 Train DPO on collected pairs.
bash scripts/train_ddpo_pairs*.sh
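The DPO stage follows the Diffusion-DPO formulation: for each (winner, loser) pair, the trainable model and a frozen reference model both predict the added noise, and the loss encourages the trainable model to lower its denoising error on the winner, relative to the loser, more than the reference model does. The function below is a minimal sketch of that objective, assuming standard latent-diffusion tensors; the actual training entry point, schedule, and hyper-parameters are in `scripts/train_ddpo_pairs*.sh`, and the `beta` value is illustrative.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(noise_pred_w, noise_pred_l,
                       ref_pred_w, ref_pred_l,
                       noise_w, noise_l, beta=5000.0):
    """Diffusion-DPO objective for a batch of (winner, loser) noised latents.

    *_pred_w / *_pred_l are noise predictions for the preferred and rejected
    images at the same timestep; noise_w / noise_l are the true added noises.
    """
    # Per-sample MSE denoising errors for the trainable and reference models.
    err_w = F.mse_loss(noise_pred_w, noise_w, reduction="none").mean(dim=[1, 2, 3])
    err_l = F.mse_loss(noise_pred_l, noise_l, reduction="none").mean(dim=[1, 2, 3])
    ref_err_w = F.mse_loss(ref_pred_w, noise_w, reduction="none").mean(dim=[1, 2, 3])
    ref_err_l = F.mse_loss(ref_pred_l, noise_l, reduction="none").mean(dim=[1, 2, 3])

    # Reward the model for beating the reference on the winner and
    # penalize it for beating the reference on the loser.
    diff = (err_w - ref_err_w) - (err_l - ref_err_l)
    return -F.logsigmoid(-beta * diff).mean()
```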
# 4.1 Generate images for validation prompts with the trained model.
bash scripts/generate_val*.sh
# 4.2 Score samples.
bash scripts/evaluate_exp.sh
# After step 1 and before step 2: split the prompt set into parts, one for each training step.
python data/split_to_parts.py <args...>
# Repeat steps 2 and 3 with the appropriate checkpoints.
bash scripts/pipeline*.sh
The main setups from the paper are:
- SD2-DB:
scripts/pipeline.sh
- SD2-SVD:
scripts/pipeline_svd_full.sh
- SDXL-SVD:
scripts/pipeline_sdxl.sh
This repository builds on several codebases:
- DreamBooth dataset and prompts
- Implementations of the DreamBooth, Textual Inversion, and LoRA fine-tuning methods from diffusers
- Implementation of SVDiff
- DiffusionDPO implementation
If you use this code or our findings for your research, please cite our paper:
@misc{ayupov2025dreamboothdpoimprovingpersonalizedgeneration,
    title={DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization},
    author={Shamil Ayupov and Maksim Nakhodnov and Anastasia Yaschenko and Andrey Kuznetsov and Aibek Alanov},
    year={2025},
    eprint={2505.20975},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2505.20975},
}