This repository contains the code for the CVPR 2023 paper SINE: SINgle Image Editing with Text-to-Image Diffusion Models. For more visualization results, please check our webpage.
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang¹, Ligong Han¹, Arnab Ghosh², Dimitris Metaxas¹, and Jian Ren²
¹Rutgers University, ²Snap Inc.
CVPR 2023.
First, clone the repository:
git clone git@github.com:zhang-zx/SINE.git
Then, install the dependencies following the instructions.
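The code builds on the Stable Diffusion (LDM) codebase, so the environment can typically be set up with conda. A minimal sketch, assuming the repository ships an LDM-style environment.yaml (the file name and environment name are assumptions; check the repository for the exact instructions):
cd SINE
# Create and activate the conda environment (file and environment name are assumptions)
conda env create -f environment.yaml
conda activate ldm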
Alternatively, you can use the following Docker image:
docker pull sunggukcha/sine
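A minimal sketch of launching the container with GPU access and the current directory mounted (the mount point and what is baked into the image are assumptions, not documented here):
# Run the image interactively with all GPUs and the working directory mounted
docker run --gpus all -it --rm -v "$(pwd)":/workspace sunggukcha/sine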
To fine-tune the model, you first need to download the pre-trained Stable Diffusion model.
The data used in the paper can be found here.
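The fine-tuning commands below expect a local Stable Diffusion checkpoint for --actual_resume. A minimal sketch of fetching one from the Hugging Face Hub (the repository and file name are assumptions; use whichever Stable Diffusion checkpoint matches your setup):
# Download the Stable Diffusion v1.4 checkpoint (URL and file name are assumptions)
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
With the checkpoint downloaded, fine-tune on the single input image: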
IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'
python main.py \
--base configs/stable-diffusion/v1-finetune_picture.yaml \
-t --actual_resume /path/to/pre-trained/model \
-n $NAME --gpus 0, --logdir ./logs \
--data_root $IMG_PATH \
--reg_data_root $IMG_PATH --class_word $CLS_WRD
To fine-tune with the patch-based training scheme instead, run:
IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'
python main.py \
--base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
-t --actual_resume /path/to/pre-trained/model \
-n $NAME --gpus 0, --logdir ./logs \
--data_root $IMG_PATH \
--reg_data_root $IMG_PATH --class_word $CLS_WRD
After fine-tuning, generate the edited image with guidance from both the pre-trained and the fine-tuned models:
LOG_DIR=/path/to/logdir
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
--scale 10 --ddim_steps 100 \
--sin_config configs/stable-diffusion/v1-inference.yaml \
--sin_ckpt $LOG_DIR"/checkpoints/last.ckpt" \
--prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model" \
--cond_beta 0.4 \
--range_t_min 500 --range_t_max 1000 --single_guidance \
--skip_save --H 512 --W 512 --n_samples 2 \
--outdir $LOG_DIR
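For illustration only (the paths, class word, and prompts below are hypothetical placeholders, not taken from the paper), an edit that turns a castle photo into a snowy scene, assuming the class word 'castle' was used during fine-tuning, could look like:
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
--scale 10 --ddim_steps 100 \
--sin_config configs/stable-diffusion/v1-inference.yaml \
--sin_ckpt ./logs/castle_experiment/checkpoints/last.ckpt \
--prompt "a photo of a castle covered by snow[SEP]a photo of a castle" \
--cond_beta 0.4 \
--range_t_min 500 --range_t_max 1000 --single_guidance \
--skip_save --H 512 --W 512 --n_samples 2 \
--outdir ./logs/castle_experiment
The prompt before [SEP] drives the edit through the pre-trained model, while the prompt after it matches the content the fine-tuned model was trained to reconstruct.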
To edit with guidance from multiple fine-tuned models at once, run:
python scripts/stable_txt2img_multi_guidance.py --ddim_eta 0.0 --n_iter 2 \
--scale 10 --ddim_steps 100 \
--sin_ckpt path/to/ckpt1 path/to/ckpt2 \
--sin_config ./configs/stable-diffusion/v1-inference.yaml \
configs/stable-diffusion/v1-inference.yaml \
--prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model1[SEP]prompt for fine-tuned model2" \
--beta 0.4 0.5 \
--range_t_min 400 400 --range_t_max 1000 1000 --single_guidance \
--H 512 --W 512 --n_samples 2 \
--outdir path/to/output_dir
Support for the Diffusers library is still under development; the results in our paper were obtained with the earlier LDM-based code above. To fine-tune with the Diffusers implementation, run:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export IMG_PATH="path/to/image"
export OUTPUT_DIR="path/to/output_dir"
accelerate launch diffusers_train.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--img_path=$IMG_PATH \
--output_dir=$OUTPUT_DIR \
--instance_prompt="prompt for fine-tuning" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=NUMBERS_OF_STEPS \
--checkpointing_steps=FREQUENCY_FOR_CHECKPOINTING \
--patch_based_training # OPTIONAL: add this flag for patch-based training scheme
To sample from the fine-tuned model and perform the edit with Diffusers, run:
python diffusers_sample.py \
--pretrained_model_name_or_path "path/to/output_dir" \
--prompt "prompt for fine-tuned model" \
--editing_prompt 'prompt for pre-trained model'
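As a hypothetical example (the prompts are placeholders, not from the paper), if the model was fine-tuned with the instance prompt "a photo of a castle", an edit that adds snow could be invoked as:
python diffusers_sample.py \
--pretrained_model_name_or_path "path/to/output_dir" \
--prompt "a photo of a castle" \
--editing_prompt "a photo of a castle covered by snow"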
Some of the editing results are shown below. See more results on our webpage.
Our code builds on the following implementations: Dreambooth-Stable-Diffusion and stable-diffusion. The implementation with Diffusers support is largely based on DreamBooth. Many thanks to them!
If our work or code helps you, please consider citing our paper. Thank you!
@article{zhang2022sine,
title={SINE: SINgle Image Editing with Text-to-Image Diffusion Models},
author={Zhang, Zhixing and Han, Ligong and Ghosh, Arnab and Metaxas, Dimitris and Ren, Jian},
journal={arXiv preprint arXiv:2212.04489},
year={2022}
}