- Release inference code
- Release pretrained models
- Release training code
- Hugging Face demo
Create a conda environment `vico` using:

```bash
conda env create -f environment.yaml
conda activate vico
```
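As a quick sanity check that the environment resolved correctly (a minimal sketch; that PyTorch is pinned in `environment.yaml` is an assumption, though any Stable Diffusion stack requires it):

```bash
# Confirm PyTorch imports and that the GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```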
Download the pretrained Stable Diffusion v1-4 checkpoint and place it under `models/ldm/stable-diffusion-v1/`.
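A minimal sketch of the download, assuming the official checkpoint hosted at `CompVis/stable-diffusion-v-1-4-original` on Hugging Face (you may need to accept the model license there first):

```bash
# Fetch the SD v1-4 checkpoint into the directory the inference command expects.
mkdir -p models/ldm/stable-diffusion-v1
wget -O models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  "https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt"
```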
We provide pretrained checkpoints at 300, 350, and 400 training steps for 8 objects. You can download the sample images and the corresponding pretrained checkpoints for any of the objects below (a layout sketch follows the table):
| Object | Sample images | Checkpoints |
| --- | --- | --- |
| barn | image | ckpt |
| batman | image | ckpt |
| clock | image | ckpt |
| dog7 | image | ckpt |
| monster toy | image | ckpt |
| pink sunglasses | image | ckpt |
| teddybear | image | ckpt |
| wooden pot | image | ckpt |
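After downloading an object, the files are expected to be organized roughly as below. Only the `checkpoints/*-399.pt` pattern is confirmed by the inference command; the other names are assumptions:

```bash
# Assumed layout (illustrative), using batman as an example:
# batman/
#   1.jpg, 2.jpg, ...    # sample reference images
#   checkpoints/
#     <name>-399.pt      # fine-tuned weights; the step index matches --load_step
```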
Before running the inference command, please set the following (example values are shown after the list):

- `REF_IMAGE_PATH`: path of the reference image. It can be any image in the samples, like `batman/1.jpg`.
- `CHECKPOINT_PATH`: path of the checkpoint weights. Its subfolder should be similar to `checkpoints/*-399.pt`.
- `OUTPUT_PATH`: path for the generated images. For example, `outputs/batman`.
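For instance, using the batman object from the table above (the folder locations are assumptions about where you extracted the downloads):

```bash
# Illustrative values; adjust to wherever you saved the downloads.
REF_IMAGE_PATH=batman/1.jpg    # any sample image of the object works
CHECKPOINT_PATH=batman         # its checkpoints/ subfolder holds *-399.pt
OUTPUT_PATH=outputs/batman     # generated images are written here
```

If you set them as shell variables like this, reference them as `$REF_IMAGE_PATH`, `$CHECKPOINT_PATH`, and `$OUTPUT_PATH` in the command below.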
Then run:

```bash
python scripts/vico_txt2img.py \
    --ddim_eta 0.0 --n_samples 4 --n_iter 2 --scale 7.5 --ddim_steps 50 \
    --ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
    --image_path REF_IMAGE_PATH \
    --ft_path CHECKPOINT_PATH \
    --load_step 399 \
    --prompt "a photo of * on the beach" \
    --outdir OUTPUT_PATH
```
You can specify `load_step` (300, 350, or 400) and personalize the `prompt`, keeping the `*` placeholder that stands for the learned concept (a prefix like "a photo of" usually gives better results).
If you use this code in your research, please consider citing our paper:
```bibtex
@inproceedings{Hao2023ViCo,
  title={ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation},
  author={Shaozhe Hao and Kai Han and Shihao Zhao and Kwan-Yee K. Wong},
  year={2023}
}
```