Qihua Chen*, Yue Ma*, Hongfa Wang*, Junkun Yuan*✉️,
Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, and Wei Liu✉️
- [2024.09.18] 🔥 Release training/inference code, config, and checkpoints!
- [2024.09.07] 🔥 Release Paper and Project page!
Follow-Your-Canvas enables higher-resolution video outpainting with rich content generation, overcoming GPU memory constraints and maintaining spatial-temporal consistency.
Before running the code, make sure you have set up the environment and installed the required packages. Since each outpainting window is 512×512×64, you need a GPU with at least 60 GB of memory for both training and inference.
pip install -r requirements.txt
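To confirm that a machine meets the memory requirement above before launching a job, a small check along these lines may help. It is only a sketch: the helper names `has_enough_memory` and `check_gpus` are hypothetical and not part of this repo, and the 60 GB figure is interpreted as GiB, which is an assumption.

```python
# Hypothetical helpers, not part of the Follow-Your-Canvas code base.
REQUIRED_GIB = 60  # README minimum, interpreted here as GiB (assumption)


def has_enough_memory(total_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if a device with `total_bytes` of memory meets the requirement."""
    return total_bytes >= required_gib * 1024**3


def check_gpus() -> None:
    """Print a verdict for every visible CUDA device (imports PyTorch lazily)."""
    import torch

    if not torch.cuda.is_available():
        print("No CUDA device visible.")
        return
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        verdict = "ok" if has_enough_memory(props.total_memory) else "too small"
        gib = props.total_memory / 1024**3
        print(f"GPU {index}: {props.name}, {gib:.1f} GiB -> {verdict}")
```

Calling `check_gpus()` after the requirements are installed prints one line per GPU. Note that the driver usually reports slightly less than the nominal capacity of a card.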
Download our checkpoints here.
You also need to download [sam_vit_b_01ec64], [stable-diffusion-2-1], and [Qwen-VL-Chat].
Finally, these pretrained models should be organized as follows:
pretrained_models
├── sam
│ └── sam_vit_b_01ec64.pth
├── follow-your-canvas
│ └── checkpoint-40000.ckpt
├── stable-diffusion-2-1
└── Qwen-VL-Chat
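The layout above can be verified with a short script before running training or inference. This is a minimal sketch; `missing_pretrained` is a hypothetical helper, and the paths are copied from the directory tree shown above.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this README.
EXPECTED = [
    "sam/sam_vit_b_01ec64.pth",
    "follow-your-canvas/checkpoint-40000.ckpt",
    "stable-diffusion-2-1",
    "Qwen-VL-Chat",
]


def missing_pretrained(root: str = "pretrained_models") -> list[str]:
    """Return the expected files or folders that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        print("Missing pretrained models:")
        for rel in missing:
            print(f"  pretrained_models/{rel}")
    else:
        print("All pretrained models are in place.")
```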
We also provide the training code for Follow-Your-Canvas. In our implementation, eight NVIDIA A800 GPUs are used for training (50K steps). First, download the Panda-70M dataset. Our dataset class (animatediff/dataset.py) requires a CSV file that lists the video file names and their prompts.
# config the csv path and video path in train_outpainting-SAM.yaml
torchrun --nnodes=1 --nproc_per_node=8 --master_port=8888 train.py --config train_outpainting-SAM.yaml
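The CSV mentioned above can be produced with Python's standard `csv` module. The sketch below is illustrative only: the function name `write_caption_csv` and the column names `video` and `prompt` are assumptions, so check animatediff/dataset.py for the header names it actually reads and adjust `columns` accordingly.

```python
import csv


def write_caption_csv(pairs, csv_path, columns=("video", "prompt")):
    """Write (video file name, prompt) pairs to a CSV file with a header row.

    The default column names are placeholders (assumption); replace them with
    the names expected by animatediff/dataset.py.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        # csv.writer quotes fields automatically, so prompts may contain commas.
        writer.writerows(pairs)


if __name__ == "__main__":
    # Hypothetical file names and captions, for illustration only.
    examples = [
        ("clip_0001.mp4", "a panda sitting on a grassy area in a lake"),
        ("clip_0002.mp4", "a polar bear walking on snow, with mountains behind"),
    ]
    write_caption_csv(examples, "train_videos.csv")
```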
We support outpainting with and without a prompt (when no prompt is given, it is generated by Qwen).
# outpaint the video in demo_video/panda to 2k with prompt 'a panda sitting on a grassy area in a lake, with forest mountain in the background'.
python3 inference_outpainting-dir.py --config infer-configs/infer-9-16.yaml
# outpaint the video in demo_video/polar to 2k without prompt.
python3 inference_outpainting-dir-with-prompt.py --config infer-configs/prompt-panda.yaml
The result will be saved in /infer.
We evaluate Follow-Your-Canvas on the DAVIS 2017 dataset. Here we provide the inputs for each experimental setting, the ground-truth (GT) videos, and our outpainting results. The code for the PSNR, SSIM, LPIPS, and FVD metrics is in /video_metics/demo.py and fvd2.py. To compute aesthetic quality (AQ) and imaging quality (IQ) from V-Bench:
cd video_metics
git clone https://github.com/Vchitect/VBench.git
pip install -r VBench/requirements.txt
pip install VBench
# change the video dir in evaluate-quality.sh
bash evaluate-quality.sh
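For readers unfamiliar with the PSNR metric listed above, the following dependency-free sketch shows the standard definition, PSNR = 10 · log10(MAX² / MSE). It is not the code from /video_metics/demo.py, which may differ in details such as value range or how frames are averaged; the function name `psnr` and the flat-sequence input format are assumptions made for illustration.

```python
import math


def psnr(reference, result, max_value=255.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    if len(reference) != len(result):
        raise ValueError("inputs must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value**2 / mse)
```

Higher values mean the outpainted frame is closer to the ground truth; for a video, one common choice is to average the per-frame values.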
Follow-Your-Pose: Pose-Guided Text-to-Video Generation.
Follow-Your-Click: Open-domain Regional image animation via Short Prompts.
Follow-Your-Handle: Controllable Video Editing via Control Handle Transformations.
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation.
Follow-Your-Canvas: High-resolution video outpainting with rich content generation.
We acknowledge the following open source projects.
If you find Follow-Your-Canvas useful for your research, please 🌟 this repo and cite our work using the following BibTeX: