This repository is the implementation of
[TMLR 2024] UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
- Authors: Tian Xia*, Xuweiyi Chen*, Sihan Xu**
- Affiliation: University of Michigan, University of Virginia, PixAI.art
- *Equal contribution, **Correspondence
Project page | Paper | Demo
[Video comparisons: Original vs. UniCtrl]
- The UniCtrl code is released, and you can check out our paper as well!
We introduce UniCtrl, a novel, plug-and-play method that is universally applicable to text-to-video models and improves the spatiotemporal consistency and motion diversity of the videos they generate, without additional training. UniCtrl ensures semantic consistency across frames through cross-frame self-attention control, while enhancing motion quality and spatiotemporal consistency through motion injection and spatiotemporal synchronization.
git clone https://github.com/XuweiyiChen/UniCtrl.git
cd UniCtrl
cd examples/AnimateDiff
conda env create -f environment.yaml
conda activate animatediff_pt2
Please refer to the official AnimateDiff repository for the full setup guide.
Quickstart guide
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/
bash download_bashscripts/0-MotionModule.sh
bash download_bashscripts/5-RealisticVision.sh
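If `git lfs` was not active during the clone, the downloaded directory can end up holding small Git LFS pointer stubs instead of real weights. The sketch below is one way to check for that; it is not part of this repository. `find_lfs_pointers` is a hypothetical helper name, and the only path taken from the commands above is `models/StableDiffusion/`. It relies on the fact that an LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not shipped with UniCtrl):
# print every file under a directory that is still a Git LFS pointer stub.
find_lfs_pointers() {
  local dir="$1"
  # Walk all regular files, skipping Git's own metadata directory.
  find "$dir" -type f ! -path '*/.git/*' -print0 |
    while IFS= read -r -d '' f; do
      # Read only the first bytes so large weight files are not scanned in full.
      if head -c 64 "$f" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
        printf '%s\n' "$f"
      fi
    done
}
```

Running `find_lfs_pointers models/StableDiffusion/` after the download should print nothing when all weights were fetched. Any path it prints is a file that likely needs to be pulled again with Git LFS enabled.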
We provide a Gradio demo with a UI to demonstrate our method.
python app.py
Alternatively, you can try the online demo hosted on Hugging Face: [demo link].
If you find our repo useful for your research, please consider citing our paper:
@misc{chen2024unictrl,
title={UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control},
author={Xuweiyi Chen and Tian Xia and Sihan Xu},
year={2024},
eprint={2403.02332},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This project is distributed under the MIT License. See LICENSE for more information.
The example code is built upon AnimateDiff and FreeInit. Thanks to both teams for their impressive work!