
VORTA: Efficient Video Diffusion via Routing Sparse Attention

Tip

VORTA (NeurIPS '25) accelerates video diffusion transformers with sparse attention and dynamic routing, achieving up to a 14.4× speedup with negligible quality loss.

🔧 Setup

Install PyTorch. We have tested the code with PyTorch 2.6.0 and CUDA 12.6, but other recent versions should work as well. You can install PyTorch with:

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
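Optionally, verify that the install picked up the expected CUDA build:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"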

Install the dependencies:

python -m pip install -r requirements.txt

🚀 Quick Start

The general commands below demonstrate the usage of our method. You can find the detailed scripts for each model in the scripts folder.

Run the baseline model sampling without acceleration:

CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/baseline \
    --native_attention \
    --enable_cpu_offload \
    --seed 1234
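The expected structure of prompt.json is not shown here; the sketch below is a hypothetical example (the field name is an assumption; check scripts/<model_name>/inference.py for the actual schema):

# Hypothetical prompt file for illustration; the field name is an assumption.
cat > prompt.json <<'EOF'
[
  {"prompt": "A corgi surfing a wave at sunset, cinematic lighting"}
]
EOF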

Download the ready-to-use router weights from the Hugging Face model hub:

git lfs install
git clone git@hf.co:Wenhao-Sun/VORTA
# Move the router weights for your model into results/; <model_name> is wan-14B or hunyuan. For example:
mv VORTA/wan-14B results/
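If you do not have SSH access configured for Hugging Face, cloning over HTTPS should also work:

git clone https://huggingface.co/Wenhao-Sun/VORTA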

Run the video DiTs with VORTA for acceleration:

CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/vorta \
    --resume_dir results/<model_name>/train \
    --resume ckpt/step-000100 \
    --enable_cpu_offload \
    --seed 1234
  • You can edit prompt.json, or point the --val_data_json_file option at a different file, to change the text prompts.
  • See the source code scripts/<model_name>/inference.py, or run python scripts/<model_name>/inference.py --help, for detailed explanations of the arguments.
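To measure the speedup on your own hardware, a simple option is to time the baseline and VORTA runs on the same prompts and seed (a minimal sketch reusing the two commands above):

# Time the baseline run.
time CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/baseline \
    --native_attention --enable_cpu_offload --seed 1234

# Time the VORTA-accelerated run and compare wall-clock times.
time CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/vorta \
    --resume_dir results/<model_name>/train --resume ckpt/step-000100 \
    --enable_cpu_offload --seed 1234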

📜 Citation

If you find our work useful in your research, please consider citing:

@article{DBLP:journals/corr/abs-2505-18809,
  author       = {Wenhao Sun and
                  Rong{-}Cheng Tu and
                  Yifu Ding and
                  Zhao Jin and
                  Jingyi Liao and
                  Shunyu Liu and
                  Dacheng Tao},
  title        = {{VORTA:} Efficient Video Diffusion via Routing Sparse Attention},
  journal      = {CoRR},
  volume       = {abs/2505.18809},
  year         = {2025}
}

♥️ Shout-out

Thanks to the authors of the following repositories for their great work and for open-sourcing their code and models: Diffusers, HunyuanVideo, Wan 2.1, FastVideo.
