> **Tip**
> VORTA (NeurIPS '25) accelerates video diffusion transformers with sparse attention and dynamic routing, achieving up to 14.4× speedup with negligible quality loss.
Install PyTorch. We have tested the code with PyTorch 2.6.0 and CUDA 12.6, but it should work with other versions as well. You can install PyTorch using the following command:

```shell
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
```
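The `cu126` suffix in the index URL should match your local CUDA toolkit. A minimal sketch for picking the right wheel index; the `cuda_tag` helper and the set of tags covered are assumptions for illustration, not part of this repo:

```shell
# hypothetical helper: map a local CUDA toolkit version to a PyTorch
# wheel-index tag; falls back to the CPU wheels for unmatched versions
cuda_tag() {
  case "$1" in
    12.6*) echo cu126 ;;
    12.4*) echo cu124 ;;
    12.1*) echo cu121 ;;
    *)     echo cpu ;;
  esac
}

# e.g. with CUDA 12.6 installed:
pip_index="https://download.pytorch.org/whl/$(cuda_tag 12.6)"
echo "$pip_index"
```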
Install the dependencies:
```shell
python -m pip install -r requirements.txt
```
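After installing, a quick optional sanity check that the interpreter sees the expected PyTorch build (this snippet is a convenience, not part of the repo's scripts):

```shell
# verify torch imports and report its CUDA build; prints a notice instead
# of failing if the install did not complete
out=$(python - <<'PY'
try:
    import torch
    print("torch", torch.__version__, "cuda", torch.version.cuda,
          "available", torch.cuda.is_available())
except ImportError:
    print("torch not installed - rerun the pip commands above")
PY
)
echo "$out"
```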
We use general scripts to demonstrate the usage of our method. You can find the detailed scripts for each model in the `scripts` folder:

- HunyuanVideo: `scripts/hunyuan/inference.sh`
- Wan 2.1: `scripts/wan/inference.sh`
Run the baseline model sampling without acceleration:

```shell
CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/baseline \
    --native_attention \
    --enable_cpu_offload \
    --seed 1234
```

Download the ready-to-use router weights from the Hugging Face Hub:
```shell
git lfs install
git clone git@hf.co:Wenhao-Sun/VORTA
# mv VORTA/<model_name> results/, where <model_name> is wan-14B or hunyuan; e.g.
mv VORTA/wan-14B results/
```

Run the video DiTs with VORTA for acceleration:
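Before launching accelerated inference, it can help to confirm the router weights landed where the inference command expects them. A small sketch; the `check_router_weights` helper is illustrative, not part of the repo:

```shell
# illustrative helper: report whether router weights for a model sit in
# the results/ directory that the accelerated run reads from
check_router_weights() {
  if [ -d "results/$1" ]; then
    echo "ok: results/$1"
  else
    echo "missing: results/$1 - rerun the clone/mv steps above"
  fi
}

check_router_weights wan-14B
```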
```diff
  CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
      --pretrained_model_path <model_name_on_hf> \
      --val_data_json_file prompt.json \
-     --output_dir results/<model_name>/baseline \
+     --output_dir results/<model_name>/vorta \
-     --native_attention \
+     --resume_dir results/<model_name>/train \
+     --resume ckpt/step-000100 \
      --enable_cpu_offload \
      --seed 1234
```
- You can edit `prompt.json` or the `--val_data_json_file` option to change the text prompt.
- See the source code `scripts/<model_name>/inference.py` or use the `python scripts/<model_name>/inference.py --help` command for more detailed explanations of the arguments.
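A prompt file might look like the following; the exact schema is defined by `scripts/<model_name>/inference.py`, so the list-of-objects layout and the `prompt` key used here are assumptions to adapt:

```shell
# write a minimal prompt file; the schema (a list of {"prompt": ...}
# objects) is an assumption - check inference.py for the real format
cat > prompt.json <<'EOF'
[
  {"prompt": "A corgi running on a beach at sunset"}
]
EOF

# confirm the file parses as valid JSON
python -c "import json; print(len(json.load(open('prompt.json'))), 'prompt(s)')"
```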
If you find our work useful in your research, please consider citing:
```bibtex
@article{DBLP:journals/corr/abs-2505-18809,
  author  = {Wenhao Sun and
             Rong{-}Cheng Tu and
             Yifu Ding and
             Zhao Jin and
             Jingyi Liao and
             Shunyu Liu and
             Dacheng Tao},
  title   = {{VORTA:} Efficient Video Diffusion via Routing Sparse Attention},
  journal = {CoRR},
  volume  = {abs/2505.18809},
  year    = {2025}
}
```

Thanks to the authors of the following repositories for their great work and for open-sourcing the code and models: Diffusers, HunyuanVideo, Wan 2.1, FastVideo.