> **Tip**
> VORTA (NeurIPS '25) accelerates video diffusion transformers with sparse attention and dynamic routing, achieving up to 14.4× speedup with negligible quality loss.
Install PyTorch. We have tested the code with PyTorch 2.6.0 and CUDA 12.6, but it should work with other versions as well. You can install PyTorch using the following command:

```shell
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
```
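The `cu126` suffix in the index URL should match your local CUDA toolkit. A minimal sketch for picking the right wheel index; the `cuda_tag` helper and the set of tags covered are assumptions for illustration, not part of this repo:

```shell
# hypothetical helper: map a local CUDA toolkit version to a PyTorch
# wheel-index tag; falls back to the CPU wheels for unmatched versions
cuda_tag() {
  case "$1" in
    12.6*) echo cu126 ;;
    12.4*) echo cu124 ;;
    12.1*) echo cu121 ;;
    *)     echo cpu ;;
  esac
}

# e.g. with CUDA 12.6 installed:
pip_index="https://download.pytorch.org/whl/$(cuda_tag 12.6)"
echo "$pip_index"
```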
Install the dependencies:
```shell
python -m pip install -r requirements.txt
```
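After installing, a quick optional sanity check that the interpreter sees the expected PyTorch build (this snippet is a convenience, not part of the repo's scripts):

```shell
# verify torch imports and report its CUDA build; prints a notice instead
# of failing if the install did not complete
out=$(python - <<'PY'
try:
    import torch
    print("torch", torch.__version__, "cuda", torch.version.cuda,
          "available", torch.cuda.is_available())
except ImportError:
    print("torch not installed - rerun the pip commands above")
PY
)
echo "$out"
```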
We use general scripts to demonstrate the usage of our method. You can find the detailed scripts for each model in the `scripts` folder:

- HunyuanVideo: `scripts/hunyuan/inference.sh`
- Wan 2.1: `scripts/wan/inference.sh`
Run the baseline model sampling without acceleration:

```shell
CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
    --pretrained_model_path <model_name_on_hf> \
    --val_data_json_file prompt.json \
    --output_dir results/<model_name>/baseline \
    --native_attention \
    --enable_cpu_offload \
    --seed 1234
```

Download the ready-to-use router weights from the Hugging Face Hub:
```shell
git lfs install
git clone git@hf.co:Wenhao-Sun/VORTA
# mv VORTA/<model_name> results/, where <model_name> is wan-14B or hunyuan; e.g.
mv VORTA/wan-14B results/
```

Run the video DiTs with VORTA for acceleration:
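Before launching accelerated inference, it can help to confirm the router weights landed where the inference command expects them. A small sketch; the `check_router_weights` helper is illustrative, not part of the repo:

```shell
# illustrative helper: report whether router weights for a model sit in
# the results/ directory that the accelerated run reads from
check_router_weights() {
  if [ -d "results/$1" ]; then
    echo "ok: results/$1"
  else
    echo "missing: results/$1 - rerun the clone/mv steps above"
  fi
}

check_router_weights wan-14B
```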
```diff
  CUDA_VISIBLE_DEVICES=0 python scripts/<model_name>/inference.py \
      --pretrained_model_path <model_name_on_hf> \
      --val_data_json_file prompt.json \
-     --output_dir results/<model_name>/baseline \
+     --output_dir results/<model_name>/vorta \
-     --native_attention \
+     --resume_dir results/<model_name>/train \
+     --resume ckpt/step-000100 \
      --enable_cpu_offload \
      --seed 1234
```
- You can edit `prompt.json` or the `--val_data_json_file` option to change the text prompt.
- See the source code `scripts/<model_name>/inference.py` or use the `python scripts/<model_name>/inference.py --help` command for more detailed explanations of the arguments.
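A prompt file might look like the following; the exact schema is defined by `scripts/<model_name>/inference.py`, so the list-of-objects layout and the `prompt` key used here are assumptions to adapt:

```shell
# write a minimal prompt file; the schema (a list of {"prompt": ...}
# objects) is an assumption - check inference.py for the real format
cat > prompt.json <<'EOF'
[
  {"prompt": "A corgi running on a beach at sunset"}
]
EOF

# confirm the file parses as valid JSON
python -c "import json; print(len(json.load(open('prompt.json'))), 'prompt(s)')"
```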
If you find our work useful in your research, please consider citing:
```bibtex
@article{DBLP:journals/corr/abs-2505-18809,
  author  = {Wenhao Sun and
             Rong{-}Cheng Tu and
             Yifu Ding and
             Zhao Jin and
             Jingyi Liao and
             Shunyu Liu and
             Dacheng Tao},
  title   = {{VORTA:} Efficient Video Diffusion via Routing Sparse Attention},
  journal = {CoRR},
  volume  = {abs/2505.18809},
  year    = {2025}
}
```

Thanks to the authors of the following repositories for their great work and for open-sourcing the code and models: Diffusers, HunyuanVideo, Wan 2.1, FastVideo.