This repository contains the official PyTorch implementation of CineTrans, a novel framework for generating videos with controllable cinematic transitions via masked diffusion models.
- Clone the repository:

```bash
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```

- Set up the environment:

```bash
conda create -n cinetrans python=3.11.9
conda activate cinetrans
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
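After installing, a quick sanity check that PyTorch sees the GPU and matches the pinned version can save debugging time later (a minimal sketch, not part of the official setup):

```python
# Sketch: sanity-check the environment before downloading weights.
import torch

print(torch.__version__)          # expect 2.5.1+cu118
print(torch.cuda.is_available())  # expect True on a CUDA machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```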
Download the required model weights and place them in the `ckpt/` directory:

```
ckpt/
├── stable-diffusion-v1-4/
│   ├── scheduler/
│   ├── text_encoder/
│   ├── tokenizer/
│   ├── unet/
│   └── vae_temporal_decoder/
├── checkpoint.pt
└── longclip-L.pt
```
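If the weights are hosted on Hugging Face, a sketch like the following can fetch most of the layout above. The `CompVis/stable-diffusion-v1-4` repo ID is the standard one; the LongCLIP repo ID is an assumption, and `checkpoint.pt` is the CineTrans checkpoint itself, so replace both with the links given in this repository's release notes:

```python
# Sketch: fetch weights into ckpt/. Repo IDs marked as assumptions should be
# replaced with the official links from this README's download section.
from huggingface_hub import snapshot_download, hf_hub_download

# Stable Diffusion v1.4 components. Note: vae_temporal_decoder/ is not part of
# the original CompVis release and may need to be downloaded separately.
snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    local_dir="ckpt/stable-diffusion-v1-4",
    allow_patterns=["scheduler/*", "text_encoder/*", "tokenizer/*", "unet/*"],
)

# Long-CLIP text encoder (repo ID is an assumption).
hf_hub_download(repo_id="BeichenZhang/LongCLIP-L", filename="longclip-L.pt", local_dir="ckpt")

# ckpt/checkpoint.pt is released with CineTrans itself; place it manually.
```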
Download the weights of Wan2.1-T2V-1.3B and the LoRA weights, and place them as:

```
Wan2.1-T2V-1.3B/                          # original weights
├── google/
│   └── umt5-xxl/
├── config.json
├── diffusion_pytorch_model.safetensors
├── models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt                            # LoRA weights
```
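The base model is on Hugging Face under `Wan-AI/Wan2.1-T2V-1.3B`, so a sketch like this can fetch it; the LoRA file `ckpt/weights.pt` is distributed with CineTrans and must be placed manually:

```python
# Sketch: fetch the Wan2.1-T2V-1.3B base weights from the official HF repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-1.3B",
    local_dir="Wan2.1-T2V-1.3B",
)
# ckpt/weights.pt (the CineTrans LoRA) is distributed separately; place it manually.
```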
To run inference, use the following command:

```bash
python pipelines/sample.py --config configs/sample.yaml
```

On a single A100 GPU, generating one video takes approximately 40 s. You can modify the relevant configuration options and the prompt in `configs/sample.yaml` to adjust the generation process.
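To sweep prompts or seeds without hand-editing the file, a small wrapper can patch `configs/sample.yaml` before each run. This is a sketch: the `prompt` and `seed` keys are illustrative assumptions, so match them to the actual field names in the shipped config:

```python
# Sketch: patch configs/sample.yaml and launch sampling. Key names ("prompt",
# "seed") are assumptions -- check the shipped config for the real field names.
import subprocess
import yaml

CONFIG = "configs/sample.yaml"

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

cfg["prompt"] = "A slow pan across a rainy street, cut to a close-up of neon signs."
cfg["seed"] = 42

with open(CONFIG, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

subprocess.run(["python", "pipelines/sample.py", "--config", CONFIG], check=True)
```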
For the Wan2.1-based model, run:

```bash
python generate.py
```

On a single A100 GPU, generating one video takes approximately 5 min. You can modify the relevant configuration options and the prompt in `configs/t2v.yaml` to adjust the generation process.
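The same config-patching approach extends to batch generation. A sketch assuming a `prompt` key in `configs/t2v.yaml` (align it with the real config schema):

```python
# Sketch: batch generation over several prompts via configs/t2v.yaml.
# The "prompt" key is an assumption -- align with the real config schema.
import subprocess
import yaml

CONFIG = "configs/t2v.yaml"
prompts = [
    "An aerial shot of a coastline, cut to a surfer riding a wave.",
    "A candle-lit dinner scene, dissolve to a city skyline at night.",
]

for prompt in prompts:
    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)
    cfg["prompt"] = prompt
    with open(CONFIG, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    subprocess.run(["python", "generate.py"], check=True)
```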
If you find CineTrans useful for your research and applications, please cite it using this BibTeX:

```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
  title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
  author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
  year={2025},
  eprint={2508.11484},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.11484},
}
```