This repository contains the official PyTorch implementation of CineTrans, a novel framework for generating videos with controllable cinematic transitions via masked diffusion models.
- Clone the repository:

```bash
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```

- Set up the environment:

```bash
conda create -n cinetrans python=3.11.9
conda activate cinetrans
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
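After installing, a quick sanity check that PyTorch sees the GPU and matches the pinned version can save debugging time later (a minimal sketch, not part of the official setup):

```python
# Sketch: sanity-check the environment before downloading weights.
import torch

print(torch.__version__)          # expect 2.5.1+cu118
print(torch.cuda.is_available())  # expect True on a CUDA machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```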
Download the required model weights and place them in the `ckpt/` directory:

```
ckpt/
├── stable-diffusion-v1-4/
│   ├── scheduler/
│   ├── text_encoder/
│   ├── tokenizer/
│   ├── unet/
│   └── vae_temporal_decoder/
├── checkpoint.pt
└── longclip-L.pt
```
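If the weights are hosted on Hugging Face, a sketch like the following can fetch most of the layout above. The `CompVis/stable-diffusion-v1-4` repo ID is the standard one; the LongCLIP repo ID is an assumption, and `checkpoint.pt` is the CineTrans checkpoint itself, so replace both with the links given in this repository's release notes:

```python
# Sketch: fetch weights into ckpt/. Repo IDs marked as assumptions should be
# replaced with the official links from this README's download section.
from huggingface_hub import snapshot_download, hf_hub_download

# Stable Diffusion v1.4 components. Note: vae_temporal_decoder/ is not part of
# the original CompVis release and may need to be downloaded separately.
snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    local_dir="ckpt/stable-diffusion-v1-4",
    allow_patterns=["scheduler/*", "text_encoder/*", "tokenizer/*", "unet/*"],
)

# Long-CLIP text encoder (repo ID is an assumption).
hf_hub_download(repo_id="BeichenZhang/LongCLIP-L", filename="longclip-L.pt", local_dir="ckpt")

# ckpt/checkpoint.pt is released with CineTrans itself; place it manually.
```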
Download the weights of Wan2.1-T2V-1.3B and the LoRA weights, and place them as:

```
Wan2.1-T2V-1.3B/                          # original weights
├── google/
│   └── umt5-xxl/
├── config.json
├── diffusion_pytorch_model.safetensors
├── models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt                            # LoRA weights
```
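The base model is on Hugging Face under `Wan-AI/Wan2.1-T2V-1.3B`, so a sketch like this can fetch it; the LoRA file `ckpt/weights.pt` is distributed with CineTrans and must be placed manually:

```python
# Sketch: fetch the Wan2.1-T2V-1.3B base weights from the official HF repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-1.3B",
    local_dir="Wan2.1-T2V-1.3B",
)
# ckpt/weights.pt (the CineTrans LoRA) is distributed separately; place it manually.
```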
To run inference, use the following command:

```bash
python pipelines/sample.py --config configs/sample.yaml
```

On a single A100 GPU, generating one video takes approximately 40 s. You can modify the relevant configuration options and the prompt in `configs/sample.yaml` to adjust the generation process.
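To sweep prompts or seeds without hand-editing the file, a small wrapper can patch `configs/sample.yaml` before each run. This is a sketch: the `prompt` and `seed` keys are illustrative assumptions, so match them to the actual field names in the shipped config:

```python
# Sketch: patch configs/sample.yaml and launch sampling. Key names ("prompt",
# "seed") are assumptions -- check the shipped config for the real field names.
import subprocess
import yaml

CONFIG = "configs/sample.yaml"

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

cfg["prompt"] = "A slow pan across a rainy street, cut to a close-up of neon signs."
cfg["seed"] = 42

with open(CONFIG, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

subprocess.run(["python", "pipelines/sample.py", "--config", CONFIG], check=True)
```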
For the Wan2.1-based model, run:

```bash
python generate.py
```

On a single A100 GPU, generating one video takes approximately 5 min. You can modify the relevant configuration options and the prompt in `configs/t2v.yaml` to adjust the generation process.
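The same config-patching approach extends to batch generation. A sketch assuming a `prompt` key in `configs/t2v.yaml` (align it with the real config schema):

```python
# Sketch: batch generation over several prompts via configs/t2v.yaml.
# The "prompt" key is an assumption -- align with the real config schema.
import subprocess
import yaml

CONFIG = "configs/t2v.yaml"
prompts = [
    "An aerial shot of a coastline, cut to a surfer riding a wave.",
    "A candle-lit dinner scene, dissolve to a city skyline at night.",
]

for prompt in prompts:
    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)
    cfg["prompt"] = prompt
    with open(CONFIG, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    subprocess.run(["python", "generate.py"], check=True)
```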
If you find CineTrans useful for your research and applications, please cite it using this BibTeX:

```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
  title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
  author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
  year={2025},
  eprint={2508.11484},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.11484},
}
```