
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models

This repository contains the official PyTorch implementation of CineTrans, a novel framework for generating videos with controllable cinematic transitions via masked diffusion models.

[Paper](https://arxiv.org/abs/2508.11484) | Project Page

🎥 Demo

*(Teaser video: `teaser_video_compressed_.mp4`)*

📥 Installation

1. Clone the repository:

```shell
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```

2. Set up the environment:

```shell
conda create -n cinetrans python=3.11.9
conda activate cinetrans

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
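After installation, a quick sanity check (illustrative, not part of the repo) confirms the pinned versions are in place. Note that the CUDA wheel reports a local tag such as `2.5.1+cu118`, which should still count as matching the pin:

```python
from importlib import metadata

# Version pins from the install step above.
PINS = {"torch": "2.5.1", "torchvision": "0.20.1"}

def check_pin(name, pinned):
    """Return (installed_version, matches_pin); local tags like +cu118 are ignored."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, installed.split("+")[0] == pinned

for name, pinned in PINS.items():
    installed, ok = check_pin(name, pinned)
    status = "missing" if installed is None else ("ok" if ok else f"expected {pinned}")
    print(f"{name}: {installed} ({status})")
```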

🤗 Checkpoint

CineTrans-Unet

Download the required model weights and place them in the ckpt/ directory.

```
ckpt/
├── stable-diffusion-v1-4/
│   ├── scheduler/
│   ├── text_encoder/
│   ├── tokenizer/
│   ├── unet/
│   └── vae_temporal_decoder/
├── checkpoint.pt
└── longclip-L.pt
```
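A small helper (illustrative, not part of the repo) can verify that the layout above is complete before running inference:

```python
from pathlib import Path

# Entries expected under ckpt/, per the tree above.
REQUIRED = [
    "stable-diffusion-v1-4/scheduler",
    "stable-diffusion-v1-4/text_encoder",
    "stable-diffusion-v1-4/tokenizer",
    "stable-diffusion-v1-4/unet",
    "stable-diffusion-v1-4/vae_temporal_decoder",
    "checkpoint.pt",
    "longclip-L.pt",
]

def missing_entries(root="ckpt"):
    """Return the required paths that are absent under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).exists()]

if __name__ == "__main__":
    gaps = missing_entries()
    print("ckpt/ layout OK" if not gaps else f"missing: {gaps}")
```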

CineTrans-DiT

Download the Wan2.1-T2V-1.3B base weights and the CineTrans LoRA weights, then arrange them as follows:

```
Wan2.1-T2V-1.3B/    # original weights
├── google/
│   └── umt5-xxl/
├── config.json
├── diffusion_pytorch_model.safetensors
├── models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt      # LoRA weights
```

🖥️ Inference

To run inference, use the command for the variant you downloaded:

CineTrans-Unet

```shell
python pipelines/sample.py --config configs/sample.yaml
```

On a single A100 GPU, generating one video takes approximately 40 s. You can adjust the generation settings and prompt in `configs/sample.yaml`.

CineTrans-DiT

```shell
python generate.py
```

On a single A100 GPU, generating one video takes approximately 5 min. You can adjust the generation settings and prompt in `configs/t2v.yaml`.

🖼️ Gallery

Example videos and their shot structures:

| Example | Shot structure |
| --- | --- |
| coffee_cup | Shot1:[0s,4s] Shot2:[4s,8s] |
| white_flower | Shot1:[0s,4s] Shot2:[4s,8s] |
| snow | Shot1:[0s,2.75s] Shot2:[2.75s,5.5s] Shot3:[5.5s,8s] |
| vintage | Shot1:[0s,2.5s] Shot2:[2.5s,5s] Shot3:[5s,8s] |
| city_night | Shot1:[0s,2.5s] Shot2:[2.5s,5s] Shot3:[5s,8s] |
| sea | Shot1:[0s,3s] Shot2:[3s,6s] Shot3:[6s,8s] |
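The shot annotations above follow a simple `ShotN:[start,end]` pattern. A small parser (illustrative; the actual prompt format consumed by the repo may differ) shows how such a spec maps to per-shot time spans in seconds:

```python
import re

def parse_shots(spec):
    """Parse a string like "Shot1:[0s,4s] Shot2:[4s,8s]" into (start, end) tuples in seconds."""
    pattern = r"Shot(\d+):\[([\d.]+)s,([\d.]+)s\]"
    return [(float(start), float(end)) for _, start, end in re.findall(pattern, spec)]

print(parse_shots("Shot1:[0s,2.75s] Shot2:[2.75s,5.5s] Shot3:[5.5s,8s]"))
# [(0.0, 2.75), (2.75, 5.5), (5.5, 8.0)]
```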

📑 BibTeX

If you find CineTrans useful for your research and applications, please cite using this BibTeX:

```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
      title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
      author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
      year={2025},
      eprint={2508.11484},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.11484},
}
```
