This is the official repository for the paper:
UniVid: The Open-Source Unified Video Model
Jiabin Luo*, Junhui Lin*, Zeyu Zhang*†, Biao Wu*, Meng Fang, Ling Chen, and Hao Tang‡
*Equal contribution. †Project lead. ‡Corresponding author.
Demo video: `output2.mp4`
If you find our code or paper helpful, please consider starring ⭐ us and citing:
```bibtex
@article{luo2025univid,
  title={UniVid: The Open-Source Unified Video Model},
  author={Luo, Jiabin and Lin, Junhui and Zhang, Zeyu and Wu, Biao and Fang, Meng and Chen, Ling and Tang, Hao},
  journal={arXiv preprint arXiv:2509.24200},
  year={2025}
}
```
- ⬜️ Upload our paper to arXiv and build project pages.
- ⬜️ Upload the code.
UniVid is an open-source model that enhances both video generation and video understanding.
Unified video modeling that combines generation and understanding is increasingly important, yet it faces two key challenges: (1) maintaining semantic faithfulness during flow-based generation, where text-visual token imbalance and uniform cross-modal attention across the flow trajectory are suboptimal, and (2) efficiently extending image-centric MLLMs to video without costly retraining. We present UniVid, a unified architecture that couples an MLLM with a diffusion decoder through a lightweight adapter, enabling both video understanding and generation. We introduce Temperature Modality Alignment to improve prompt adherence and Pyramid Reflection for efficient temporal reasoning via dynamic keyframe selection. Extensive experiments on standard benchmarks demonstrate state-of-the-art performance: a 2.2% improvement in VBench-Long total score over the previous SOTA method EasyAnimateV5.1, and 1.0% and 3.3% accuracy gains on MSVD-QA and ActivityNet-QA, respectively, over the best prior 7B baselines.
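As a rough intuition for the temperature-alignment idea, the sketch below shows generic temperature-scaled cross-modal attention in PyTorch. It is illustrative only and is not the UniVid implementation; the function name, `tau`, and the tensor shapes are made up for the example.

```python
import torch
import torch.nn.functional as F

def temperature_scaled_cross_attention(visual_tokens, text_tokens, tau=0.7):
    """Generic cross-attention with a softmax temperature (illustrative only).

    A lower tau sharpens the attention of visual queries over text tokens,
    which is one generic way to counteract text-visual token imbalance.
    """
    d = visual_tokens.shape[-1]
    scores = visual_tokens @ text_tokens.transpose(-2, -1) / (d ** 0.5)
    weights = F.softmax(scores / tau, dim=-1)  # temperature-scaled attention weights
    return weights @ text_tokens

# Toy shapes: batch 1, 256 visual tokens, 77 text tokens, hidden size 64
v = torch.randn(1, 256, 64)
t = torch.randn(1, 77, 64)
out = temperature_scaled_cross_attention(v, t, tau=0.5)  # -> (1, 256, 64)
```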
```bash
conda env create -f environment.yaml
conda activate univid
```

The script `eval_understanding.py` runs the Reflection pipeline on a subset of videos and saves results and traces.
Inputs:
- `video_dir`: contains files like `video{video_id}.mp4`
- `gt_file`: JSON list with at least `video_id`, `question`, `answer` (optional `id`)
[{"video_id": 1203, "question": "What color is the car?", "answer": "Red"}]python eval_understanding.py \
--video_dir /path/to/videos \
--gt_file /path/to/gt.json \
--output_dir /path/to/out \
--output_name subset_run \
--model_path /path/to/MODEL_DIR \
--no_ddp_ranker \
--siglip_ckpt google/siglip2-base-patch16-naflex- Batch summary: /path/to/out/subset_run.json(fields: id, video_id, question, answer, pred, trace_path)
- Per-sample trace JSONs: `/path/to/out/video{video_id}_reflexion.json`
- Keyframes (if enabled): `sample_frames/video{video_id}/...`
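A minimal sketch for inspecting a finished run, assuming simple exact-match scoring of `pred` against `answer` (the repository's own scoring may differ; the path is a placeholder):

```python
import json

# Load the batch summary written by eval_understanding.py (placeholder path).
with open("/path/to/out/subset_run.json") as f:
    results = json.load(f)

def normalize(s):
    """Lowercase and strip for a simple exact-match comparison (assumption)."""
    return str(s).strip().lower()

correct = sum(normalize(r["pred"]) == normalize(r["answer"]) for r in results)
print(f"exact-match accuracy: {correct / len(results):.3f} ({correct}/{len(results)})")
```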
If all three rounds fail, a fallback answer is chosen (sketched below):
- Static: the fallback uses the global-caption answer; if that is insufficient, use the last round's answer.
- Dynamic: the fallback uses the global-caption answer; if that is insufficient, use the first round's answer.
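A minimal sketch of this fallback rule; the names (`rounds`, `global_caption_answer`, `is_sufficient`) are hypothetical and not the repository's actual API:

```python
def pick_fallback_answer(rounds, global_caption_answer, is_sufficient, dynamic):
    """Choose the final answer when all three reflection rounds fail.

    rounds: per-round answers in order; dynamic: True for the dynamic setting.
    (Hypothetical helper; names do not come from the repository.)
    """
    if is_sufficient(global_caption_answer):
        return global_caption_answer             # both settings prefer the global-caption answer
    return rounds[0] if dynamic else rounds[-1]  # dynamic: first round; static: last round
```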
For DDP frame ranking, omit `--no_ddp_ranker` and add `--ddp_ranker clip_rank_video_ddp.py --nproc 4`.
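For example, assuming the remaining flags are unchanged from the single-process command above (paths and the `--nproc` value are placeholders):

```bash
python eval_understanding.py \
  --video_dir /path/to/videos \
  --gt_file /path/to/gt.json \
  --output_dir /path/to/out \
  --output_name subset_run \
  --model_path /path/to/MODEL_DIR \
  --ddp_ranker clip_rank_video_ddp.py \
  --nproc 4 \
  --siglip_ckpt google/siglip2-base-patch16-naflex
```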
