FiVE-Bench (ICCV 2025)

FiVE-Bench: A Fine-Grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Minghan Li¹*, Chenxi Xie²*, Yichen Wu¹,³, Lei Zhang², Mengyu Wang¹†
¹Harvard University ²The Hong Kong Polytechnic University ³City University of Hong Kong
*Equal contribution   †Corresponding author

💜 Leaderboard | 💻 GitHub | 🤗 Hugging Face

📝 Project Page | 📰 Paper | 🎥 Video Demo

Follow-up Works

DNAEdit (NeurIPS 2025 Spotlight): Direct Noise Alignment for Text-Guided Rectified Flow Editing
SplitFlow (NeurIPS 2025): Flow Decomposition for Inversion-Free Text-to-Image Editing
DVRF (CVPR 2026): Delta Velocity Rectified Flow for Text-to-Image Editing

📝 TODO List

[🔜] Add Wan-Edit demo page on HF
[✅ Oct-30-2025] Add leaderboard support 🔥🔥🔥🔥🔥
[✅ Oct-30-2025] Reorganized the original results to follow the Wan-Edit naming convention, keeping only MP4 files (Google Drive). Thanks to @Kunlin Yang. 🔥🔥🔥🔥🔥
[✅ Oct-28-2025] The original results of all comparison methods reported in the paper have been released for reference. 🔥🔥🔥🔥🔥
[✅ Aug-26-2025] Fix two issues: `mp4_to_frames_ffmpeg` and `skip_timestep=17`. Raw quantitative results of Wan-Edit are included.
[✅ Aug-05-2025] Release Wan-Edit implementation
[✅ Aug-05-2025] Release Pyramid-Edit implementation
[✅ Aug-02-2025] Add Wan-Edit results to HF for eval demo
[✅ Aug-02-2025] Evaluation code released
[✅ Mar-31-2025] Dataset uploaded to Hugging Face

Human Evaluation Example via Netlify Link1 Link2

🚀 Submit Your Results

We welcome contributions! If you’ve evaluated your method on FiVE-Bench, please share your results so we can include them in the leaderboard. You can submit via a GitHub Issue or Pull Request following the leaderboard format.

📩 For large files or additional details, feel free to contact us directly.

📚 Table of Contents

FiVE-Bench Overview
Running Your Model on FiVE-Bench
Evaluate Editing Results
- Conventional Metrics
- FiVE-Acc: VLM-Based Metric
Citation
Acknowledgement

📦 FiVE-Bench Overview

FiVE-Bench is a structured benchmark for fine-grained video editing. It contains 420 high-quality source–target prompt pairs spanning six fine-grained editing tasks:

Object Replacement (Rigid)
Object Replacement (Non-Rigid)
Color Alteration
Material Modification
Object Addition
Object Removal

Running Your Model on FiVE-Bench

⬇️ Step 1: Download the Dataset and Set Up Evaluation Code

Download the dataset from Hugging Face: 🔗 FiVE-Bench on Hugging Face
Follow the instructions in Installation Guide to download the dataset and install the evaluation code (FiVE_Bench).

Place the downloaded dataset in ./FiVE_Bench/data. The directory structure should look like this:

📁 /path/to/code/FiVE_Bench/data
├── 📁 assets/
├── 📁 edit_prompt/
│   ├── 📄 edit1_FiVE.json
│   ├── 📄 edit2_FiVE.json
│   ├── 📄 edit3_FiVE.json
│   ├── 📄 edit4_FiVE.json
│   ├── 📄 edit5_FiVE.json
│   └── 📄 edit6_FiVE.json
├── 📄 README.md
├── 📦 bmasks.zip
├── 📁 bmasks/
│   ├── 📁 0001_bus/
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   └── 🖼️ ...
│   └── 📁 ...
├── 📦 images.zip
├── 📁 images/
│   ├── 📁 0001_bus/
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   └── 🖼️ ...
│   └── 📁 ...
├── 📦 videos.zip
└── 📁 videos/
    ├── 🎞️ 0001_bus.mp4
    ├── 🎞️ 0002_girl-dog.mp4
    └── 🎞️ ...
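Before running any method, it can help to sanity-check the unpacked data against the layout above. The following is a minimal sketch, not part of the FiVE-Bench code: `check_five_layout` is a hypothetical helper, and it assumes (based only on the tree above) that every `videos/<name>.mp4` has matching `images/<name>/` and `bmasks/<name>/` frame folders and that the six prompt files are named `edit1_FiVE.json` to `edit6_FiVE.json`.

```python
from pathlib import Path
from typing import List


def check_five_layout(data_root: str) -> List[str]:
    """Hypothetical helper: list problems in a FiVE-Bench data folder (empty list if none).

    Assumes each videos/<name>.mp4 has matching images/<name>/ and bmasks/<name>/
    folders, as suggested by the directory tree in this README.
    """
    root = Path(data_root)
    problems: List[str] = []

    # Top-level folders shown in the expected layout.
    for name in ("edit_prompt", "images", "bmasks", "videos"):
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")

    # The six prompt files, one per edit task file.
    for i in range(1, 7):
        prompt_file = root / "edit_prompt" / f"edit{i}_FiVE.json"
        if not prompt_file.is_file():
            problems.append(f"missing prompt file: {prompt_file.name}")

    # Every video should have a frame folder and a mask folder with the same stem.
    videos_dir = root / "videos"
    if videos_dir.is_dir():
        for video in sorted(videos_dir.glob("*.mp4")):
            stem = video.stem  # e.g. "0001_bus"
            for frames in ("images", "bmasks"):
                if not (root / frames / stem).is_dir():
                    problems.append(f"no {frames}/{stem}/ folder for {video.name}")
    return problems
```

For example, `print(check_five_layout("./FiVE_Bench/data"))` should print an empty list when the zip files have been extracted in place.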

🛠️ Step 2: Apply Your Video Editing Method

Use your video editing method to edit the FiVE-Bench videos based on the provided text prompts and generate the corresponding edited results.

Example implementations of our proposed rectified flow (RF)-based video editing methods are provided in the models/ directory:

- **[Pyramid-Edit](models/README.md#pyramid-edit)**: Rectified flow-based video editing built on the Pyramid-Flow architecture

- **[Wan-Edit](models/README.md#wan-edit)**: Rectified flow-based video editing with Wan2.1-T2V-1.3B model

Quick Start with Provided Models

Run Pyramid-Edit:

# Setup model
cd models/pyramid-edit && mkdir -p hf/pyramid-flow-miniflux
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh

Run Wan-Edit:

# Setup model  
cd models/wan-edit && mkdir -p hf/Wan2.1-T2V-1.3B
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh

For detailed setup instructions and configuration options, see the Models Documentation.

📊 Step 3: Evaluate Editing Results

Once the evaluation code is installed (see the Installation Guide), run the evaluation script:

sh scripts/eval_FiVE.sh

Evaluation Support Elements:

Editing Masks: Generated using SAM2 to assist in localized metric evaluation.
Editing Instructions: Structured directives for each source-target pair to guide model behavior.

FiVE-Bench provides comprehensive evaluation through two major components:

📐 1. Conventional Metrics (Across Six Key Aspects)

These metrics quantitatively measure various dimensions of video editing quality:

- Structure Preservation
- Background Preservation (PSNR, LPIPS, MSE, and SSIM computed outside the editing mask)
- Edit Prompt–Image Consistency (CLIP similarity on full and masked images)
- Image Quality Assessment (NIQE)
- Temporal Consistency (MFS: Motion Fidelity Score)
- Runtime Efficiency
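To illustrate what "outside the editing mask" means for background preservation, here is a minimal per-frame sketch of masked MSE and PSNR. It is not the repository's implementation, which may, for example, zero out the masked region instead of excluding it. `background_mse_psnr` is a hypothetical name; it assumes frames are float arrays in `[0, max_val]` and that a mask pixel above `mask_threshold` marks the edited region (JPEG masks from `bmasks/` may need a threshold such as 127 to absorb compression noise).

```python
import numpy as np


def background_mse_psnr(src, edited, mask, max_val=1.0, mask_threshold=0.0):
    """Hypothetical helper: MSE and PSNR over background (unedited) pixels only.

    src, edited: arrays of shape (H, W, C) with values in [0, max_val].
    mask: array of shape (H, W); values > mask_threshold mark the edited region.
    """
    src = np.asarray(src, dtype=np.float64)
    edited = np.asarray(edited, dtype=np.float64)
    background = np.asarray(mask) <= mask_threshold  # True where content should be preserved
    if not background.any():
        raise ValueError("mask covers the whole frame; no background pixels to compare")

    diff = (src - edited)[background]  # shape (num_background_pixels, C)
    mse = float(np.mean(diff ** 2))
    psnr = float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))
    return mse, psnr
```

A video-level score would then typically be the mean of the per-frame values; changes inside the mask do not affect either number.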

🤖 2. FiVE-Acc: A VLM-based Metric for Editing Success

We use a vision-language model (VLM) to automatically assess whether the intended edit is reflected in the output video by asking it questions about the content. For example, if the source video contains a swan and the target prompt requests a flamingo, we ask the following questions about the edited video:

Yes/No Questions:
- Is there a swan in the video?
- Is there a flamingo in the video?
✅ The edit is considered successful only if the answers are "No" to the first question and "Yes" to the second.
Multiple-choice Questions:
- What is in the video? a) A swan b) A flamingo
✅ The edit is considered successful if the model selects the correct target object (e.g., b) A flamingo) and avoids selecting the original source object.
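The two success rules above can be sketched as small judging functions. This is an illustrative sketch only, not the benchmark's parsing code: `parse_yes_no`, `yn_success`, and `mc_success` are hypothetical names, and the sketch assumes the VLM's reply begins with "Yes"/"No" or with the chosen option letter.

```python
from typing import Optional

_PUNCT = "().,!:;\"'"


def parse_yes_no(answer: str) -> Optional[bool]:
    """Map a free-form reply to True (yes), False (no), or None (unparseable)."""
    tokens = answer.strip().lower().split()
    if not tokens:
        return None
    first = tokens[0].strip(_PUNCT)
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def yn_success(source_answer: str, target_answer: str) -> bool:
    """Success only if the source object is absent ("No") and the target is present ("Yes")."""
    return parse_yes_no(source_answer) is False and parse_yes_no(target_answer) is True


def mc_success(reply: str, target_option: str) -> bool:
    """Success if the chosen option letter (e.g. "b" in "b) A flamingo") is the target's."""
    tokens = reply.strip().lower().split()
    if not tokens:
        return False
    return tokens[0].strip(_PUNCT) == target_option.lower()
```

For the swan-to-flamingo example, `yn_success("No.", "Yes, there is a flamingo.")` and `mc_success("b) A flamingo", "b")` both return `True`. Unparseable replies count as failures.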

The per-video answers are then aggregated into the following accuracy scores:

YN-Acc: Yes/No question accuracy
MC-Acc: Multiple-choice question accuracy
U-Acc: Union accuracy – success if any question is correct
∩-Acc: Intersection accuracy – success only if all questions are correct
FiVE-Acc ↑: Final score = average of all above metrics (higher is better)
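The aggregation can be written out directly from these definitions. The sketch below is an illustration, not the evaluation script's code: `VideoResult` and `five_acc` are hypothetical names, each video is assumed to have one combined Yes/No judgement and one multiple-choice judgement, and scores are returned as fractions rather than percentages.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class VideoResult:
    yn_correct: bool  # Yes/No questions judged successful for this video
    mc_correct: bool  # multiple-choice question judged successful for this video


def five_acc(results: Sequence[VideoResult]) -> Dict[str, float]:
    """Compute YN-Acc, MC-Acc, U-Acc, ∩-Acc, and their average (FiVE-Acc)."""
    if not results:
        raise ValueError("need at least one video result")
    n = len(results)
    yn = sum(r.yn_correct for r in results) / n
    mc = sum(r.mc_correct for r in results) / n
    union = sum(r.yn_correct or r.mc_correct for r in results) / n  # any question correct
    inter = sum(r.yn_correct and r.mc_correct for r in results) / n  # all questions correct
    return {
        "YN-Acc": yn,
        "MC-Acc": mc,
        "U-Acc": union,
        "∩-Acc": inter,
        "FiVE-Acc": (yn + mc + union + inter) / 4,
    }
```

Because U-Acc is never below and ∩-Acc never above either individual accuracy, averaging all four rewards methods whose edits are recognized consistently under both question formats.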

📚 Citation

If you use FiVE-Bench in your research, please cite us:

@article{li2025five,
  title={Five: A fine-grained video editing benchmark for evaluating emerging diffusion and rectified flow models},
  author={Li, Minghan and Xie, Chenxi and Wu, Yichen and Zhang, Lei and Wang, Mengyu},
  journal={arXiv preprint arXiv:2503.13684},
  year={2025}
}

We also recommend our recent papers on image and video editing:

@article{xie2025dnaedit,
  title={DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing},
  author={Xie, Chenxi and Li, Minghan and Li, Shuai and Wu, Yuhui and Yi, Qiaosi and Zhang, Lei},
  journal={arXiv preprint arXiv:2506.01430},
  year={2025},
  note={NeurIPS 2025}
}

@article{beaudouin2025delta,
  title={Delta Velocity Rectified Flow for Text-to-Image Editing},
  author={Beaudouin, Gaspard and Li, Minghan and Kim, Jaeyeon and Yoon, Sung-Hoon and Wang, Mengyu},
  journal={arXiv preprint arXiv:2509.05342},
  year={2025}
}

❤️ Acknowledgement

Parts of the code are adapted from PIE-Bench, FlowEdit (ICCV 2025 Best Student Paper), Pyramid-Flow, and Wan.
We thank the authors for their excellent work and for making their code publicly available.
