FiVE-Bench (ICCV 2025)
Minghan Li1*, Chenxi Xie2*, Yichen Wu1,3, Lei Zhang2, Mengyu Wang1†
1Harvard University 2The Hong Kong Polytechnic University 3City University of Hong Kong
*Equal contribution †Corresponding Author
Leaderboard | GitHub | Hugging Face
Project Page | Paper | Video Demo
- DNAEdit (NeurIPS25 SpotLight) Direct Noise Alignment for Text-Guided Rectified Flow Editing
- SplitFlow (NeurIPS25) Flow Decomposition for Inversion-Free Text-to-Image Editing
- DVRF (CVPR26) Delta Velocity Rectified Flow for Text-to-Image Editing
- [TODO] Add Wan-Edit demo page on HF
- [Oct-30-2025] Add leaderboard support
- [Oct-30-2025] Reorganized the original results to follow Wan-Edit naming and kept only MP4s (Google Drive). Thanks @Kunlin Yang.
- [Oct-28-2025] The original results of all comparison methods reported in the paper have been released for reference.
- [Aug-26-2025] Fixed two issues: mp4_to_frames_ffmpeg and skip_timestep=17. Raw quantitative results of `Wan-Edit` are included.
- [Aug-05-2025] Release `Wan-Edit` implementation
- [Aug-05-2025] Release `Pyramid-Edit` implementation
- [Aug-02-2025] Add Wan-Edit results to HF for eval demo
- [Aug-02-2025] Evaluation code released
- [Mar-31-2025] Dataset uploaded to Hugging Face
We welcome contributions! If you've evaluated your method on FiVE-Bench, please share your results so we can include them in the leaderboard. You can submit via a GitHub Issue or Pull Request following the leaderboard format.
For large files or additional details, feel free to contact us directly.
- FiVE-Bench Overview
- Running Your Model on FiVE-Bench
- Evaluate Editing Results
- Citation
- Acknowledgement
The FiVE-Bench dataset offers a rich, structured benchmark for fine-grained video editing. The dataset includes 420 high-quality source-target prompt pairs spanning six fine-grained video editing tasks:
- Object Replacement (Rigid)
- Object Replacement (Non-Rigid)
- Color Alteration
- Material Modification
- Object Addition
- Object Removal
- Download the dataset from Hugging Face: FiVE-Bench on Hugging Face
- Follow the instructions in the Installation Guide to download the dataset and install the evaluation code (FiVE_Bench).
- Place the downloaded dataset in the directory ./FiVE_Bench/data. The data structure should look like:

    /path/to/code/FiVE_Bench/data
    ├── assets/
    ├── edit_prompt/
    │   ├── edit1_FiVE.json
    │   ├── edit2_FiVE.json
    │   ├── edit3_FiVE.json
    │   ├── edit4_FiVE.json
    │   ├── edit5_FiVE.json
    │   └── edit6_FiVE.json
    ├── README.md
    ├── bmasks.zip
    ├── bmasks
    │   ├── 0001_bus
    │   │   ├── 00001.jpg
    │   │   ├── 00002.jpg
    │   │   └── ...
    │   └── ...
    ├── images.zip
    ├── images
    │   ├── 0001_bus
    │   │   ├── 00001.jpg
    │   │   ├── 00002.jpg
    │   │   └── ...
    │   └── ...
    ├── videos.zip
    └── videos
        ├── 0001_bus.mp4
        ├── 0002_girl-dog.mp4
        └── ...
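Before running a model, it can help to confirm that the extracted dataset matches the layout above. The following is a minimal sketch, not part of the FiVE-Bench code: the helper name `missing_entries` is hypothetical, and the check covers only the extracted folders and the six prompt files named in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree above. Only the extracted folders
# are checked, not the .zip archives. The helper name is hypothetical.
EXPECTED_DIRS = ["edit_prompt", "bmasks", "images", "videos"]
EXPECTED_PROMPT_FILES = [f"edit{i}_FiVE.json" for i in range(1, 7)]


def missing_entries(data_root):
    """Return the expected paths (relative to data_root) that do not exist."""
    root = Path(data_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [
        f"edit_prompt/{name}"
        for name in EXPECTED_PROMPT_FILES
        if not (root / "edit_prompt" / name).is_file()
    ]
    return missing


if __name__ == "__main__":
    problems = missing_entries("./FiVE_Bench/data")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Dataset layout looks complete.")
```

An empty result means every folder and prompt file listed in the tree was found; it says nothing about whether the contents of those folders are complete.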
Use your video editing method to edit the FiVE-Bench videos based on the provided text prompts and generate the corresponding edited results.
Example implementations of our proposed rectified flow (RF)-based video editing methods are provided in the models/ directory:
- **[Pyramid-Edit](models/README.md#pyramid-edit)**: Diffusion-based video editing using Pyramid-Flow architecture
- **[Wan-Edit](models/README.md#wan-edit)**: Rectified flow-based video editing with Wan2.1-T2V-1.3B model
Run Pyramid-Edit:

# Setup model
cd models/pyramid-edit && mkdir -p hf/pyramid-flow-miniflux
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh

Run Wan-Edit:

# Setup model
cd models/wan-edit && mkdir -p hf/Wan2.1-T2V-1.3B
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh

For detailed setup instructions and configuration options, see the Models Documentation.
Follow the Installation Guide to set up the evaluation code, then run the evaluation script to get the results:

sh scripts/eval_FiVE.sh

Evaluation Support Elements:
- Editing Masks: Generated using SAM2 to assist in localized metric evaluation.
- Editing Instructions: Structured directives for each source-target pair to guide model behavior.
FiVE-Bench provides comprehensive evaluation through two major components:
These metrics quantitatively measure various dimensions of video editing quality:
- Structure Preservation
- Background Preservation (PSNR, LPIPS, MSE, SSIM outside the editing mask)
- Edit Prompt-Image Consistency (CLIP similarity on full and masked images)
- Image Quality Assessment (NIQE)
- Temporal Consistency (MFS: Motion Fidelity Score)
- Runtime Efficiency
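To illustrate what "outside the editing mask" means for the background-preservation metrics, here is a minimal sketch of masked MSE and PSNR. It is not the benchmark's evaluation code: the function names are hypothetical, frames are represented as flat lists of pixel values in [0, 255], and a mask value of 1 is assumed to mark the edited region (real code would typically operate on image arrays).

```python
import math


def background_mse(source, edited, edit_mask):
    """Mean squared error over pixels where edit_mask is 0 (background).

    All three arguments are flat sequences of equal length. Pixel values
    are assumed to lie in [0, 255] and mask values to be 0 or 1.
    """
    if not (len(source) == len(edited) == len(edit_mask)):
        raise ValueError("inputs must have the same length")
    squared_errors = [
        (s - e) ** 2
        for s, e, m in zip(source, edited, edit_mask)
        if m == 0
    ]
    if not squared_errors:
        raise ValueError("mask covers the whole frame; no background pixels")
    return sum(squared_errors) / len(squared_errors)


def background_psnr(source, edited, edit_mask, max_value=255.0):
    """PSNR in dB computed from the background-only MSE."""
    mse = background_mse(source, edited, edit_mask)
    if mse == 0:
        return math.inf  # background is untouched
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel frame: the edit changes pixel 1 (inside the mask) heavily
# and pixel 3 (background) slightly.
source = [10, 200, 30, 40]
edited = [10, 50, 30, 44]
mask = [0, 1, 0, 0]
print(background_mse(source, edited, mask))
print(background_psnr(source, edited, mask))
```

Because the heavily changed pixel lies inside the mask, it does not count against the score; only the small drift in the background pixel does.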
We use a vision-language model (VLM) to automatically assess whether the intended edits are reflected in the video outputs by asking it questions about the content. For example, if the source video contains a swan and the target prompt requests a flamingo, we ask the following about the edited video:
- Yes/No Questions:
  - Is there a swan in the video?
  - Is there a flamingo in the video?
  The edit is considered successful only if the answers are "No" to the first question and "Yes" to the second.
- Multiple-choice Questions:
  - What is in the video? a) A swan b) A flamingo
  The edit is considered successful if the model selects the correct target object (e.g., b) A flamingo) and avoids selecting the original source object.
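The question templates and success rules above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the VLM call itself is omitted (answers are passed in as plain strings), and the answer parsing is a simplification of whatever the evaluation code actually does.

```python
def build_questions(source_obj, target_obj):
    """Return the yes/no questions and the multiple-choice question for one edit."""
    yes_no = [
        f"Is there {source_obj} in the video?",
        f"Is there {target_obj} in the video?",
    ]
    multiple_choice = (
        f"What is in the video? a) {source_obj.capitalize()} "
        f"b) {target_obj.capitalize()}"
    )
    return yes_no, multiple_choice


def yes_no_success(source_answer, target_answer):
    """Success only if the source object is gone and the target is present."""
    return (
        source_answer.strip().lower() == "no"
        and target_answer.strip().lower() == "yes"
    )


def multiple_choice_success(choice, target_option="b"):
    """Success if the chosen option letter is the target option."""
    letter = choice.strip().lower().split(")")[0].strip()
    return letter == target_option


yes_no, multiple_choice = build_questions("a swan", "a flamingo")
print(yes_no)
print(multiple_choice)
print(yes_no_success("No", "Yes"))
print(multiple_choice_success("b) A flamingo"))
```

Keeping question construction separate from answer checking makes it straightforward to swap in a different VLM or a different answer format.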
FiVE-Acc evaluates editing success using a vision-language model (VLM) by asking content-related questions:
- YN-Acc: Yes/No question accuracy
- MC-Acc: Multiple-choice question accuracy
- U-Acc: Union accuracy, success if any question is correct
- ∩-Acc: Intersection accuracy, success only if all questions are correct
- FiVE-Acc ↑: Final score = average of all above metrics (higher is better)
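As a worked illustration of how these scores relate to each other, the sketch below aggregates per-video outcomes into the five numbers. It assumes each video contributes one yes/no outcome and one multiple-choice outcome, and that "any" and "all" refer to those two outcomes; the function name `five_acc` and the input format are hypothetical, not the benchmark's API.

```python
def five_acc(results):
    """Aggregate per-video outcomes into the FiVE-Acc metrics.

    results: list of (yn_correct, mc_correct) boolean pairs, one per video.
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    yn_acc = sum(yn for yn, _ in results) / n
    mc_acc = sum(mc for _, mc in results) / n
    union_acc = sum(yn or mc for yn, mc in results) / n   # any correct
    inter_acc = sum(yn and mc for yn, mc in results) / n  # all correct
    return {
        "YN-Acc": yn_acc,
        "MC-Acc": mc_acc,
        "U-Acc": union_acc,
        "∩-Acc": inter_acc,
        "FiVE-Acc": (yn_acc + mc_acc + union_acc + inter_acc) / 4,
    }


# Four toy videos covering every combination of outcomes.
scores = five_acc([(True, True), (True, False), (False, False), (False, True)])
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

By construction the intersection score can never exceed YN-Acc or MC-Acc, and the union score can never fall below them, which is a useful sanity check on reported numbers.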
If you use FiVE-Bench in your research, please cite us:
@article{li2025five,
title={Five: A fine-grained video editing benchmark for evaluating emerging diffusion and rectified flow models},
author={Li, Minghan and Xie, Chenxi and Wu, Yichen and Zhang, Lei and Wang, Mengyu},
journal={arXiv preprint arXiv:2503.13684},
year={2025}
}

We also recommend our recent papers on image/video editing:

@article{xie2025dnaedit,
title={DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing},
author={Xie, Chenxi and Li, Minghan and Li, Shuai and Wu, Yuhui and Yi, Qiaosi and Zhang, Lei},
journal={arXiv preprint arXiv:2506.01430},
year={2025},
note={NeurIPS 2025}
}

@article{beaudouin2025delta,
title={Delta Velocity Rectified Flow for Text-to-Image Editing},
author={Beaudouin, Gaspard and Li, Minghan and Kim, Jaeyeon and Yoon, Sung-Hoon and Wang, Mengyu},
journal={arXiv preprint arXiv:2509.05342},
year={2025}
}

Part of the code is adapted from PIE-Bench, FlowEdit (ICCV25 Best Student Paper), Pyramid-Flow, and the Wan model.
We thank the authors for their excellent work and for making their code publicly available.





