
PROVE (Perceptual RemOVal cohErence Benchmark)

Official PyTorch code for PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media


If PROVE is helpful to your projects, please star this repo. Thanks!

Overview

Figure: overall structure of the PROVE framework.

PROVE is a unified evaluation framework for object removal in images and videos, addressing the critical gap between existing metrics and human perception. It consists of:

  • RC-S (Removal Coherence - Spatial): Measures how well the inpainted region blends with the surrounding background within a single frame, via sliding-window MMD on DINOv2 patch features (see the toy sketch after this list).
  • RC-T (Removal Coherence - Temporal): Measures temporal coherence of the inpainted region across consecutive frames via distribution tracking within shared restored regions.
  • PROVE-Bench: A two-tier real-world benchmark comprising PROVE-M (80 motion-augmented paired videos with GT) and PROVE-H (100 challenging videos without GT).
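
For intuition, here is a toy sketch of the core comparison both metrics build on: a kernel MMD between DINOv2 patch-feature distributions. The function names, the RBF bandwidth, and the windowing are illustrative assumptions; the real implementations live in utils/metrics.py.

import torch

# Toy squared MMD with an RBF kernel between feature sets of shape (n, c) and (m, c).
def rbf_mmd2(x, y, sigma=1.0):
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# RC-S idea: within a local window, compare restored patches to nearby background.
# feats: (H, W, C) DINOv2 patch features; mask: (H, W) bool, True = inpainted.
def rc_s_window(feats, mask):
    return rbf_mmd2(feats[mask], feats[~mask])

# RC-T idea: track the restored region's feature distribution across consecutive frames.
def rc_t_step(feats_t, feats_t1, shared_mask):
    return rbf_mmd2(feats_t[shared_mask], feats_t1[shared_mask])

In the full metrics, RC-S aggregates this comparison over sliding windows around the mask and maps smaller discrepancies to higher scores, while RC-T restricts it to regions restored in both frames.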

Key Findings

| Limitation | Existing Metrics | Our Solution |
| --- | --- | --- |
| Full-Reference Bias | PSNR/SSIM/LPIPS reward copy-paste over genuine erasure | RC-S: no GT required, local region evaluation |
| No-Reference Blind Spots | ReMOVE/CFD favor blurry outputs | RC-S: DINOv2 + MMD robust to blur bias |
| Temporal Insensitivity | TC/TF dominated by unchanged background | RC-T: localized temporal distribution matching |

Benchmark Results

PROVE-M (with Ground Truth)

| Method | PSNR↑ | SSIM↑ | LPIPS↓ | ReMOVE↑ | CFD↓ | RC-S↑ | RC-T↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FGT | 21.6511 | 0.8619 | 0.2013 | 0.8622 | 0.3229 | 0.3797 | 0.8031 |
| ProPainter | 22.1846 | 0.8768 | 0.1559 | 0.8676 | 0.2774 | 0.4427 | 0.5951 |
| DiffuEraser | 22.0758 | 0.8706 | 0.1518 | 0.8681 | 0.3308 | 0.4787 | 0.4851 |
| VACE (1.3B) | 20.0826 | 0.8654 | 0.1545 | 0.8117 | 0.3283 | 0.4036 | 0.5217 |
| Minimax-Remover (1.3B) | 21.7476 | 0.8707 | 0.1542 | 0.8710 | 0.3202 | 0.4793 | 0.4485 |
| GenOmni (CogV5B) | 25.0165 | 0.9030 | 0.1223 | 0.8755 | 0.3842 | 0.5029 | 0.3145 |
| GenOmni (Wan1.3B) | 25.1480 | 0.9017 | 0.1109 | 0.8815 | 0.3457 | 0.5188 | 0.3238 |
| ROSE (1.3B) | 26.1333 | 0.9003 | 0.1212 | 0.8803 | 0.3364 | 0.4924 | 0.6538 |
| EffectErase (1.3B) | 27.0049 | 0.9098 | 0.1142 | 0.8841 | 0.3412 | 0.5270 | 0.2728 |
| UnderEraser (14B) | 28.3325 | 0.9156 | 0.0981 | 0.8824 | 0.2986 | 0.5188 | 0.3276 |
| SVOR (1.3B) | 27.4289 | 0.9239 | 0.0839 | 0.8836 | 0.2794 | 0.5236 | 0.2987 |

PROVE-H (without Ground Truth)

| Method | PSNR↑ | SSIM↑ | LPIPS↓ | ReMOVE↑ | CFD↓ | RC-S↑ | RC-T↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FGT | 29.4448 | 0.8615 | 0.1927 | 0.8474 | 0.3065 | 0.3716 | 0.5866 |
| ProPainter | 33.3531 | 0.9274 | 0.1063 | 0.8383 | 0.2830 | 0.3932 | 0.4453 |
| DiffuEraser | 31.4112 | 0.9178 | 0.1098 | 0.8440 | 0.3165 | 0.4387 | 0.3911 |
| VACE (1.3B) | 26.7266 | 0.8898 | 0.1071 | 0.8047 | 0.3288 | 0.4192 | 0.3438 |
| Minimax-Remover (1.3B) | 29.6021 | 0.8660 | 0.1315 | 0.8545 | 0.3320 | 0.4617 | 0.3277 |
| GenOmni (CogV5B) | 28.7643 | 0.8873 | 0.1183 | 0.8536 | 0.3516 | 0.5006 | 0.2141 |
| GenOmni (Wan1.3B) | 29.3140 | 0.8940 | 0.1027 | 0.8596 | 0.3422 | 0.5127 | 0.2368 |
| ROSE (1.3B) | 27.6261 | 0.8508 | 0.1402 | 0.8538 | 0.3361 | 0.4687 | 0.4373 |
| EffectErase (1.3B) | 24.3793 | 0.8156 | 0.1742 | 0.8532 | 0.3590 | 0.5081 | 0.2363 |
| UnderEraser (14B) | 27.4989 | 0.8485 | 0.1434 | 0.8560 | 0.3165 | 0.5075 | 0.2688 |
| SVOR (1.3B) | 27.5335 | 0.8907 | 0.1046 | 0.8574 | 0.3107 | 0.5166 | 0.2419 |

Note: Due to compliance requirements, the open-source data differs slightly from the data used in the paper. The results above are based on the open-source version and may exhibit minor numerical differences from the paper, but the overall trends remain consistent.

Prerequisites

  1. Python environment (Python 3.10+) with the following packages:
    • pytorch 2.6+
    • transformers 4.51+
    • opencv-python
    • numpy
    • scikit-image
    • pandas
    • tqdm
  2. Pretrained models:
    • Download DINOv2-giant and update DINO_PATH in run_prove_metrics.py (a loading sketch follows the configuration block below).
  3. Dataset configuration:
    • Download the PROVE-Bench dataset from HuggingFace.
    • Update the DATASET dictionary in utils/dataset.py to match your local paths:
DATASET = {
    # Video datasets
    "PROVE-M": {
        "inputs": "/PATH/TO/RAW_VIDEOS",
        "masks":  "/PATH/TO/MASKS",
        "type":   "video"
    },
    "PROVE-H": {
        "inputs": "/PATH/TO/RAW_VIDEOS",
        "masks":  "/PATH/TO/MASKS",
        "type":   "video"
    },
    # Image dataset
    "rord": {
        "inputs": "/PATH/TO/RAW_IMAGES",
        "masks":  "/PATH/TO/MASKS",
        "type":   "image"
    }
}
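
For reference, a locally downloaded DINOv2-giant checkpoint can be loaded through transformers roughly as follows. This is a minimal sketch, assuming the generic AutoImageProcessor/AutoModel entry points and a placeholder path; the repo's actual feature extraction lives in utils/predictors.py.

import numpy as np
from transformers import AutoImageProcessor, AutoModel

DINO_PATH = "/PATH/TO/dinov2-giant"  # mirror the value configured in run_prove_metrics.py
processor = AutoImageProcessor.from_pretrained(DINO_PATH)
model = AutoModel.from_pretrained(DINO_PATH).eval().to("cuda")

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for one RGB video frame
inputs = processor(images=frame, return_tensors="pt").to("cuda")
patch_features = model(**inputs).last_hidden_state[:, 1:]  # drop the leading CLS token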

Attention:

  • Generated results must share the same filenames as the originals (extensions may differ); see the pairing sketch below.
  • Masks are required for both metrics. White regions indicate the removed object.
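
A minimal sketch of both conventions, assuming placeholder paths and a grayscale threshold of 127 (the repo's own pairing and I/O live in utils/media_utils.py):

import cv2
from pathlib import Path

# Pair generated results with originals by filename stem, so extensions may differ.
results = {p.stem: p for p in Path("/PATH/TO/GENERATED_VIDEOS").iterdir()}
originals = {p.stem: p for p in Path("/PATH/TO/RAW_VIDEOS").iterdir()}
paired = sorted(results.keys() & originals.keys())

# Binarize a mask: white pixels mark the removed object.
mask = cv2.imread("/PATH/TO/MASKS/video_001.png", cv2.IMREAD_GRAYSCALE)
removed_region = mask > 127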

File Structure

PROVE/
├── run_prove_metrics.py       # Main evaluation script
├── README.md
└── utils/
    ├── __init__.py
    ├── dataset.py             # Dataset configuration
    ├── media_utils.py         # Video/image I/O and pairing
    ├── metrics.py             # RC-S and RC-T implementations
    ├── bbox.py                # Bounding box utilities
    └── predictors.py          # DINOv2 feature predictor

Usage

Video Evaluation (RC-S + RC-T)

python run_prove_metrics.py \
    --dataset PROVE-M \
    --result_dir /PATH/TO/GENERATED_VIDEOS \
    --metrics rc_s rc_t \
    --out_csv results.csv

Image Evaluation (RC-S only)

python run_prove_metrics.py \
    --dataset rord \
    --result_dir /PATH/TO/GENERATED_IMAGES \
    --metrics rc_s \
    --out_csv results.csv

Note: RC-T is only applicable to video datasets and will be automatically skipped for image datasets.

Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --dataset | Dataset name (PROVE-M, PROVE-H, rord) | required |
| --result_dir | Directory containing generated results | required |
| --metrics | Metrics to compute: rc_s, rc_t | rc_s rc_t |
| --out_csv | Output CSV filename | metrics_prove.csv |
| --mask_dir | Override the default mask directory | None |
| --max_items | Limit the number of items to process | None |
| --device | Compute device | cuda |

Output

The output CSV contains per-item scores and a summary row:

| case_id | rc_s | rc_t | time |
| --- | --- | --- | --- |
| video_001.mp4 | 0.1523 | 0.1482 | 12.34 |
| video_002.mp4 | 0.1487 | 0.1501 | 11.87 |
| AVERAGE | 0.1505 | 0.1492 | 12.11 |

  • RC-S: higher is better (smaller discrepancy between inpainted region and background).
  • RC-T: lower is better (higher temporal consistency across frames).
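
To pull the summary programmatically, the CSV can be read with pandas (already a listed dependency); the column names below follow the example table and are otherwise an assumption:

import pandas as pd

df = pd.read_csv("results.csv")
avg = df[df["case_id"] == "AVERAGE"].iloc[0]
print(f"RC-S: {avg['rc_s']:.4f} (higher is better)")
print(f"RC-T: {avg['rc_t']:.4f} (lower is better)")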

Acknowledgement

Our work benefits from open-source projects such as DINOv2.

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{li2026prove,
   title={PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media},
   author={Li, Fuhao and You, Shaofeng and Hu, Jiagao and Liu, Yu and Chen, Yuxuan and Wang, Zepeng and Wang, Fei and Zhou, Daiguo and Luan, Jian},
   journal={arXiv preprint arXiv:2605.14534},
   year={2026}
}

License

This project is released under the MIT License.