Support OMTG Benchmark #1427

Open
insomniaaac wants to merge 1 commit into open-compass:main from insomniaaac:main

@insomniaaac

Description

This PR adds support for the One-to-Many Temporal Grounding (OMTG) benchmark, as proposed in the paper Towards One-to-Many Temporal Grounding.

Unlike traditional temporal grounding tasks that assume a one-to-one mapping, OMTG requires the model to localize all disjoint video segments corresponding to a query.

Key Changes

  • New Benchmark Support: Added OMTGBench dataset class.
  • New Metrics: Implemented rigorous metrics for multi-instance retrieval:
    • C-Acc (Count Accuracy): Evaluates event cardinality perception.
    • EtF1 (Effective Temporal F1): The primary metric that penalizes incomplete retrieval.
    • tF1 (Temporal F1-Score).
  • Evaluation Pipeline: Integrated the OMTG evaluation logic into the existing framework.
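The description above names the metrics without giving formulas. As a rough illustration only, here is a minimal sketch of how a per-sample count check and a temporal F1 at a fixed IoU threshold could be computed. Everything in it is an assumption rather than the PR's implementation: the function names, the 0.5 IoU threshold, and the greedy one-to-one matching are all hypothetical, and EtF1 is deliberately left out because its definition is not stated here.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec). Hypothetical representation for illustration.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_correct(pred: List[Segment], gt: List[Segment]) -> bool:
    """Assumed reading of count accuracy: predicted segment count equals ground truth."""
    return len(pred) == len(gt)


def temporal_f1(pred: List[Segment], gt: List[Segment], iou_thr: float = 0.5) -> float:
    """F1 over segments, using greedy one-to-one matching by descending IoU.

    The threshold and the matching strategy are assumptions, not the paper's definition.
    """
    if not pred or not gt:
        return 0.0
    pairs = sorted(
        ((temporal_iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    true_pos = 0
    for iou, i, j in pairs:
        if iou < iou_thr:
            break  # remaining pairs overlap even less
        if i in used_pred or j in used_gt:
            continue  # each segment may be matched at most once
        used_pred.add(i)
        used_gt.add(j)
        true_pos += 1
    precision = true_pos / len(pred)
    recall = true_pos / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a prediction that finds two of three ground-truth segments exactly gets precision 1 and recall 2/3, which shows why a one-to-many benchmark needs a metric that is sensitive to missed segments.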

How to Use

Users can evaluate models on the OMTG benchmark using the following command:

python run.py --data OMTGBench --model Qwen3-VL-4B-Instruct --verbose

Collaborator

@FangXinyu-0913 FangXinyu-0913 left a comment

  1. Consider adding a quick config entry in vlmeval/dataset/video_dataset_config.py to make the benchmark easier to use.
  2. Please report OMTG Bench performance for representative models (using VLMEvalKit, the official repo, and paper results). Include environment details (transformers, torch, vllm/sglang, flash-attention, python) and specific configs (like nframe) used for these runs.
  3. Please help fix the lint: https://github.com/open-compass/VLMEvalKit/actions/runs/21662694744/job/62492361066?pr=1427
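For the environment details requested in item 2, a small helper along these lines could gather the versions in one place. This is only a sketch: the distribution names in the list (for example `flash-attn`) are assumptions about how the packages are installed, and the helper itself is not part of VLMEvalKit.

```python
import platform
from importlib import metadata

# Distribution names are assumptions; adjust them to match the local install.
PACKAGES = ["transformers", "torch", "vllm", "sglang", "flash-attn"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping 'python' and each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

The printed lines can be pasted into the PR next to the reported scores, together with the run-specific settings such as nframe.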
