
Multi-Modal Gesture Recognition Data Leakage: Video.Labels in test .mat files allows direct ground-truth extraction #77

@ramgorthi04

Description


While reviewing the logs of our agent's work on the benchmark, I noticed a perfect score on multi-modal gesture recognition. The agent had discovered ground-truth label leakage in the test .mat files.

Specifically, it discovered that the test .mat files contain Video.Labels structs with gesture annotations (fields Name, Begin, and End). This effectively exposes the ground-truth sequences for test samples and enables participants to reconstruct perfect submissions without training any model, by simply mapping gesture names to numeric IDs using the training data.

Reproduction Steps

  1. Open any test sample archive, e.g.:

    tar -xzf test.tar.gz Sample0300.zip
    unzip Sample0300.zip
    
  2. Load Sample0300_data.mat in MATLAB or Python (scipy.io.loadmat).

  3. Inspect:

    md['Video'].Labels[0].Name, md['Video'].Labels[0].Begin, md['Video'].Labels[0].End

    You’ll find gesture names and frame ranges.

  4. Sorting these labels by Begin reconstructs the full ground-truth sequence for each test sample.
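For completeness, here is a minimal Python sketch of steps 2-4. The loadmat options (struct_as_record=False, squeeze_me=True) are an assumption made so that the attribute-style access shown above works; the sample path is the one from step 1, and the Video.Labels layout (Name, Begin, End) is as described in this report.

    # Sketch only: assumes the Video.Labels struct layout described above.
    import numpy as np
    import scipy.io

    md = scipy.io.loadmat('Sample0300_data.mat',
                          struct_as_record=False, squeeze_me=True)

    # atleast_1d guards against squeeze_me collapsing a single label to a scalar.
    labels = np.atleast_1d(md['Video'].Labels)
    annotations = [(int(l.Begin), int(l.End), str(l.Name)) for l in labels]

    # Step 4: sorting by Begin reconstructs the ground-truth gesture order.
    annotations.sort(key=lambda a: a[0])
    for begin, end, name in annotations:
        print(f'{name}: frames {begin}-{end}')

Mapping the recovered gesture names to the numeric IDs used in submissions then only requires the training annotations, as described above.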

You can view the full logs of our agent's work as it discovered and exploited the data leakage here: https://github.com/ramgorthi04/OperandLinear-MLE-Bench/blob/main/MLE_Submission/multi-modal-gesture-recognition/multi-modal-gesture-recognition-20250924-024033/full_history.json
