While reviewing the logs of our agent's work on the benchmark, I noticed a perfect score on multi-modal gesture recognition. The agent had discovered ground-truth label leakage in the test .mat files.
Specifically, the test .mat files contain Video.Labels structs with gesture annotations (fields Name, Begin, and End). This exposes the ground-truth gesture sequence for every test sample and lets a participant reconstruct a perfect submission without training any model, simply by mapping gesture names to numeric IDs using the training data.
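To make the leak concrete, here is a minimal Python sketch of what a single test sample exposes. The loadmat options are chosen so the MATLAB structs come back as attribute-accessible objects; the field names are the ones found in the leaked struct.

```python
import numpy as np
import scipy.io as sio

# Load a *test* sample's .mat file; these options turn MATLAB structs into
# Python objects with attribute access (md["Video"].Labels[i].Name, etc.).
md = sio.loadmat("Sample0300_data.mat", struct_as_record=False, squeeze_me=True)

# Ground-truth annotations that should never have shipped with the test split.
labels = np.atleast_1d(md["Video"].Labels)
for g in labels:
    print(g.Name, int(g.Begin), int(g.End))  # gesture name plus start/end frames
```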
Reproduction Steps
- Open any test sample archive, e.g.:

  ```bash
  tar -xzf test.tar.gz Sample0300.zip
  unzip Sample0300.zip
  ```

- Load Sample0300_data.mat in MATLAB or Python (scipy.io.loadmat).

- Inspect:

  ```python
  md['Video'].Labels[0].Name, md['Video'].Labels[0].Begin, md['Video'].Labels[0].End
  ```

  You’ll find gesture names and frame ranges.

- Sorting these labels by Begin reconstructs the full ground-truth sequence for each test sample (a Python sketch of the full reconstruction follows below).
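Below is a hedged sketch of the full exploit the steps above describe: recover the ordered gesture names from a test sample, then translate them into numeric label IDs with a mapping derived from the training data. The training-side details are assumptions for illustration only; the train_labels.csv path and its Id/Sequence columns are hypothetical stand-ins for wherever the numeric training sequences actually live, and the sketch assumes the training .mat files expose the same Video.Labels layout.

```python
import csv

import numpy as np
import scipy.io as sio


def ordered_gesture_names(mat_path):
    """Gesture names from a sample's .mat file, sorted by start frame (Begin)."""
    md = sio.loadmat(mat_path, struct_as_record=False, squeeze_me=True)
    labels = np.atleast_1d(md["Video"].Labels)
    return [g.Name for g in sorted(labels, key=lambda g: int(g.Begin))]


# Build a gesture-name -> numeric-ID mapping from the training data.
# NOTE: "train_labels.csv" and its Id/Sequence columns are hypothetical;
# substitute wherever the numeric training sequences are provided.
name_to_id = {}
with open("train_labels.csv") as f:
    for row in csv.DictReader(f):
        ids = [int(x) for x in row["Sequence"].split()]
        names = ordered_gesture_names(f"Sample{row['Id']}_data.mat")
        name_to_id.update(zip(names, ids))

# Apply the mapping to a leaked test sample: a "perfect" prediction, no model trained.
test_sequence = [name_to_id[n] for n in ordered_gesture_names("Sample0300_data.mat")]
print(" ".join(map(str, test_sequence)))
```

Sorting by Begin is what turns the leaked annotations into the exact submission-ordered sequence; no recognition model is involved at any point.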
You can view the full logs of our agent's work as it discovered and exploited the data leakage here: https://github.com/ramgorthi04/OperandLinear-MLE-Bench/blob/main/MLE_Submission/multi-modal-gesture-recognition/multi-modal-gesture-recognition-20250924-024033/full_history.json