This repository contains the official code for training, inference, and evaluation of MATR from the ICCV'25 paper "Aligning Moments in Time using Video Queries".
To set up the environment:

```bash
# create new env MATR
$ conda create -n MATR python=3.10.4

# activate MATR
$ conda activate MATR

# install pytorch, torchvision
$ conda install -c pytorch pytorch torchvision

# install other dependencies
$ pip install -r requirements.txt
```
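As a quick sanity check (not part of the official setup scripts), you can verify that the core packages are importable and that CUDA is visible:

```python
# Minimal environment check (illustrative only, not an official MATR script).
import torch
import torchvision

print("PyTorch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```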
To train MATR on our proposed dataset or on your own dataset, please prepare the annotations following the format in `data/` (a minimal loading example is shown after the directory tree below). The dataset directory should have the following structure:
```
data/
├── sportsmoment/
│   ├── metadata/
│   │   ├── train.jsonl
│   │   └── val.jsonl
│   ├── txt_clip/
│   ├── target_vid_clip/
│   └── query_vid_clip/
└── ActivityNet/
    ├── metadata/
    │   ├── train.jsonl
    │   └── val.jsonl
    ├── txt_clip/
    ├── target_vid_clip/
    └── query_vid_clip/
```
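As a minimal sketch of how the `metadata/*.jsonl` files can be read, assuming the standard JSON Lines convention of one JSON object per line (the actual field names depend on the released annotations and are not listed here):

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Load a JSON Lines file: one JSON object per non-empty line."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Example: inspect the training annotations of the sports moments split.
annotations = load_jsonl("data/sportsmoment/metadata/train.jsonl")
print(f"{len(annotations)} annotations loaded")
print("Fields in the first annotation:", sorted(annotations[0].keys()))
```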
Training:

```bash
# set the paths and required parameters in train.sh
$ bash train.sh
```

Inference:

```bash
# set the paths and required parameters in inference.py
$ python inference.py
```

Evaluation:

```bash
# set the paths and required parameters in eval.py
$ python eval.py
```
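Moment retrieval is typically scored by the temporal overlap between predicted and ground-truth segments. The sketch below shows a generic temporal IoU computation as a point of reference; it is an illustration of the idea, not the metric code used in `eval.py`:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction of [12.0s, 20.0s] against a ground-truth moment of [10.0s, 18.0s].
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))  # 6 / 10 = 0.6
```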
If you find this work useful, please consider citing it as:
```bibtex
@inproceedings{kumar2025matr,
  title={Aligning Moments in Time using Video Queries},
  author={Yogesh Kumar and Uday Agarwal and Manish Gupta and Anand Mishra},
  booktitle={International Conference on Computer Vision, ICCV},
  year={2025},
}
```

Our codebase is built upon the following open-source repositories:
- https://github.com/showlab/UniVTG
- https://github.com/SamsungLabs/Drop-DTW
- https://github.com/jayleicn/moment_detr
For any questions, please feel free to open an issue or email us at yogesh.mcs17.du@gmail.com / ndc.uday@gmail.com.

