Official PyTorch Implementation of 'BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos'
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee†, Hyeran Byun
(†: Corresponding author)

Paper: https://arxiv.org/abs/2312.00083
Abstract: Temporal sentence grounding aims to localize moments relevant to a language description. Recently, DETR-like approaches have shown notable progress by decoding the center and length of a target moment from learnable queries. However, they suffer from the issue of center misalignment raised by the inherent ambiguity of moment centers, leading to inaccurate predictions. To remedy this problem, we introduce a novel boundary-oriented moment formulation. In our paradigm, the model no longer needs to find the precise center but instead suffices to predict any anchor point within the interval, from which the onset and offset are directly estimated. Based on this idea, we design a Boundary-Aligned Moment Detection Transformer (BAM-DETR), equipped with a dual-pathway decoding process. Specifically, it refines the anchor and boundaries within parallel pathways using global and boundary-focused attention, respectively. This separate design allows the model to focus on desirable regions, enabling precise refinement of moment predictions. Further, we propose a quality-based ranking method, ensuring that proposals with high localization qualities are prioritized over incomplete ones. Extensive experiments verify the advantages of our methods, where our model records new state-of-the-art results on three benchmarks.
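To make the boundary-oriented formulation concrete, below is a minimal sketch (not the repository's model code) contrasting the conventional center/length decoding used by prior DETR-like methods with the anchor-plus-boundary-distances decoding described in the abstract. All function and variable names are illustrative.

```python
import torch

def decode_center_length(center: torch.Tensor, length: torch.Tensor):
    # Conventional DETR-style decoding: the query must regress the exact
    # moment center, which is often ambiguous in untrimmed videos.
    start = center - 0.5 * length
    end = center + 0.5 * length
    return start.clamp(0, 1), end.clamp(0, 1)

def decode_anchor_boundaries(anchor: torch.Tensor,
                             dist_to_onset: torch.Tensor,
                             dist_to_offset: torch.Tensor):
    # Boundary-oriented decoding: any anchor point inside the moment
    # suffices, since the onset and offset are estimated directly as
    # distances from the anchor rather than derived from a center.
    start = anchor - dist_to_onset
    end = anchor + dist_to_offset
    return start.clamp(0, 1), end.clamp(0, 1)

# Toy example with normalized timestamps.
anchor = torch.tensor([0.42])  # any point inside the target moment
d_on, d_off = torch.tensor([0.12]), torch.tensor([0.30])
print(decode_anchor_boundaries(anchor, d_on, d_off))  # -> (~0.30, ~0.72)
```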
- Python 3.7
- PyTorch 1.9
- TensorBoard 1.15
You can set up the environment by running pip3 install -r requirements.txt.
For Anaconda setup, please refer to the official Moment-DETR GitHub repository.
1. Prepare datasets
- Prepare the QVHighlights dataset.
- Extract features with the SlowFast and CLIP models.
- We recommend downloading moment_detr_features.tar.gz (8GB)
- Extract it under the project root directory:
tar -xf path/to/moment_detr_features.tar.gz
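After extraction, you can sanity-check the features with a short script. The directory layout and the npz key below follow the Moment-DETR feature release and are assumptions; adjust them if your archive differs.

```python
import glob
import os
import numpy as np

# Assumed layout from the Moment-DETR feature release; verify locally.
root = "features"
for sub in ("slowfast_features", "clip_features", "clip_text_features"):
    files = glob.glob(os.path.join(root, sub, "*.npz"))
    print(f"{sub}: {len(files)} files")
    if files:
        feats = np.load(files[0])["features"]  # key name is an assumption
        print("  example shape:", feats.shape)  # (num_clips, feat_dim)
```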
Training can be launched by running the command below. If you want to try other training options, please refer to the config file bam_detr/config.py.
bash bam_detr/scripts/train.sh
Once the model is trained, you can use the following command for inference.
bash bam_detr/scripts/inference.sh CHECKPOINT_PATH SPLIT_NAME
where CHECKPOINT_PATH is the path to the saved checkpoint and SPLIT_NAME is the split to run inference on, which can be either val or test.
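The inference script writes predictions to disk. Assuming they follow the Moment-DETR-style JSON Lines format (one object per query with a pred_relevant_windows list of [start, end, score] triplets), they can be inspected as in the sketch below; the file path and key names are assumptions.

```python
import json

# Hypothetical output path; adjust to where your inference run saves results.
with open("results/hl_val_submission.jsonl") as f:
    for line in f:
        pred = json.loads(line)
        qid, windows = pred["qid"], pred["pred_relevant_windows"]
        # Each window is [start_sec, end_sec, score]; keep the top-scoring one.
        start, end, score = max(windows, key=lambda w: w[2])
        print(qid, f"[{start:.1f}s, {end:.1f}s] score={score:.3f}")
```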
We note that this repo is heavily based on existing codebases, in particular the official Moment-DETR implementation. We express our appreciation to the authors for sharing their code.
If you find this code useful, please cite our paper.
@inproceedings{lee2024bam-detr,
title={{BAM-DETR}: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos},
author={Lee, Pilhyeon and Byun, Hyeran},
booktitle={European Conference on Computer Vision},
pages={220--238},
year={2024},
organization={Springer}
}
If you have any questions or comments, please contact the first author of the paper - Pilhyeon Lee (pilhyeon.lee@inha.ac.kr).