BayesianVSLNet - Temporal Video Segmentation with Natural Language using Text-Video Cross Attention and Bayesian Order-priors

Description

🔔 News:

  • 🔜 Coming soon: paper with an improved BayesianVSLNet++ version, together with checkpoints and pre-extracted video features.
  • 🔥 7/15/2024: Code released!
  • 😎 6/15/2024: Poster presentation at the EgoVis Workshop during CVPR 2024.
  • 🥳 6/10/2024: Challenge report is available on arXiv!
  • 🏆 6/01/2024: BayesianVSLNet wins the Ego4D Step Grounding Challenge at CVPR 2024.

BayesianVSLNet

We introduce BayesianVSLNet: Bayesian temporal-order priors for test-time refinement. Our model significantly improves upon traditional moment-localization models by incorporating a novel Bayesian temporal-order prior during inference, which accounts for cyclic and repetitive actions within the video and improves the accuracy of moment predictions.
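To make the idea concrete, below is a minimal sketch (plain NumPy, not the repository code and not the authors' exact formulation) of how per-frame localization scores could be re-weighted at test time with a temporal-order prior; the Gaussian prior centered at the step's relative position within the task is an illustrative assumption.

# Conceptual sketch: combine per-frame scores with an order-based prior
# via Bayes' rule. Not the authors' exact formulation.
import numpy as np

def apply_order_prior(frame_scores, step_idx, num_steps, sigma=0.2):
    """Re-weight per-frame scores with a prior over normalized video time."""
    t = np.linspace(0.0, 1.0, len(frame_scores))    # normalized timestamps
    mu = (step_idx + 0.5) / num_steps                # assumed expected position of this step
    prior = np.exp(-0.5 * ((t - mu) / sigma) ** 2)   # order-based prior p(t | step order)
    posterior = frame_scores * prior                 # likelihood x prior
    return posterior / posterior.sum()               # renormalize to a distribution

# Example: the 3rd of 8 steps, starting from uniform network scores.
scores = np.ones(100) / 100
posterior = apply_order_prior(scores, step_idx=2, num_steps=8)
print("Most likely frame index:", int(posterior.argmax()))

In a full system, the prior's center and width would be derived from the step's order within the annotated task rather than being fixed by hand.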


Quick start

Install dependencies

git clone https://github.com/cplou99/BayesianVSLNet
cd BayesianVSLNet
pip install -r requirements.txt

Video Features

We use Omnivore-L, EgoVideo, and EgoVLPv2 video features. They should be pre-extracted and placed at ./ego4d-goalstep/step_grounding/data/features/.
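As a quick sanity check before training, the snippet below (illustrative only; the .pt extension and torch-loadable format are assumptions about how the features are stored) verifies that the feature directory is populated:

# Illustrative check, not part of the repo: confirm pre-extracted features exist.
import os
import torch

feature_dir = "./ego4d-goalstep/step_grounding/data/features/"
files = sorted(f for f in os.listdir(feature_dir) if f.endswith(".pt"))  # assumed extension
print(f"{len(files)} feature files found in {feature_dir}")
if files:
    feats = torch.load(os.path.join(feature_dir, files[0]), map_location="cpu")
    print("First feature object:", getattr(feats, "shape", type(feats)))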

Model

The EgoVLPv2 weights used to extract text features must be placed at ./NaQ/VSLNet_Bayesian/model/EgoVLP_weights.
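A similar check (illustrative; the exact checkpoint filename depends on the EgoVLPv2 release you download) confirms the weights are in place:

# Illustrative check, not part of the repo: confirm the EgoVLPv2 weights directory exists.
import os

weights_dir = "./NaQ/VSLNet_Bayesian/model/EgoVLP_weights"
if os.path.isdir(weights_dir):
    print("Found weight files:", os.listdir(weights_dir))
else:
    print("Missing directory:", weights_dir)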

Train

cd ego4d-goalstep/step_grounding/
bash train_Bayesian.sh experiments/

Inference

cd ego4d-goalstep/step_grounding/
bash infer_Bayesian.sh experiments/

Results

Ego4D Step Grounding Challenge

The challenge is built on the Ego4D GoalStep dataset and code.

Goal: Given an untrimmed egocentric video, identify the temporal action segment corresponding to a natural language description of the step. Specifically, predict the (start_time, end_time) for a given keystep description.
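For illustration only, an input query and the kind of prediction expected look roughly as follows (field names and values are made up for this example and are not the official challenge submission format):

# Illustrative input/output for the step-grounding task (hypothetical fields).
query = {
    "video_uid": "example_video_001",
    "step_description": "pour the beaten eggs into the pan",
}
prediction = {
    "video_uid": query["video_uid"],
    "start_time": 113.4,  # seconds from the start of the untrimmed video
    "end_time": 141.9,
}
print(prediction)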


The leaderboard 🚀 reports test-set results for the best approaches. Our method currently ranks first 🚀🔥.

Case study: Robotics

We present qualitative results in a real-world assistive robotics scenario to demonstrate the potential of our approach to enhance human-robot interaction in practical applications.


πŸ“ Citation

@misc{plou2024carlorego4dstep,
      title={CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors for test time refinement}, 
      author={Carlos Plou and Lorenzo Mur-Labadia and Ruben Martinez-Cantin and Ana C. Murillo},
      year={2024},
      eprint={2406.09575},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.09575}, 
}

