The QFVS dataset uses the same raw videos as the UT Egocentric (UT Ego) dataset. We download the UT Ego raw videos and the QFVS annotations. For a quick start and easy access, we provide the preprocessed videos and annotations:
mkdir Datasets && cd Datasets
wget https://www.cis.jhu.edu/~shraman/EgoVLPv2/datasets/QFVS.tgz
tar -xvzf QFVS.tgz && rm QFVS.tgz
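As a quick sanity check after extraction, a minimal sketch like the following can confirm the data landed where expected. The `Datasets/QFVS` directory name is an assumption based on the archive name; adjust it if the extracted layout differs.

```python
from pathlib import Path

# Hypothetical sanity check (not part of the original instructions): confirm the
# archive extracted under Datasets/. The exact directory name inside QFVS.tgz is
# an assumption here.
qfvs_root = Path("Datasets/QFVS")
if not qfvs_root.is_dir():
    raise FileNotFoundError(f"Expected the extracted QFVS data at {qfvs_root}")
print(sorted(p.name for p in qfvs_root.iterdir()))
```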
EgoVLPv2 results on the four QFVS videos:

| Method | Video-1 | Video-2 | Video-3 | Video-4 | Average |
| --- | --- | --- | --- | --- | --- |
| EgoVLPv2 | 53.30 | 54.13 | 62.64 | 38.25 | 52.08 |
Download the EgoVLPv2 checkpoint and update `load_checkpoint` in `qfvs.json` with its path (a scripted version of this edit is sketched below). The QFVS dataset contains only 4 videos: train on 3 of them and evaluate on the remaining one. Perform 4 separate training runs so that each video is used once for evaluation. Change `device_ids` based on the available GPUs.
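A minimal sketch of the config edit, assuming `qfvs.json` is a flat JSON file with a top-level `load_checkpoint` key (the checkpoint filename below is a placeholder):

```python
import json

# Point `load_checkpoint` in qfvs.json at the downloaded EgoVLPv2 checkpoint.
# Assumes `load_checkpoint` is a top-level key; adjust if the config nests it.
CONFIG_PATH = "qfvs.json"
CHECKPOINT_PATH = "EgoVLPv2.pth"  # placeholder: replace with the actual checkpoint path

with open(CONFIG_PATH) as f:
    config = json.load(f)

config["load_checkpoint"] = CHECKPOINT_PATH

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=4)
```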
mkdir multimodal_features/
python main.py --train_videos 2,3,4 --test_video 1 --cuda_base cuda:0 --device_ids 0,1,2,3
python main.py --train_videos 1,3,4 --test_video 2 --cuda_base cuda:0 --device_ids 0,1,2,3
python main.py --train_videos 1,2,4 --test_video 3 --cuda_base cuda:0 --device_ids 0,1,2,3
python main.py --train_videos 1,2,3 --test_video 4 --cuda_base cuda:0 --device_ids 0,1,2,3
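Equivalently, the four leave-one-video-out runs can be driven from a small script; this sketch simply shells out to `main.py` with the same flags as the commands above:

```python
import subprocess

# Leave-one-video-out loop: train on three videos, evaluate on the held-out one.
# Flags mirror the commands above; adjust --cuda_base / --device_ids to your GPUs.
VIDEOS = [1, 2, 3, 4]

for test_video in VIDEOS:
    train_videos = ",".join(str(v) for v in VIDEOS if v != test_video)
    subprocess.run(
        [
            "python", "main.py",
            "--train_videos", train_videos,
            "--test_video", str(test_video),
            "--cuda_base", "cuda:0",
            "--device_ids", "0,1,2,3",
        ],
        check=True,
    )
```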
Because the training set is so small, results on the QFVS dataset vary significantly across runs. We performed multiple runs and report the best results.
The QFVS implementation partially uses the VASNet codebase.