
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging

Official PyTorch implementation of our CVPR 2022 paper, Motion-aware Contrastive Video Representation Learning via Foreground-background Merging.

Overview

Contrastive learning in the video domain suffers from severe background bias: when two augmented views of a video are naively pulled closer, the model tends to learn the shared static background as a shortcut and fails to capture the motion information. We introduce Foreground-background Merging (FAME), a novel augmentation technique that deliberately composes the moving foreground region of one video onto the static backgrounds of others. Specifically, without any off-the-shelf detector, we separate the moving foreground from the background via frame difference and color statistics, and shuffle the background regions among videos. By leveraging the semantic consistency between the original clips and the fused ones, the model focuses more on the motion patterns and is debiased from the background shortcut.
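To make the merging step concrete, below is a minimal PyTorch sketch of the idea (an illustrative assumption, not the repo's exact code; the actual implementation also exploits color statistics, which this sketch omits):

import torch

def fame_merge(fg_clip, bg_clip, keep_ratio=0.4):
    """Paste the moving foreground of fg_clip onto the background of bg_clip.

    fg_clip, bg_clip: float tensors of shape (C, T, H, W) in [0, 1].
    keep_ratio: approximate fraction of pixels treated as foreground.
    """
    # Motion saliency: absolute difference of consecutive frames,
    # averaged over channels and time -> a single (H, W) map.
    diff = (fg_clip[:, 1:] - fg_clip[:, :-1]).abs().mean(dim=(0, 1))
    # Keep the most salient `keep_ratio` fraction of pixels as foreground.
    thresh = torch.quantile(diff.flatten(), 1.0 - keep_ratio)
    mask = (diff >= thresh).float()  # (H, W), 1 = foreground
    # Composite: foreground from one clip, background from the other.
    return mask * fg_clip + (1.0 - mask) * bg_clip

Within a batch, bg_clip can simply be another sample's clip, so backgrounds are shuffled across videos.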

teaser

[Project Page] [arXiv] [PDF]

Usage

Requirements

  • pytorch >= 1.8.1
  • tensorboard
  • cv2
  • kornia
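
The dependencies can be installed with pip, for example (the package names here are our assumption; choose the torch build that matches your CUDA version):

pip install "torch>=1.8.1" tensorboard opencv-python kornia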

Data preparation

Pretrain

By default, we pretrain the I3D backbone on Kinetics-400 (K400) on a single node with 8 NVIDIA V100 GPUs for 200 epochs.

python3 train.py \
  --log_dir $your/log/path \
  --ckp_dir $your/checkpoint/path \
  -a I3D \
  --dataset k400 \
  --lr 0.01  \
  -fpc 16 \
  -cs 224 \
  -b 64 \
  -j 128 \
  --cos \
  --epochs 200 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  $kinetics400/dataset/path

Pretrained Models

I3D pretrained on K400 [google drive]

I3D finetuned on UCF101 (Acc@1 88.9) [google drive]
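A sketch of loading such a checkpoint for downstream use (the "state_dict" key and the MoCo-style "module.encoder_q." prefix are assumptions; inspect the checkpoint to confirm):

import torch

ckpt = torch.load("i3d_k400_pretrain.pth", map_location="cpu")  # hypothetical filename
state = ckpt.get("state_dict", ckpt)  # unwrap if saved with extra metadata
# MoCo-style checkpoints often prefix the query encoder's weights:
state = {k.replace("module.encoder_q.", ""): v for k, v in state.items()}
# model.load_state_dict(state, strict=False)  # load into your I3D instance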

Action Recognition Downstream Evaluation

By default, we finetune the I3D backbone on UCF101 on a single node with 4 NVIDIA V100 GPUs for 150 epochs.

python3 eval.py \
  --log_dir $your/log/path \
  --pretrained $your/checkpoint/path \
  -a I3D \
  --seed 42 \
  --num_class 101 \
  --wd 1e-4 \
  --lr 0.025 \
  --lr_decay 0.1 \
  -fpc 16 \
  -b 128 \
  -j 64 \
  -cs 224 \
  --finetune \
  --epochs 150 \
  --schedule 60 120 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  $ucf101/dataset/path

Visualization

We visualize class-agnostic activation maps. FAME captures the foreground motion well, while the baseline method fails to.

vis
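A minimal sketch of how such class-agnostic maps can be computed from backbone features (our assumption for illustration, not necessarily the script used for the figure):

import torch
import torch.nn.functional as F

def activation_map(feat, out_size):
    """feat: (C, T, H, W) features from the last conv stage."""
    cam = feat.pow(2).mean(dim=0)                              # (T, H, W) channel energy
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)   # normalize to [0, 1]
    cam = F.interpolate(cam.unsqueeze(1), size=out_size,       # upsample per frame
                        mode="bilinear", align_corners=False)
    return cam.squeeze(1)                                      # (T, H_out, W_out)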

Acknowledgement

Our code is based on the implementations of VideoMoCo and MoCo. We sincerely thank the authors for their great work.

Citation

If our code is helpful to your work, please consider citing:

@inproceedings{ding2022motion,
  title={Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging},
  author={Ding, Shuangrui and Li, Maomao and Yang, Tianyu and Qian, Rui and Xu, Haohang and Chen, Qingyi and Wang, Jue and Xiong, Hongkai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9716--9726},
  year={2022}
}
