Video Bypass Aggregator (VBA)

Undergraduate Thesis Download

Title: Understanding Few-shot Video with Pretrained Image-Text Models

Advisor: Jiwen Lu, Jie Zhou, Yansong Tang

Abstract

Most existing few-shot video classification methods do not consider the transfer of pretrained knowledge in their design. The accuracy of these methods is sub-optimal when directly introducing pretrained knowledge, and the transfer efficiency is relatively poor. On the other hand, existing video classification methods for full-sample supervised learning always encode each sample independently, making it difficult to jointly encode all samples in the support and query sets in few-shot tasks.

To effectively transfer the scene modeling ability of pretrained models to the video domain and achieve fine accuracy in few-shot video classification tasks, this paper proposes the Video Bypass Aggregator (VBA) structure. Through cross-layer information aggregation, the method effectively transfers the multi-granularity scene modeling ability of pretrained features. Through cross-frame information aggregation, the method fully learns the temporal modeling ability for videos. Through cross-video information aggregation, the method uses all sample information in the support and query sets to obtain more robust feature representations for each sample.

The proposed method achieves higher or comparable accuracy compared to existing methods with the same settings on 6 mainstream video datasets and 7 few-shot splits, demonstrating the overall effectiveness of the method. The ablation experiments demonstrate the role of each module. The paper also demonstrates the training efficiency of the method, shows its cross-domain understanding ability, and provides qualitative visualizations of attention maps.

Results

Dataset	Backbone	Config	Top-1 Acc (Previous SOTA)	Top-5 Acc (Previous SOTA)	Download CKPT
Kinetics	CLIP ViT-B/32	config	86.2 (82.0)	95.1 (91.4)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/17c1e58c34144481967a/?dl=1
minissv2_small	CLIP ViT-B/32	config	52.1 (48.3)	71.2 (64.1)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/2a560b10b1b54976afa6/?dl=1
minissv2_full	CLIP ViT-B/32	config	61.5 (59.0)	74.7 (74.9)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/25948c7b28a34ceab949/?dl=1
HMDB51	CLIP ViT-B/32	config	73.4 (71.1)	87.1 (83.9)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/e76620555fb242b794a9/?dl=1
UCF101	CLIP ViT-B/32	config	91.9 (92.6)	97.5 (96.8)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/ae95b2f886934b288292/?dl=1
Diving48	CLIP ViT-B/32	config	75.7 (74.4)	90.1 (85.6)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/f6787f69207d4ace9c87/?dl=1
Gym99	CLIP ViT-B/32	config	91.3 (90.0)	95.4 (93.6)	wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/5bf3a6e177d840b4abdb/?dl=1

Previous SOTA here refers to HyRSM (CVPR 2022), with its backbone replaced by CLIP ViT-B/32.

Prepare Environment

# prepare conda environment
conda create -n vba python=3.8
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html
pip install decord scipy einops tensorboard imgaug setuptools==59.5.0 yapf==0.40.1 numpy==1.23.0
pip install -e .

# prepare pretrained weights
wget https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
mv ViT-B-32.pt root_dir/data/clip_weight/

Prepare Data

This repo only provides few-shot data for Mini-Kinetics, Mini-SSV2, HMDB51, UCF101, and Diving48. For Gym99, please refer to its official repo.

# Mini-Kinetics, about 6G
cd root_dir/data/minikinetics
wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/8b68f113b19a48338072/?dl=1
mv index.html\?dl\=1 minikinetics_videos.zip
unzip -q minikinetics_videos.zip

# Mini-SSV2, about 8G
cd root_dir/data/minissv2
wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/bde18b48b9054143a1f4/?dl=1
mv index.html\?dl\=1 minissv2_videos.zip
unzip -q minissv2_videos.zip

# HMDB51, about 2G
cd root_dir/data/hmdb51
wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/dd6fe5c35eb44b3fa2ca/?dl=1
mv index.html\?dl\=1 hmdb51_videos.zip
unzip -q hmdb51_videos.zip

# UCF101, about 7G
cd root_dir/data/ucf101
wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/075698d33fcb4c6e84e9/?dl=1
mv index.html\?dl\=1 ucf101_videos.zip
unzip -q ucf101_videos.zip

# Diving48, about 5G
cd root_dir/data/diving48
wget --no-check-certificate https://cloud.tsinghua.edu.cn/f/d6e13e50c9b64bfca9b0/?dl=1
mv index.html\?dl\=1 diving48_videos.zip
unzip -q diving48_videos.zip

Train and Test

python tools/train.py <config_path> --validate  # train
python tools/test.py <config_path> <ckpt_path>  # test

To get the exact results in the paper, please use the provided data & ckpt, and refer to tools/train_and_test0.sh and tools/train_and_test1.sh.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
configs		configs
data		data
mmaction		mmaction
resources		resources
tests		tests
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_zh-CN.md		README_zh-CN.md
Understanding_Few-shot_Video_with_Pretrained_Image-Text_Models.pdf		Understanding_Few-shot_Video_with_Pretrained_Image-Text_Models.pdf
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Video Bypass Aggregator (VBA)

Title: Understanding Few-shot Video with Pretrained Image-Text Models

Advisor: Jiwen Lu, Jie Zhou, Yansong Tang

Abstract

Results

Prepare Environment

Prepare Data

Train and Test

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

InvincibleWyq/VBA

Folders and files

Latest commit

History

Repository files navigation

Video Bypass Aggregator (VBA)

Title: Understanding Few-shot Video with Pretrained Image-Text Models

Advisor: Jiwen Lu, Jie Zhou, Yansong Tang

Abstract

Results

Prepare Environment

Prepare Data

Train and Test

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages