First, install PyTorch 1.7.1+, torchvision 0.8.2+ and other required packages as follows:
conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
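As an optional sanity check (not part of the original setup), you can confirm that the pinned versions resolved correctly:
python -c "import torch; print('torch', torch.__version__, '| CUDA:', torch.cuda.is_available())"
python -c "import torchvision, timm, mmcv, decord; print('torchvision', torchvision.__version__, '| timm', timm.__version__, '| mmcv', mmcv.__version__, '| decord', decord.__version__)"
python -c "import clip; print('CLIP models:', clip.available_models())"
The first command should report torch 1.7.1+ with CUDA available; the second should report timm 0.3.2 and mmcv 1.3.14.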
Download the Kinetics-400 videos from here.
Then download and extract the wiki text into the same directory. The expected directory tree is:
./data/kinetics400/
    videos_train/
        vid1.mp4
        ...
    videos_val/
        vid2.mp4
        ...
    wiki/
        desc_0.txt
        ...
    k400_LT_train_videos.txt
    k400_LT_val_videos.txt
    kinetics_video_train_list.txt
    kinetics_video_val_list.txt
    labels.txt
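A quick way to confirm the tree matches is a check like the following (a sketch using only the paths listed above; adapt the root directory for the few-shot and open-set trees below):
ls data/kinetics400/            # should show videos_train/, videos_val/, wiki/, and the five .txt lists
wc -l data/kinetics400/kinetics_video_train_list.txt data/kinetics400/kinetics_video_val_list.txt data/kinetics400/labels.txt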
We used the split from CMN for Kinetics-Fewshot.
Download and extract the wiki text into the same directory. The expected directory tree is:
./data/kinetics100_base
    wiki/
        desc_0.txt
        ...
    k100_base_train_list.txt
    labels.txt
./data/kinetics100_test
    wiki/
        desc_0.txt
        ...
    k100_support_query_list.txt
    labels.txt
We used the split from Efficient-Prompt for Kinetics-Fewshot-C-way.
Download and extract the wiki text into the same directory. The expected directory tree is:
./data/kinetics400_fewshot_C
    wiki/
        desc_0.txt
        ...
    k400_fewshot_c_train_split_0.txt
    k400_fewshot_c_train_split_1.txt
    ...
    k400_fewshot_c_train_split_9.txt
    kinetics_video_val_list.txt
    labels.txt
Download the split from here for Kinetics-Openset.
Then download and extract the wiki text into the same directory. The expected directory tree is:
./data/kinetics400_openset
    wiki/
        desc_0.txt
        ...
    k400_openset_train_list.txt
    k400_openset_val_list.txt
    labels.txt
To evaluate VLG, you can run:
- Pre-training stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain
- Fine-tuning stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval
For fewshot cases, you can run:
bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8
For openset cases, you can run:
bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval
The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
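For example, a pre-training-stage evaluation would look like this (the config file name and extension are illustrative; substitute the actual file from the config directory):
CONFIG_PATH=config/your_config.yaml   # hypothetical name; pick the real config for your setting
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain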
To train VLG on a single node with 8 GPUs:
- Pre-training stage, run:
  bash dist_train_arun.sh ${CONFIG_PATH} 8
- Fine-tuning stage (the two steps are combined into a sketch below):
  - First, select the salient sentences by running:
    bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select
  - Then run:
    bash dist_train_arun.sh ${CONFIG_PATH} 8
The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
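Putting the fine-tuning stage together, the two steps run back to back with the same config (a sketch; the config name is hypothetical):
CONFIG_PATH=config/your_config.yaml                        # hypothetical name
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select   # step 1: select the salient sentences
bash dist_train_arun.sh ${CONFIG_PATH} 8                   # step 2: fine-tune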
The checkpoints are provided on Baidu Netdisk, and the extraction code is nc6e.
If you are interested in our work, please cite as follows:
@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}
This repo contains modified code from VL-LTR, ActionCLIP, and OpenMax.