This repository is modified from video-cnn-feat.
April 17, 2021: load the model online from the GitHub repo if the trained model is not available locally.
April 16, 2021: unified to RGB mode: convert the input image to RGB mode if it is not already in RGB.
- oversample: ten-crop the input image
- unified to RGB mode: convert the input image to "RGB" mode if it is in any other mode
- Resnest (coming soon)
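To illustrate the "oversample" option, here is a minimal sketch of how ten-crop boxes can be computed: the four corner crops and the center crop, each paired with a horizontally mirrored copy. The function name `ten_crop_boxes` and the floor-based centering are assumptions made for illustration; this repository may instead rely on a library transform such as torchvision's `TenCrop`.

```python
def ten_crop_boxes(width, height, crop_w, crop_h):
    """Return 10 (box, mirrored) pairs; box is (left, top, right, bottom).

    Hypothetical helper for illustration only, not taken from this repository.
    """
    if crop_w > width or crop_h > height:
        raise ValueError("crop size must not exceed the image size")
    # Floor-based centering is an assumption; libraries may round differently.
    center_left = (width - crop_w) // 2
    center_top = (height - crop_h) // 2
    origins = [
        (0, 0),                                # top-left
        (width - crop_w, 0),                   # top-right
        (0, height - crop_h),                  # bottom-left
        (width - crop_w, height - crop_h),     # bottom-right
        (center_left, center_top),             # center
    ]
    boxes = [(l, t, l + crop_w, t + crop_h) for l, t in origins]
    # First the five plain crops, then the same five flagged as mirrored.
    return [(b, False) for b in boxes] + [(b, True) for b in boxes]


if __name__ == "__main__":
    for box, mirrored in ten_crop_boxes(256, 256, 224, 224):
        print(box, "mirrored" if mirrored else "plain")
```

Averaging the features of the ten crops is a common way to use such boxes, but whether this repository averages them is not stated here.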
- Ubuntu 16.04
- CUDA 10.1
- python 3.8
- torch 1.7.1+cu101
- torchvision 0.8.2+cu101
- Pillow 8.1.2
- numpy 1.20.2
The repository has been tested in the above environment. You don't have to use the same environment, but "python=3.6 pytorch >=1.7" is recommended.
The following is an example of creating a virtual environment with Anaconda.
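As a rough sketch of how a version string such as "1.7.1+cu101" could be compared against a minimum such as "1.7", the helper below drops the local suffix after "+" and compares the numeric parts. The names `parse_version` and `meets_minimum` are hypothetical and not part of this repository.

```python
import re


def parse_version(text):
    """Return the leading numeric components, ignoring a suffix like '+cu101'."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so (1, 7) compares equal to (1, 7, 0).
    size = max(len(a), len(b))
    a += (0,) * (size - len(a))
    b += (0,) * (size - len(b))
    return a >= b


if __name__ == "__main__":
    print(meets_minimum("1.7.1+cu101", "1.7"))
```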
conda create -n cnn-feat-pytorch python=3.8
conda activate cnn-feat-pytorch
pip install -r requirements
conda deactivate
Our code assumes the following data organization. We provide the toydata folder as an example.
collection_name
├─VideoData
├─ImageData
└─id.imagepath.txt
The toydata folder is assumed to be placed at $HOME/VisualSearch/. Video files are stored in the VideoData folder. Frame files are in the ImageData folder.
- Video filenames shall end with .mp4, .avi, .webm, or .gif.
- Frame filenames shall end with .jpg.
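The layout and filename rules above can be checked before running any script. The following is a minimal sketch using only the standard library; `check_collection` is a hypothetical helper, and treating the extensions as case-sensitive is an assumption. It does not inspect id.imagepath.txt.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".avi", ".webm", ".gif"}
FRAME_EXTS = {".jpg"}


def check_collection(root):
    """Return a list of problems found in a collection folder (empty if none)."""
    root = Path(root)
    problems = []
    for sub, exts in (("VideoData", VIDEO_EXTS), ("ImageData", FRAME_EXTS)):
        folder = root / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {sub}")
            continue
        # Walk sub-folders too, since files may be nested.
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path.suffix not in exts:
                problems.append(f"unexpected file in {sub}: {path.name}")
    return problems


if __name__ == "__main__":
    collection = Path.home() / "VisualSearch" / "toydata"
    for problem in check_collection(collection):
        print(problem)
```

For an image-only dataset, a "missing folder: VideoData" message from this sketch can be ignored.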
Feature extraction for a given video collection is performed in the following four steps. Skip the first step if frames are already there.
Convert the videos (3D) to frames (2D), so that we can employ the 2D CNN models mentioned before. If you are dealing with an image dataset, such as "mscoco", just skip this step, but make sure the images are placed in the "ImageData" sub-folder. By default, one frame is extracted every half second.
collection=toydata
bash do_extract_frames.sh $collection
If you have trouble extracting frames from *.gif files, use the ImageMagick "convert" command on Linux as a substitute.
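To make the default sampling rate concrete, the sketch below lists the timestamps at which frames would be taken when one frame is extracted every half second. `sample_timestamps` is a hypothetical helper, and starting at time 0 is an assumption; the actual behavior is defined by do_extract_frames.sh.

```python
def sample_timestamps(duration_s, interval_s=0.5):
    """Return the sampling times (in seconds) that fall inside the video."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    times = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * interval_s < duration_s:
        times.append(index * interval_s)
        index += 1
    return times


if __name__ == "__main__":
    # A 10-second video sampled every half second.
    print(len(sample_timestamps(10.0)))
```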
Extract the CNN feature of each frame. Results will be placed in the "FeatureData" sub-folder.
bash do_wsl-resnext.sh $collection
feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_feature_pooling.sh $collection $feature_name
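Feature pooling turns frame-level features into one feature per video. The sketch below assumes mean pooling and assumes frame ids of the form `<video_id>_<frame_index>`; both are assumptions for illustration, and do_feature_pooling.sh may work differently.

```python
from collections import defaultdict


def mean_pool(frame_features):
    """Average frame vectors per video.

    frame_features maps a frame id such as 'video1_3' to a list of floats.
    The id format is an assumption, not taken from this repository.
    """
    grouped = defaultdict(list)
    for frame_id, vector in frame_features.items():
        video_id = frame_id.rsplit("_", 1)[0]
        grouped[video_id].append(vector)

    pooled = {}
    for video_id, vectors in grouped.items():
        dim = len(vectors[0])
        if any(len(v) != dim for v in vectors):
            raise ValueError(f"inconsistent feature size for {video_id}")
        # zip(*vectors) iterates over the values of each dimension in turn.
        pooled[video_id] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return pooled


if __name__ == "__main__":
    frames = {"v1_0": [1.0, 2.0], "v1_1": [3.0, 4.0], "v2_0": [5.0, 6.0]}
    print(mean_pool(frames))
```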
If you have more than one feature for a collection, this script can combine them into one feature file.
featname=$feature_name1+$feature_name2
bash do_concat_features.sh $collection $featname
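Conceptually, concatenating features means joining the vectors that belong to the same id end to end. The sketch below shows this on plain dictionaries; `concat_features` is a hypothetical helper, and keeping only the ids present in every feature set is an assumption rather than documented behavior of do_concat_features.sh.

```python
def concat_features(*feature_sets):
    """Concatenate vectors with the same id across several feature sets."""
    if not feature_sets:
        return {}
    # Keep only ids that appear in every feature set (an assumption).
    common_ids = set(feature_sets[0])
    for features in feature_sets[1:]:
        common_ids &= set(features)
    return {
        item_id: [value for features in feature_sets for value in features[item_id]]
        for item_id in sorted(common_ids)
    }


if __name__ == "__main__":
    feature_a = {"v1": [0.1, 0.2], "v2": [0.3, 0.4]}
    feature_b = {"v1": [1.0], "v2": [2.0]}
    print(concat_features(feature_a, feature_b))
```

The dimension of the resulting feature is the sum of the input dimensions.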
If you have more than one collection with the same feature, this script can combine them.
collections=${collection1}-${collection2}
feature_dim=2048
feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_combine_feature.sh $feature_name $feature_dim $collections
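As a rough sketch of what combining collections involves, the code below merges per-collection feature dictionaries after checking that every vector has the expected dimension, and builds the hyphen-joined collection name used above. The function names and the rejection of duplicate ids are assumptions for illustration, not the behavior of do_combine_feature.sh.

```python
def combined_name(collection_names):
    """Join collection names with '-', mirroring ${collection1}-${collection2}."""
    return "-".join(collection_names)


def combine_collections(collections, feature_dim):
    """Merge {collection_name: {item_id: vector}} into one {item_id: vector}."""
    combined = {}
    for name, features in collections.items():
        for item_id, vector in features.items():
            if len(vector) != feature_dim:
                raise ValueError(
                    f"{name}/{item_id}: expected {feature_dim} values, got {len(vector)}"
                )
            # Rejecting duplicate ids is an assumption made for safety.
            if item_id in combined:
                raise ValueError(f"duplicate id across collections: {item_id}")
            combined[item_id] = vector
    return combined


if __name__ == "__main__":
    data = {
        "collection1": {"a": [0.0, 1.0]},
        "collection2": {"b": [2.0, 3.0]},
    }
    print(combined_name(data), combine_collections(data, feature_dim=2))
```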
Framework: https://github.com/xuchaoxi/video-cnn-feat
WSL model tutorial: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext
WSL pretrained model: https://github.com/facebookresearch/WSL-Images