
Using Pytorch to Extract 2D CNN Features of Video Frames

This repository is modified from video-cnn-feat.

Updates

April 17, 2021: load the model from its online GitHub repo if a trained model is not prepared locally.

April 16, 2021: unified to RGB mode: convert the input image to RGB if it is not in that mode already.

Supported Models and Options

Supported models: the WSL-pretrained ResNeXt models, e.g. resnext101_32x48d_wsl (see do_wsl-resnext.sh).

Features:

  • oversample: ten-crop the input image (four corners, the center, and their horizontal flips)
  • unified to RGB mode: convert the input image to "RGB" mode if it is in any other mode (see the sketch below)
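
A minimal sketch of both options in PyTorch, assuming torchvision and Pillow are installed; the normalization values follow the WSL tutorial linked in the Acknowledgements, but the names here are illustrative, not the repo's actual API.

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('frame.jpg')
if img.mode != 'RGB':              # unified to RGB mode
    img = img.convert('RGB')

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
oversample = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),       # oversample: 10 crops per image
    transforms.Lambda(lambda crops: torch.stack(
        [normalize(transforms.ToTensor()(c)) for c in crops])),
])
batch = oversample(img)            # shape: (10, 3, 224, 224)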

To-do list

Environments

  • Ubuntu 16.04
  • CUDA 10.1
  • python 3.8
  • torch 1.7.1+cu101
  • torchvision 0.8.2+cu101
  • Pillow 8.1.2
  • numpy 1.20.2

The repository has been tested in the above environment. You do not have to replicate it exactly, but Python 3.6+ and PyTorch 1.7+ are recommended.

This is an example of creating a virtual environment with Anaconda.

conda create -n cnn-feat-pytorch python=3.8
conda activate cnn-feat-pytorch
pip install -r requirements.txt
conda deactivate

Get started

Our code assumes the following data organization. We provide the toydata folder as an example.

collection_name
├─VideoData
├─ImageData
└─id.imagepath.txt

The toydata folder is assumed to be placed at $HOME/VisualSearch/. Video files are stored in the VideoData folder; frame files are in the ImageData folder.

  • Video filenames shall end with .mp4, .avi, .webm, or .gif.
  • Frame filenames shall end with .jpg.
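
The id.imagepath.txt file lists the frames to process. Its exact format is defined by the repo's scripts; the lines below are only an assumed illustration, mapping each frame id to its image path:

video1_frame000 ImageData/video1/video1_frame000.jpg
video1_frame001 ImageData/video1/video1_frame001.jpg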

Feature extraction for a given video collection is performed in the following four steps, plus an optional fifth. Skip the first step if the frames are already there.

Step 1. Extract frames from videos

Convert the videos (3D) to frames (2D) so that we can employ the 2D CNN models mentioned above. If you are dealing with an image dataset such as MSCOCO, skip this step, but make sure the images are placed in the "ImageData" sub-folder. By default, one frame is extracted every half second.

collection=toydata
bash do_extract_frames.sh $collection

If you have trouble extracting frames from *.gif files, use the ImageMagick "convert" command as a substitute.
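
The half-second default corresponds to a 2 fps sampling rate. As a hedged sketch of the idea (not the script's actual internals), assuming ffmpeg is installed and using illustrative paths:

import pathlib
import subprocess

video = pathlib.Path('VideoData/video1.mp4')
out_dir = pathlib.Path('ImageData/video1')
out_dir.mkdir(parents=True, exist_ok=True)
subprocess.run([
    'ffmpeg', '-i', str(video),
    '-vf', 'fps=2',                              # one frame every half second
    str(out_dir / (video.stem + '_%06d.jpg')),   # numbered .jpg frames
], check=True)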

Step 2. Extract frame-level CNN features

Extract the CNN feature of each frame. Results will be placed in the "FeatureData" sub-folder.

bash do_wsl-resnext.sh $collection
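
Under the hood, the WSL ResNeXt model can be fetched from torch.hub exactly as in the tutorial linked in the Acknowledgements; the feature itself is the 2048-d avgpool output. A minimal sketch (not the repo's actual extraction script):

import torch

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x48d_wsl')
model.fc = torch.nn.Identity()        # expose the 2048-d avgpool output
model.eval()

batch = torch.rand(10, 3, 224, 224)   # in practice: the ten-crop batch from the sketch above
with torch.no_grad():
    feats = model(batch)              # (10, 2048)
    frame_feat = feats.mean(dim=0)    # average the ten crops -> one 2048-d vector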

Step 3. Obtain video-level CNN features (by mean pooling over frames)

feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_feature_pooling.sh $collection $feature_name
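
Mean pooling averages the frame-level vectors of each video into one video-level vector. A toy sketch with numpy (random values stand in for real Step 2 features):

import numpy as np

# Frame-level features of one video, keyed by frame id.
frame_feats = {'video1_frame000': np.random.rand(2048),
               'video1_frame001': np.random.rand(2048)}
video_feat = np.stack(list(frame_feats.values())).mean(axis=0)   # (2048,)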

Step 4. Feature concatenation

If you have more than one feature for a collection, this script combines them into one feature file.

featname=$feature_name1+$feature_name2
bash do_concat_features.sh $collection $featname
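
Concatenation joins the per-video vectors of each feature end to end, so for example a 2048-d and a 1024-d feature yield a 3072-d feature. A toy sketch (dimensions are illustrative):

import numpy as np

feat_a = {'video1': np.random.rand(2048)}
feat_b = {'video1': np.random.rand(1024)}
# Video ids must match across the two feature files.
concat = {vid: np.concatenate([feat_a[vid], feat_b[vid]]) for vid in feat_a}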

Step 5 (optional). Collection combination

If you have more than one collection with the same feature, this script can combine them into a single collection.

collections=${collection1}-${collection2}
feature_dim=2048
feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_combine_feature.sh $feature_name $feature_dim $collections 
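
Since the collections share one feature of the same dimension, combining them amounts to taking the union of their id-to-vector mappings. A toy sketch (video ids are assumed to be disjoint across collections):

import numpy as np

coll1 = {'v1': np.random.rand(2048)}
coll2 = {'v2': np.random.rand(2048)}
combined = {**coll1, **coll2}   # one mapping covering both collections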

Acknowledgements

Framework: https://github.com/xuchaoxi/video-cnn-feat

WSL model tutorial: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext

WSL pretrained model: https://github.com/facebookresearch/WSL-Images
