Using Pytorch to Extract 2D CNN Features of Video Frames

This repository (FangmingZhou/video-cnn-feat-pytorch) is modified from video-cnn-feat.

Updates

April 17, 2021: Load the pretrained model online from its GitHub repository if no local trained model is available.

April 16, 2021: Unified RGB mode: an input image that is not in RGB mode is converted to RGB.

Supported Models and Options

Supported Models:

ResNeXt_WSL

Features:

oversample: apply ten-crop to the input image (four corner crops and the center crop, plus their horizontal flips)
unified RGB mode: convert the input image to "RGB" mode if it is in any other mode
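The oversample option follows the standard ten-crop scheme (the same idea as torchvision's `TenCrop`). The sketch below only computes the ten crop boxes in plain Python so the geometry is visible. The helper name `ten_crop_boxes`, the 224-pixel crop size in the example, and the integer rounding of the center crop are assumptions for illustration, not necessarily what this repository does internally.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), Pillow-style


def ten_crop_boxes(width: int, height: int, size: int) -> List[Tuple[Box, bool]]:
    """Hypothetical helper: return the ten (box, flipped) pairs of ten-crop.

    Five square crops of side `size` (four corners plus the center) are taken
    from the image; the same five boxes are reused on the horizontally flipped
    image (flipped=True), giving ten crops in total.
    """
    if size > width or size > height:
        raise ValueError("crop size must not exceed the image size")
    # Center offsets use floor division (an assumption of this sketch).
    left_c = (width - size) // 2
    top_c = (height - size) // 2
    five = [
        (0, 0, size, size),                            # top-left
        (width - size, 0, width, size),                # top-right
        (0, height - size, size, height),              # bottom-left
        (width - size, height - size, width, height),  # bottom-right
        (left_c, top_c, left_c + size, top_c + size),  # center
    ]
    return [(box, False) for box in five] + [(box, True) for box in five]
```

For a 256x256 image and 224-pixel crops, the center box is (16, 16, 240, 240). With Pillow, each box would be passed to `Image.crop`, after `Image.convert("RGB")` for non-RGB inputs.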

To-do list

ResNeSt (coming soon)

Environments

Ubuntu 16.04
CUDA 10.1
python 3.8
torch 1.7.1+cu101
torchvision 0.8.2+cu101
Pillow 8.1.2
numpy 1.20.2

The repository has been tested in the environment above. You do not have to use exactly the same setup, but Python >= 3.6 and PyTorch >= 1.7 are recommended.

The following example creates a virtual environment with Anaconda:

conda create -n cnn-feat-pytorch python=3.8
conda activate cnn-feat-pytorch
pip install -r requirements.txt
conda deactivate

Get started

Our code assumes the following data organization. We provide the toydata folder as an example.

collection_name
├─VideoData
├─ImageData
└─id.imagepath.txt

The toydata folder is assumed to be located at $HOME/VisualSearch/. Video files are stored in the VideoData folder, and frame files in the ImageData folder.

Video filenames shall end with .mp4, .avi, .webm, or .gif.
Frame filenames shall end with .jpg.
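The naming rules above can be checked with a few lines of Python. This is a sketch under assumptions: the helpers `is_video_file` and `build_imagepath_lines` are hypothetical, and the "<frame_id> <path>" line layout is only a guess at what id.imagepath.txt contains; generate_imagepath.py in this repository is the authoritative source for the real format.

```python
import os
from typing import List

VIDEO_EXTS = (".mp4", ".avi", ".webm", ".gif")  # supported video extensions
FRAME_EXT = ".jpg"                               # required frame extension


def is_video_file(filename: str) -> bool:
    """True if the filename ends with one of the supported video extensions."""
    return filename.endswith(VIDEO_EXTS)


def build_imagepath_lines(image_dir: str) -> List[str]:
    """Hypothetical id.imagepath.txt content: one "<frame_id> <path>" per line.

    Walks ImageData recursively, keeps only .jpg frames, and uses the file
    name without its extension as the frame id. Lines are sorted so the
    output is reproducible across runs.
    """
    lines = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.endswith(FRAME_EXT):
                frame_id = os.path.splitext(name)[0]
                lines.append(f"{frame_id} {os.path.join(root, name)}")
    return sorted(lines)
```

Files with other extensions (for example a stray .png or .txt in ImageData) are silently skipped by this sketch.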

Feature extraction for a given video collection is performed in the following steps. Skip Step 1 if the frames have already been extracted; Step 5 is optional.

Step 1. Extract frames from videos

Convert each video (3D) into frames (2D) so that the 2D CNN models listed above can be applied. By default, one frame is extracted every half second. If you are working with an image dataset such as MS COCO, skip this step, but make sure the images are placed in the "ImageData" sub-folder.

collection=toydata
bash do_extract_frames.sh $collection

If you have trouble extracting frames from *.gif files, use the ImageMagick "convert" command on Linux as a substitute.
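To make the default sampling rate concrete, the sketch below computes which frame indices a "one frame every half second" policy keeps, given the frame count and frame rate. It does not reproduce do_extract_frames.sh, which delegates decoding to its own tools; the helper `sample_frame_indices` and its rule (keep the first frame at or after each 0.5 s mark) are assumptions for illustration.

```python
import math
from typing import List


def sample_frame_indices(num_frames: int, fps: float, interval_s: float = 0.5) -> List[int]:
    """Hypothetical helper: indices kept when sampling one frame per interval.

    Frame i is displayed at time i / fps. For each mark t = k * interval_s
    inside the video, keep the first frame whose timestamp is >= t.
    """
    if num_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    duration = num_frames / fps
    indices = []
    k = 0
    while k * interval_s < duration:
        # Subtract a tiny epsilon so float error cannot push an exact
        # frame boundary up to the next index.
        idx = math.ceil(k * interval_s * fps - 1e-9)
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices
```

For a 3-second clip at 30 fps (90 frames), this keeps frames 0, 15, 30, 45, 60 and 75, i.e. six frames.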

Step 2. Extract frame-level CNN features

Extract the CNN feature of each frame. Results will be placed in the "FeatureData" sub-folder.

bash do_wsl-resnext.sh $collection

Step 3. Obtain video-level CNN features (by mean pooling over frames)

feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_feature_pooling.sh $collection $feature_name
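Mean pooling averages the frame-level vectors of a video dimension by dimension, so 2048-d frame features still yield a single 2048-d video feature. A pure-Python sketch, assuming the frame features have already been grouped by video id (how do_feature_pooling.sh maps frame ids to video ids is not covered here, and `mean_pool` / `pool_videos` are hypothetical helpers):

```python
from typing import Dict, List, Sequence


def mean_pool(frame_feats: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length frame vectors into one video vector."""
    if not frame_feats:
        raise ValueError("a video needs at least one frame feature")
    dim = len(frame_feats[0])
    if any(len(f) != dim for f in frame_feats):
        raise ValueError("all frame features must have the same dimension")
    n = len(frame_feats)
    # Element-wise mean over the frames of one video.
    return [sum(f[d] for f in frame_feats) / n for d in range(dim)]


def pool_videos(video_to_frames: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, List[float]]:
    """Map each video id to the mean of its frame-level features."""
    return {vid: mean_pool(frames) for vid, frames in video_to_frames.items()}
```

For example, a video with frame features [1, 2] and [3, 4] pools to [2.0, 3.0]. Because the mean ignores frame order, the result does not depend on the order in which frames were listed.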

Step 4. Feature concatenation

If a collection has more than one feature, this script concatenates them into a single feature file.

featname=$feature_name1+$feature_name2
bash do_concat_features.sh $collection $featname
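Concatenation joins, for each item id, the vectors of the individual features end to end, so the resulting dimension is the sum of the input dimensions. The sketch below is a minimal illustration, not concat_features.py itself: the helper `concat_features` is hypothetical, and keeping only ids present in every feature set is an assumption. The names in featname would be obtained with `featname.split("+")`.

```python
from typing import Dict, List, Sequence


def concat_features(feature_sets: Sequence[Dict[str, Sequence[float]]]) -> Dict[str, List[float]]:
    """Hypothetical helper: concatenate several features of one collection.

    Only ids present in every feature set are kept, and vectors are joined
    in the order the sets are given (feature_name1 first, then feature_name2).
    """
    if not feature_sets:
        return {}
    common = set(feature_sets[0])
    for fs in feature_sets[1:]:
        common &= set(fs)
    return {
        item_id: [x for fs in feature_sets for x in fs[item_id]]
        for item_id in sorted(common)
    }
```

For instance, concatenating a 2048-d feature with a second 2048-d feature gives 4096-d vectors for every shared id.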

Step 5 (Optional). Collection combination

If several collections have the same feature, this script combines them into one.

collections=${collection1}-${collection2}
feature_dim=2048
feature_name=resnext101_32x48d_wsl,avgpool,os
bash do_combine_feature.sh $feature_name $feature_dim $collections
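Combining collections merges their id-to-vector maps for one shared feature, which is why feature_dim is passed in: every vector must have that length. A sketch under stated assumptions: the helper `combine_collections` is hypothetical, and rejecting ids that appear in more than one collection is this sketch's choice, not necessarily what combine_features.py does.

```python
from typing import Dict, List, Sequence


def combine_collections(
    collections: Sequence[Dict[str, Sequence[float]]], feature_dim: int
) -> Dict[str, List[float]]:
    """Hypothetical helper: merge id->vector maps that share one feature.

    Every vector must have exactly `feature_dim` values, and an id may occur
    in only one collection; both checks are assumptions of this sketch.
    """
    combined: Dict[str, List[float]] = {}
    for coll in collections:
        for item_id, vec in coll.items():
            if len(vec) != feature_dim:
                raise ValueError(f"{item_id}: expected {feature_dim} dims, got {len(vec)}")
            if item_id in combined:
                raise ValueError(f"duplicate id across collections: {item_id}")
            combined[item_id] = list(vec)
    return combined
```

With resnext101_32x48d_wsl,avgpool,os features, feature_dim is 2048 as in the example above.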

Acknowledgements

Framework: https://github.com/xuchaoxi/video-cnn-feat

WSL model tutorial: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext

WSL pretrained model: https://github.com/facebookresearch/WSL-Images