This is a PyTorch implementation of the "Spatiotemporal Multiplier Networks for Video Action Recognition" paper by Christoph Feichtenhofer, Axel Pinz, and Richard P. Wildes, published in CVPR 2017. The official code released by Christoph can be found here.
- Introduction
- Installation
- Pre-trained Base Networks
- Datasets
- Preparation
- Training and Testing Procedures
- Experimental Results
Action recognition is one of the core tasks in video understanding, playing a role similar to that of image classification in the static image domain. Two common deep-learning approaches started out far apart and have recently been converging toward each other. The first uses 3D convolutional layers that process the input spatiotemporal tensor, while the second is inspired by the human visual system and uses a two-stream (Siamese-style) architecture: two parallel pathways process information, one taking RGB frames and the other taking optical flow frames. The "Spatiotemporal Multiplier Networks for Video Action Recognition" paper shows how multiplicative cross-stream lateral connections can be realized within a ResNet architecture.
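To make the idea of a multiplicative cross-stream connection concrete, below is a minimal, self-contained PyTorch sketch (not the code from this repository) in which a motion-stream feature map gates an appearance-stream residual unit by element-wise multiplication; the module and variable names are illustrative only.

import torch
import torch.nn as nn

class GatedResidualUnit(nn.Module):
    """Illustrative residual unit whose input is gated by the other stream.

    A simplified sketch of the paper's multiplicative interaction: the
    appearance-stream feature map is multiplied element-wise by the
    motion-stream feature map before the residual transformation.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x_appearance, x_motion):
        # Multiplicative cross-stream connection: motion features modulate
        # appearance features, instead of the additive fusion used by
        # plain two-stream networks.
        gated = x_appearance * x_motion
        out = self.relu(self.bn1(self.conv1(gated)))
        out = self.bn2(self.conv2(out))
        # Standard residual shortcut from the (ungated) appearance input.
        return self.relu(out + x_appearance)

# Toy usage: two feature maps of matching shape, one per stream.
x_rgb = torch.randn(2, 64, 56, 56)
x_flow = torch.randn(2, 64, 56, 56)
unit = GatedResidualUnit(64)
y = unit(x_rgb, x_flow)
print(y.shape)  # torch.Size([2, 64, 56, 56])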
- Clone the spatiotemporal-multiplier-networks-pytorch repository
# Clone the repository
git clone https://github.com/mbiparva/spatiotemporal-multiplier-networks-pytorch.git
- Go into the tools directory
cd tools
- Run the training or testing script
# to train
python train.py
# to test
python test.py
Please download the pre-trained base networks provided by the official repository here. The current implementation uses ResNet-50, so make sure you choose the network snapshot that matches your dataset (UCF-101), network architecture (ResNet-50), and dataset split number. You need to copy the downloaded pre-trained networks into the experiment/base_pretrained_nets/ directory so they can be found by the network module.
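As a quick sanity check, the sketch below only verifies that a downloaded snapshot sits in the expected directory and can be deserialized; the file name is hypothetical and the assumption is that the snapshot is in a format torch.load can read. The actual loading of base weights is handled by the network module.

import os
import torch

# Hypothetical file name; use the actual name of the snapshot you downloaded.
snapshot_path = os.path.join('experiment', 'base_pretrained_nets', 'resnet50_ucf101_split1.pth')
assert os.path.isfile(snapshot_path), 'copy the downloaded snapshot into experiment/base_pretrained_nets/'

# Assumes a torch-loadable snapshot; the keys inside depend on how it was saved.
checkpoint = torch.load(snapshot_path, map_location='cpu')
print(type(checkpoint))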
You can download the RGB and optical flow frames for both UCF-101 and HMDB-51 at the official repository here. You just need to extract the zip files into the dataset directory so that they respect the directory hierarchy below; the provided dataloader can then easily find the directories of the different categories. Please make sure the downloaded dataset folders and files are laid out according to the following structure (a small sketch of walking this layout follows the tree):
dataset
└── UCF101
    ├── images
    │   ├── ApplyEyeMakeup
    │   ├── ApplyLipstick
    │   └── ...
    ├── flows
    │   ├── ApplyEyeMakeup
    │   ├── ApplyLipstick
    │   └── ...
    └── annotations
        ├── annot01.json
        ├── annot02.json
        ├── annot03.json
        └── ucf101_splits
            ├── trainlist01
            ├── trainlist02
            └── ...
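For illustration, here is a minimal sketch (not the repository's actual dataloader) of how the layout above can be walked to pair the RGB and flow directories of each action category; the paths are the ones shown in the tree, everything else is assumed.

import os

dataset_root = os.path.join('dataset', 'UCF101')
images_root = os.path.join(dataset_root, 'images')
flows_root = os.path.join(dataset_root, 'flows')

# Each sub-directory of images/ and flows/ is an action category,
# e.g. ApplyEyeMakeup, ApplyLipstick, ...
categories = sorted(d for d in os.listdir(images_root)
                    if os.path.isdir(os.path.join(images_root, d)))

for name in categories:
    rgb_dir = os.path.join(images_root, name)
    flow_dir = os.path.join(flows_root, name)
    # Every RGB category directory is expected to have a matching flow directory.
    assert os.path.isdir(flow_dir), f'missing optical flow frames for {name}'
    print(name, rgb_dir, flow_dir)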
You need to create the annotations for each training and test split using the script provided in lib/utils/json_ucf.py. They need to be placed in the annotations folder as described above.
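To give an idea of what such a conversion can look like, the snippet below is a purely illustrative sketch that turns a UCF-101 split file into a JSON list; the output schema and file handling are assumptions and not necessarily what lib/utils/json_ucf.py produces, so use the provided script for the real annotations.

import json
import os

annotations_dir = os.path.join('dataset', 'UCF101', 'annotations')
splits_dir = os.path.join(annotations_dir, 'ucf101_splits')

# UCF-101 split files list one video per line,
# e.g. "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1".
entries = []
with open(os.path.join(splits_dir, 'trainlist01')) as f:  # file name as shown in the tree above
    for line in f:
        line = line.strip()
        if not line:
            continue
        path = line.split()[0]
        category, video = path.split('/')
        entries.append({'video': os.path.splitext(video)[0], 'category': category})

# Hypothetical output schema; lib/utils/json_ucf.py defines the real one.
with open(os.path.join(annotations_dir, 'annot01.json'), 'w') as f:
    json.dump(entries, f, indent=2)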
This implementation has been tested with the following packages:
- Python 3.7
- PyTorch 1.0
- CUDA 9.0
- EasyDict
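A quick way to confirm the environment matches the versions above (a minimal check, nothing repository-specific):

import sys
import torch

print(sys.version)                 # expect Python 3.7.x
print(torch.__version__)           # expect 1.0.x
print(torch.version.cuda)          # CUDA toolkit version PyTorch was built with
print(torch.cuda.is_available())   # True if a usable GPU is visible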
You can train or test the network by using "train.py" or "test.py" as follows.
You can use tools/train.py to start training the network. Pass --help to see the list of optional command-line arguments, such as "--use-gpu" and "--gpu-id". You can also load a custom cfg file to customize the reference configuration if you would rather not change the reference file itself. Additionally, you can set options one by one on the command line with "--set".
You can use tools/test.py to test the network by loading a custom network snapshot. You have to pass "--pre-trained-id" and "--pre-trained-epoch" to specify the network id and the epoch at which the snapshot was taken.
All of the configuration hyperparameters are set in lib/utils/config.py. If you want to change them permanently, simply edit that file with the settings you would like. Otherwise, use the approaches mentioned above to change them temporarily.
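For readers unfamiliar with this configuration style, here is a generic, illustrative sketch of the EasyDict pattern that such a config.py typically follows; the attribute names below are made up and are not the repository's actual settings.

from easydict import EasyDict

# Illustrative defaults only; the real hyperparameters live in lib/utils/config.py.
cfg = EasyDict()
cfg.TRAIN = EasyDict()
cfg.TRAIN.BATCH_SIZE = 32
cfg.TRAIN.LEARNING_RATE = 1e-3

# A permanent change means editing the defaults in the file; a temporary change
# overrides attributes at run time, which is what key/value pairs passed on the
# command line (e.g. via "--set") typically end up doing.
cfg.TRAIN.LEARNING_RATE = 5e-4
print(cfg.TRAIN.LEARNING_RATE)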
This section will be updated with preliminary results soon.