This document provides basic tutorials for the usage of MMAction. For installation, please refer to INSTALL.md. For data deployment, please refer to DATASET.md.
We first give an example of testing and training action recognition models on UCF101.
First of all, please follow PREPARING_UCF101.md for data preparation.
Reference models are listed in MODEL_ZOO.md.
We download a reference model (the TSN spatial stream with a BN-Inception backbone) to `$MMACTION/modelzoo` using:

```shell
wget -c https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ucf101/tsn_2d_rgb_bninception_seg3_f1s1_b32_g8-98160339.pth -P ./modelzoo/
```
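As a quick integrity check: OpenMMLab checkpoint filenames conventionally end with a short hash, and assuming the torch.hub-style convention that this suffix is the first 8 hex characters of the file's SHA-256 (an assumption, not stated in this document), the download can be verified with:

```shell
# Compare the first 8 hex chars of the SHA-256 against the "98160339"
# filename suffix (assumes the torch.hub-style hash-suffix convention).
sha256sum modelzoo/tsn_2d_rgb_bninception_seg3_f1s1_b32_g8-98160339.pth | cut -c1-8
```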
Then, together with the provided config files, we run the following command to test with multiple GPUs:

```shell
./tools/dist_test_recognizer.sh test_configs/TSN/ucf101/tsn_rgb_bninception.py tsn_2d_rgb_bninception_seg3_f1s1_b32_g8-98160339.pth 8
```
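If only one GPU is available, the same checkpoint can also be evaluated with the non-distributed `tools/test_${ARCH}.py` entry point described later in this document. A minimal sketch, assuming the checkpoint sits under `modelzoo/` as downloaded above and that a result filename of your choice is passed to `--out`:

```shell
# Non-distributed testing via the tools/test_${ARCH}.py pattern;
# --out dumps raw predictions to a pickle file for later inspection.
python tools/test_recognizer.py test_configs/TSN/ucf101/tsn_rgb_bninception.py \
    modelzoo/tsn_2d_rgb_bninception_seg3_f1s1_b32_g8-98160339.pth --out tsn_ucf101_results.pkl
```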
When testing 3D ConvNets, the oversampling scheme we use is 10 clips x 3 crops by default. For some extremely large models, so many samples may not fit on a single GPU. When that happens, you can use the heavy-test command instead:

```shell
./tools/dist_test_recognizer_heavy.sh test_configs/CSN/ircsn_kinetics400_se_rgb_r152_seg1_32x2.py ircsn_kinetics400_se_rgb_r152_f32s2_ig65m_fbai-9d6ed879.pth 8 --batch_size=5
```
To reproduce the model, we provide training scripts as follows:

```shell
./tools/dist_train_recognizer.sh configs/TSN/ucf101/tsn_rgb_bninception.py 8 --validate
```
- `--validate`: performs evaluation every k (default: 1) epochs during training, which helps diagnose the training process.
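Training can usually be resumed from an intermediate checkpoint. The following is a hypothetical example: mmdetection-derived train scripts commonly accept a `--resume_from` flag, and the `work_dirs/.../latest.pth` path follows the usual mmdetection layout, but both are assumptions; confirm them against your checkout (e.g. via `python tools/train_recognizer.py -h`) before relying on them.

```shell
# Hypothetical resume command: both --resume_from and the work_dirs
# checkpoint path are assumptions based on mmdetection conventions.
./tools/dist_train_recognizer.sh configs/TSN/ucf101/tsn_rgb_bninception.py 8 \
    --validate --resume_from work_dirs/tsn_rgb_bninception/latest.pth
```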
The procedure is not limited to action recognition on UCF101. To perform spatial-temporal action detection on AVA, we can train a baseline model by running

```shell
./tools/dist_train_detector.sh configs/ava/ava_fast_rcnn_nl_r50_c4_1x_kinetics_pretrain_crop.py 8 --validate
```
and evaluate a reference model by running
```shell
wget -c https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/ava/fast_rcnn_ava2.1_nl_r50_c4_1x_f32s2_kin-e2495b48.pth -P modelzoo/
python tools/test_detector.py configs/ava/ava_fast_rcnn_nl_r50_c4_1x_kinetics_pretrain_crop.py modelzoo/fast_rcnn_ava2.1_nl_r50_c4_1x_f32s2_kin-e2495b48.pth --out ava_fast_rcnn_nl_r50_multiscale.pkl --gpus 8 --eval bbox
```
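The file written by `--out` is a pickle of raw predictions, which can be reloaded for offline analysis with mmcv (an MMAction dependency). That the payload is a list with one entry per test sample is an assumption based on mmdetection-style testing scripts:

```shell
# Reload the dumped results; mmcv.load picks the loader from the
# .pkl extension. The list-of-per-sample-results layout is assumed.
python -c "import mmcv; r = mmcv.load('ava_fast_rcnn_nl_r50_multiscale.pkl'); print(type(r), len(r))"
```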
To perform temporal action detection on THUMOS14, we can train a baseline model by running

```shell
./tools/dist_train_localizer.sh configs/thumos14/ssn_thumos14_rgb_bn_inception.py 8
```
and evaluate a reference model by running
```shell
wget -c https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/thumos14/ssn_thumos14_rgb_bn_inception_tag-dac9ddb0.pth -P modelzoo/
python tools/test_localizer.py configs/thumos14/ssn_thumos14_rgb_bn_inception.py modelzoo/ssn_thumos14_rgb_bn_inception_tag-dac9ddb0.pth --gpus 8 --out ssn_thumos14_rgb_bn_inception.pkl --eval thumos14
```
We provide testing scripts to evaluate a whole dataset:

```shell
python tools/test_${ARCH}.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [other task-specific arguments]
```
Arguments:
- `${ARCH}` could be
  - "recognizer" for action recognition (TSN, I3D, SlowFast, R(2+1)D, CSN, ...)
  - "localizer" for temporal action detection/localization (SSN)
  - "detector" for spatial-temporal action detection (a re-implemented Fast-RCNN baseline)
- `${CONFIG_FILE}` is the config file stored in `$MMACTION/test_configs`.
- `${CHECKPOINT_FILE}` is the checkpoint file. Please refer to MODEL_ZOO.md for more details.
MMAction implements distributed and non-distributed training, powered by the same engine as mmdetection.
Training with multiple GPUs follows the rules below:

```shell
./tools/dist_train_${ARCH}.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
- `${ARCH}` could be
  - "recognizer" for action recognition (TSN, I3D, ...)
  - "localizer" for temporal action detection/localization (SSN)
  - "detector" for spatial-temporal action detection (a re-implemented Fast-RCNN baseline)
- `${CONFIG_FILE}` is the config file stored in `$MMACTION/configs`.
- `${GPU_NUM}` is the number of GPUs (default: 8). If you are using a number other than 8, please adjust the learning rate in the config file linearly: for example, if the config's learning rate was tuned for 8 GPUs, halve it when training with 4 GPUs. A sketch of what such a launch script typically does follows this list.
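For reference, the `dist_train_${ARCH}.sh` wrappers in mmdetection-style codebases are typically thin launchers around `torch.distributed.launch`. Below is a minimal sketch for the recognizer case, assuming the wrapper simply forwards trailing arguments (such as `--validate`) to `tools/train_recognizer.py`; the script actually shipped with MMAction may differ in details.

```shell
#!/usr/bin/env bash
# Minimal sketch of ./tools/dist_train_recognizer.sh (assumed layout).
CONFIG=$1   # config file, e.g. configs/TSN/ucf101/tsn_rgb_bninception.py
GPUS=$2     # number of processes to launch, one per GPU
# Spawn one training process per GPU and forward any remaining
# arguments (e.g. --validate) to the training script.
python -m torch.distributed.launch --nproc_per_node=$GPUS \
    tools/train_recognizer.py $CONFIG --launcher pytorch ${@:3}
```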