This repo is an SNN extension of UniFormer, with which we explore directly-trained spiking neural networks for video action recognition.
First, install SlowFast by following the installation instructions in INSTALL.md.
Then, install `spikingjelly==0.0.0.0.14` for SNN development and training.
Finally, follow the instructions in DATASET.md to prepare the datasets.
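As an optional sanity check (a minimal sketch on our part, assuming the package is installed from PyPI under the name `spikingjelly`), you can verify that the pinned version is the one picked up by Python:

```python
# Optional sanity check: confirm the pinned SpikingJelly version is installed.
from importlib import metadata

installed = metadata.version("spikingjelly")
assert installed == "0.0.0.0.14", f"expected spikingjelly 0.0.0.0.14, found {installed}"
```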
We release the checkpoints on Baidu Cloud: ckpt (bv4p)
Download them and place them in the corresponding checkpoints folder.
Simply run the training scripts in `exp` as follows:

```bash
bash ./exp/svformer_ucf101_scratch/run.sh
```

[Note]:

- During training, we follow the SlowFast repository and randomly crop videos for validation. For accurate testing, please follow our testing scripts.
- For more config details, you can read the comments in `slowfast/config/defaults.py`.
- The existing folders in `exp` contain examples; you can create a new directory for your own experiment.
We provide a testing example as follows:

```bash
bash ./exp/svformer_ucf101_scratch/test.sh
```

Specifically, we need to create a new config for testing and run the multi-crop/multi-clip test:

- Copy the training config file `config.yaml` and create a new testing config `test.yaml`.
- Change the data hyperparameters (in `test.yaml` or `test.sh`):

  ```yaml
  DATA:
    TRAIN_JITTER_SCALES: [224, 224]
    TEST_CROP_SIZE: 224
  ```

- Set the number of crops and clips (in `test.yaml` or `test.sh`):

  ```
  # Multi-clip testing (the numbers can be modified)
  TEST.NUM_ENSEMBLE_VIEWS 5
  TEST.NUM_SPATIAL_CROPS 3
  ```

- You can also set the checkpoint path via:

  ```
  TEST.CHECKPOINT_FILE_PATH your_model_path
  ```
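For intuition, the following is a minimal sketch (not this repo's code; the shapes, numbers, and averaging choice are illustrative assumptions) of what multi-clip/multi-crop testing amounts to: each video is evaluated on `NUM_ENSEMBLE_VIEWS × NUM_SPATIAL_CROPS` views, and the per-view predictions are aggregated into one video-level score.

```python
# Illustrative only: aggregate per-view predictions into one video-level score.
import torch

num_ensemble_views = 5   # temporal clips per video
num_spatial_crops = 3    # spatial crops per clip
num_classes = 101        # e.g. UCF101
num_views = num_ensemble_views * num_spatial_crops

# Hypothetical logits for all views of a single video: (num_views, num_classes).
view_logits = torch.randn(num_views, num_classes)

# Average the per-view class probabilities, then pick the top class.
video_score = view_logits.softmax(dim=-1).mean(dim=0)
predicted_class = video_score.argmax().item()
print(predicted_class)
```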
To develop your own SNN model:

- Define your SNN model like `./slowfast/models/uniformer2d_psnn_try.py` and the other models in the same folder.
- Make the corresponding modifications in `./slowfast/models/__init__.py`, `./slowfast/models/build.py`, `./slowfast/config/defaults.py`, `./tools/train_net.py`, etc. (see how the existing SNN models are handled in those files); a minimal registration sketch is given after the note below.
- Train and test the model by following the instructions above.
[Note]:
- In this repo, `uniformer2d_psnn` is actually `svformer`.
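For orientation, here is a minimal sketch (not code from this repo) of the registration pattern that SlowFast models follow, so that a new class can be selected through `MODEL.MODEL_NAME` in the config. The class name, layer sizes, and the SpikingJelly import path are assumptions for illustration; check the existing SNN models in `./slowfast/models/` for the actual conventions used here.

```python
# my_snn_try.py (hypothetical) -- would live under ./slowfast/models/ and be imported
# in ./slowfast/models/__init__.py so that the registration below runs at import time.
import torch.nn as nn

from slowfast.models.build import MODEL_REGISTRY
# Import path assumed for spikingjelly==0.0.0.0.14 (older releases used `clock_driven`).
from spikingjelly.activation_based import neuron


@MODEL_REGISTRY.register()
class MySNNTry(nn.Module):
    """Toy spiking model; select it with MODEL.MODEL_NAME MySNNTry in the config."""

    def __init__(self, cfg):
        super().__init__()
        num_classes = cfg.MODEL.NUM_CLASSES
        self.embed = nn.Conv3d(3, 64, kernel_size=(1, 4, 4), stride=(1, 4, 4))
        self.spike = neuron.LIFNode()   # spiking activation from SpikingJelly
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        # SlowFast passes inputs as a list of pathway tensors, each (B, C, T, H, W).
        x = self.embed(x[0])
        x = self.spike(x)
        x = x.mean(dim=[2, 3, 4])       # global average pooling over T, H, W
        return self.head(x)
```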
If you find this repository useful, please use the following BibTeX entry for citation.
```bibtex
@inproceedings{yu2024svformer,
  title={Svformer: a direct training spiking transformer for efficient video action recognition},
  author={Yu, Liutao and Huang, Liwei and Zhou, Chenlin and Zhang, Han and Ma, Zhengyu and Zhou, Huihui and Tian, Yonghong},
  booktitle={International Workshop on Human Brain and Artificial Intelligence},
  pages={161--180},
  year={2024},
  organization={Springer}
}
```

This repository is developed based on several repositories: UniFormer, SlowFast, SpikingJelly, syops-counter, and others. Thanks for their efforts.