This repository contains code for TransNet V2: An effective deep network architecture for fast shot transition detection.
Our reevaluation of other publicly available state-of-the-art shot boundary detection methods (F1 scores):
Model | ClipShots | BBC Planet Earth | RAI |
---|---|---|---|
TransNet V2 (this repo) | 77.9 | 96.2 | 93.9 |
TransNet (github) | 73.5 | 92.9 | 94.3 |
Hassanien et al. (github) | 75.9 | 92.6 | 93.9 |
Tang et al., ResNet baseline (github) | 76.1 | 89.3 | 92.8 |
➡️ See inference folder and its README file. ⬅️
See inference-pytorch folder and its README file.
Note that the training datasets are tens of gigabytes in size, and hundreds of gigabytes when exported.
You do not need to train the network: use the code and instructions in the inference folder to detect shots in your videos.
This repository contains all that is needed to run any experiment for TransNet V2 network including network training and dataset creation. All experiments should be runnable in this NVIDIA DOCKER file.
In general these steps need to be done in order to replicate our work (in training folder):
- Download RAI and BBC Planet Earth test datasets (link). Download ClipShots train/test dataset (link). Optionally get IACC.3 dataset.
- Edit and run `consolidate_datasets.py` in order to transform ground truth from all the datasets into one common format.
- Take some videos from ClipShotsTrain aside as a validation dataset.
- Run `create_dataset.py` to create all train/validation/test datasets.
- Run `training.py ../configs/transnetv2.gin` to train a model.
- Run `evaluate.py /path/to/run_log_dir epoch_no /path/to/test_dataset` for proper evaluation.
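
The step of setting some ClipShotsTrain videos aside for validation is not scripted in the list above. A minimal sketch of one way to do it follows. The function name `split_train_validation`, the 5% default fraction, the fixed seed, and the directory path are all assumptions made for illustration, not part of this repository, so adjust them to your own layout.

```python
import random
from pathlib import Path


def split_train_validation(video_paths, validation_fraction=0.05, seed=42):
    """Split video paths into (train, validation) lists.

    Hypothetical helper: the fraction and seed are illustrative defaults,
    not values taken from the TransNet V2 experiments.
    """
    # Sort first so the split does not depend on the order of the input.
    paths = sorted(str(p) for p in video_paths)
    if not paths:
        return [], []
    # A seeded generator makes the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(paths)
    # Always hold out at least one video.
    n_val = max(1, int(len(paths) * validation_fraction))
    return paths[n_val:], paths[:n_val]


if __name__ == "__main__":
    # Assumed location of the ClipShots training videos.
    videos = Path("/path/to/ClipShots/videos/train").glob("*.mp4")
    train, val = split_train_validation(videos)
    print(f"{len(train)} training videos, {len(val)} validation videos")
```

Keeping the seed fixed means the same videos land in the validation set on every run, which keeps validation scores comparable between training runs.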
If found useful, please cite us;)
- This paper: TransNet V2: An effective deep network architecture for fast shot transition detection
  @article{soucek2020transnetv2, title={TransNet V2: An effective deep network architecture for fast shot transition detection}, author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Loko{\v{c}}, Jakub}, year={2020}, journal={arXiv preprint arXiv:2008.04838}, }
- ACM Multimedia paper of the older version: A Framework for Effective Known-item Search in Video
- The older version paper: TransNet: A deep network for fast detection of common shot transitions