A fast, accurate and diverse object detection pipeline for video written in MXNet and Gluon.
- Train and upload pre-trained models for all datasets and MobileNet backbone
- Add evaluation results for models
- Add temporal processing models
We support training and testing with the following datasets:
Dataset | Split | Images (Clips) | Boxes (Object Instances) | Categories |
---|---|---|---|---|
PascalVOC | trainval 07+12 | 16551 | 47223 | 20 |
PascalVOC | test 07 | 4952 | 14976 | 20 |
MSCoco | train 17 | 117266 | 849901 | 80 |
MSCoco | val 17 | 5000 | 36828 | 80 |
ImageNetDET | train | 456567 | 478806 | 200 |
ImageNetDET | val | 20121 | 55502 | 200 |
ImageNetDET | train_nonempty | 333474 | 478806 | 200 |
ImageNetDET | val_nonempty | 18680 | 55502 | 200 |
ImageNetVID | train15 | 1122397 (3862) | 1731913 (7911) | 30 |
ImageNetVID | val15 | 176126 (555) | 273505 (1309) | 30 |
ImageNetVID | test15 | 315176 (937) | NA | 30 |
ImageNetVID | train15_nonempty | 1086132 (3862) | 1731913 (7911) | 30 |
ImageNetVID | val15_nonempty | 172080 (555) | 273505 (1309) | 30 |
ImageNetVID | train17 | 1181113 (4000) | 1859625 (8394) | 30 |
ImageNetVID | val17 | 512360 (1314) | 795433 (3181) | 30 |
ImageNetVID | test17 | 765631 (2000) | NA | 30 |
ImageNetVID | train17_nonempty | 1142945 (4000) | 1859625 (8394) | 30 |
ImageNetVID | val17_nonempty | 492183 (1314) | 795433 (3181) | 30 |
ImageNetVID | train17_ne_0.04 | 47481 (4000) | 78501 (8682) | 30 |
ImageNetVID | val17_ne_0.04 | 20353 (1314) | 33384 (3295) | 30 |
YouTubeBB | train | 5608012 (301987) | 5608012 (444053) | 23 |
YouTubeBB | val | 625338 (33578) | 625338 (49193) | 23 |
YouTubeBB | train_nonempty | 4580762 (294853) | 4484014 (294715) | 23 |
YouTubeBB | val_nonempty | 508988 (32661) | 497616 (32650) | 23 |
The YouTubeBB figures are computed from the annotations only. Access to the image data is yet to be confirmed, and the figures will be updated in future.
See datasets for downloading and organisation information...
Currently:
See models for downloading and organisation information...
Install youtube-dl using the following commands. The version currently installed by pip install youtube-dl contains a bug that prevents videos from downloading:
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
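After running the two commands above, it can be useful to confirm that the binary is actually reachable before starting a long download job. The following is a minimal sketch using only the Python standard library; the helper name `tool_version` is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def tool_version(name="youtube-dl"):
    """Return the version string reported by `name --version`,
    or None if the executable is not found on PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = shutil.which(name)
    if path is None:
        return None
    # check=False: a non-zero exit code is treated as "no version available"
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = tool_version()
    if version is None:
        print("youtube-dl not found on PATH")
    else:
        print("youtube-dl version:", version)
```

A `None` result usually means `/usr/local/bin` is not on `PATH` or the `chmod` step was skipped.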
pip install -r requirements.txt
conda env create -f environment.yml
conda activate viddet-mx
To train a model you can use something like:
python train_yolov3.py --dataset voc --gpus 0,1,2,3 --save_prefix 0001 --warmup_epochs 3 --syncbn
If you don't have this much compute available, specify a lower batch size (without --gpus, training also defaults to a single GPU):
python train_yolov3.py --batch_size 4 --dataset voc --save_prefix 0001 --warmup_epochs 3
We found that a warmup period was necessary when training YOLOv3.
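To illustrate what a warmup of this kind typically does, here is a minimal sketch of a linear learning-rate warmup. It is an assumption about the general technique, not a copy of what `train_yolov3.py` implements; the function name and the `base_lr`, `iters_per_epoch` and `start_lr` parameters are hypothetical.

```python
def warmup_lr(base_lr, epoch, iteration, iters_per_epoch,
              warmup_epochs=3, start_lr=0.0):
    """Linearly ramp the learning rate from start_lr to base_lr
    over the first `warmup_epochs` epochs, then hold it at base_lr.

    Hypothetical sketch; the training script's schedule may differ.
    """
    warmup_iters = warmup_epochs * iters_per_epoch
    current = epoch * iters_per_epoch + iteration
    if warmup_iters <= 0 or current >= warmup_iters:
        return base_lr
    fraction = current / warmup_iters
    return start_lr + (base_lr - start_lr) * fraction


if __name__ == "__main__":
    # With 100 iterations per epoch and a 3-epoch warmup, print the
    # learning rate at the start of each of the first five epochs.
    for epoch in range(5):
        print(epoch, warmup_lr(0.001, epoch, 0, iters_per_epoch=100))
```

Starting from a small learning rate keeps the first updates small while the randomly initialised detection heads still produce large, noisy gradients.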
To finetune a model, pass --resume with the path to a pretrained params file and --trained_on with the dataset that model was trained on. The model will then be finetuned on the dataset specified with --dataset:
python train_yolov3.py --dataset voc --trained_on coco --resume models/experiments/0003/yolo3_darknet53_coco_best.params --gpus 0,1,2,3 --save_prefix 0006 --warmup_epochs 3 --syncbn
To evaluate a model you can use something like:
python detect_yolov3.py --batch_size 1 --model_path models/experiments/0001/yolo3_darknet53_voc_best.params --metrics voc --dataset voc --save_prefix 0001
You can also evaluate on a different dataset from the one the model was trained on (here a voc trained model on the vid set):
python detect_yolov3.py --trained_on voc --batch_size 1 --model_path models/experiments/0001/yolo3_darknet53_voc_best.params --metrics voc,coco,vid --dataset vid --save_prefix 0001
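Detection metrics such as the VOC and COCO mAP match predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below shows that basic computation for two boxes in `(xmin, ymin, xmax, ymax)` form. It is illustrative only and is not the repository's metric code; the function name is hypothetical, and it uses continuous coordinates without the "+1 pixel" convention found in some evaluation toolkits.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as
    (xmin, ymin, xmax, ymax). Returns a value in [0, 1].

    Hypothetical helper for illustration only.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1)
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A prediction is usually counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold, commonly 0.5 for VOC-style evaluation, while COCO-style evaluation averages over a range of thresholds.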
Visualisation is off by default; add --visualise to write out images with the boxes drawn on them.
See models for all of the results.