- This YOLOv2-based API is a robust, consistent, and fast solution for training your own object detector from scratch on your own custom dataset, including annotating the data.
- It also delivers the fastest training and detection speeds for PyTorch.
- Use this API if you want to train an object detector on your own custom data and classes from the ground up.
NOTE : This repo was a port of YOLOv2 to PyTorch with some further additions; however, with newer versions of YOLO available and v5 directly available in PyTorch, this repo is no longer maintained.
[Additions] :
- PyTorch 0.4
- Improved LMDB data-loading algorithm
- FreezeOut
- Snapshot Ensemble
- Ability to add custom classification nets as network backbone from cfg files
- Half Precision Support
- Implementing LMDB in an efficient way.
- Matlab Script to export annotations into JSON
- Custom Dataset in PyTorch (including a custom collate function; see the sketch after this list)
- Multithreading
- Applying transformations to images as well as annotations
- Training Regime
- Speeding up Training Regime
- TensorBoard
- Log, load, and save consistency across multiple pauses and resumes
- Only maintaining the last 'n' checkpoints
- Transfer Learning
- Eval Code
- DataParallel issue fix
- Detection Pipeline using OpenCV
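As a quick illustration of the custom Dataset/collate item above: detection batches can't use the default collate because each image carries a different number of boxes. A minimal, hypothetical sketch (not this repo's actual `dataset.py`; names and shapes are assumptions):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDetectionDataset(Dataset):
    """Illustrative only; not this repo's API."""
    def __init__(self, images, targets):
        # images: list of CxHxW tensors; targets: list of (N_i, 5) tensors
        self.images, self.targets = images, targets

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.targets[idx]

def detection_collate(batch):
    # Images share a shape, so they stack into one tensor; box tensors
    # vary per image, so keep them as a plain list instead of stacking.
    images = torch.stack([img for img, _ in batch], 0)
    targets = [tgt for _, tgt in batch]
    return images, targets

# loader = DataLoader(dataset, batch_size=16, collate_fn=detection_collate)
```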
[To-Do] :
- Support for PyTorch 0.4
- A port to Python3
- Add a parser function so that it can parse different flavors of YOLO using cfg files
- Add a subparser function so the base DarkNet can be replaced with other classification nets such as MobileNets, Squeezenets etc.
- Replace SGD with AdamW so the training speed increases even more
- Add support for standard datasets such as COCO and VOC
- Add options for several pruning methods.
- Where I live, I have very limited resources and cloud compute is unaffordable due to currency conversion, so time is a very precious commodity.
- All I could get my hands on was a K40 with time restrictions per day, so I built this API with those constraints in mind (i.e. consistent pause/resume so it can train over multiple days, and training in as little time as possible).
- Since then, I've learned and implemented many more techniques in other projects that could make this API even faster; unfortunately, I don't have the time and resources to integrate them into this one.
- Use this API if you want to build your own dataset and fine-tune the pretrained YOLOv2 on it.
- The basic functions of the code were inspired by https://github.com/longcw/yolo2-pytorch, which is still an experimental project and wasn't achieving mAP on par with the original YOLOv2 implementation.
- What I found impressive was that the postprocessing functions and custom modules were implemented in Cython; using them certainly gave a head start, and I improved them further, so this API is now 700% faster than longcw's and as accurate as the original implementation.
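For orientation, here's a plain-NumPy sketch of the non-maximum suppression step those Cython modules implement; the real Cython version is much faster, and the exact threshold used in this repo may differ:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep
```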
- The LMDB database offers the fastest read speeds, is memory-mapped, and is corruption-proof.
- This is the most efficient way to integrate LMDB with PyTorch that you'll ever find on the web (and an even more efficient technique is on its way :D).
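The general pattern looks something like the sketch below (a hedged illustration, not the repo's exact code): the environment is memory-mapped and read-only, and it's opened lazily so each DataLoader worker gets its own handle. The class name and key scheme are assumptions.

```python
import lmdb
import numpy as np
from torch.utils.data import Dataset

class LMDBImageDataset(Dataset):
    """Illustrative pattern only; not this repo's exact class."""
    def __init__(self, db_path):
        self.db_path = db_path
        self.env = None  # opened lazily, once per DataLoader worker
        env = lmdb.open(db_path, readonly=True, lock=False)
        with env.begin() as txn:
            self.length = txn.stat()['entries']
        env.close()

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.env is None:
            self.env = lmdb.open(self.db_path, readonly=True,
                                 lock=False, readahead=False)
        with self.env.begin(write=False) as txn:
            buf = txn.get(str(idx).encode())  # assumes integer-index keys
        # Raw bytes; decode to an image (e.g. with cv2.imdecode) as needed.
        return np.frombuffer(buf, dtype=np.uint8)
```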
- I tried and tested every bounding box annotator available out there (for Linux); Matlab's ImageLabeller was hands-down the fastest, most robust tool.
- The sad part was that it was usable only from Matlab and stored annotations in a complex structure convenient only to Matlab.
- Sad no more: this API will extract and convert all those annotations into a JSON file in the blink of an eye.
- I can't stress this enough: it's extremely wearying to stare at a terminal or plot graphs from code while maintaining proper consistency; with TensorBoard, never again.
- Once set up, the whole training needs just one command and one optional flag. The API takes care of consistency (whether it's a new experiment or an old one, and therefore where to save the TF logs and checkpoints).
- Python2.7
- Cython 0.27.2
- OpenCV 3.3
- h5py
- TensorboardX
- LMDB
- Matlab R2017b
- PyTorch 0.3
- Clone this repository:
```
git clone https://github.com/saranshkarira/pytorch-yolov2-fastest.git
```
- Open `make.sh`:
```
nano make.sh
```
- Replace `sm_35` in `-arch=sm_35` with the sm of your own GPU (e.g. `sm_61` for a GTX 1080; refer to NVIDIA's compute capability tables).
- Build the reorg layer (similar to `tf.extract_image_patches`):
```
cd pytorch-yolov2-fastest
./make.sh
```
- Make the models and db directories:
```
mkdir -p models db
```
- Download the trained model `yolo-voc.weights.h5` and put it in the `models/` directory.
- Annotate images using the Matlab ImageLabeller tool.
- Export the session as a .mat file.
- You can scale this to multiple people on multiple machines for a larger dataset, then export each of these sessions as a .mat file.
- Put all these files in the `mat_to_json` folder and run the `encoder.m` script.
- You will get a single, nice, lightweight JSON file containing all the annotations.
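The exact schema is whatever `encoder.m` emits; purely as a hypothetical illustration, think of a single mapping from image names to box annotations, along these lines:

```
{
  "image_0001.jpg": [
    {"class": "car",    "bbox": [104, 62, 230, 118]},
    {"class": "person", "bbox": [18, 40, 60, 172]}
  ]
}
```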
This step will pack your image dataset into a nice LMDB database file.
- Put all the images in `./db/raw/`.
- There might be some images in the dataset that you did not annotate because the object was vague or simply not there.
- Don't worry: put the annotation file (JSON) in `./db/targets/` and it will pack only the images that you annotated.
- You can get away with keeping the `targets` folder empty, in which case it will pack all the images in the `raw` folder, but since you'll have to put the JSON in `targets` anyway, I'd say do it beforehand.
- Run:
```
python2.7 cursor_putting.py
```
- LMDB files will be generated in the `./db/image_data/` directory. (A sketch of what this step does conceptually follows this list.)
- Now, for the final step, open `./cfgs/config_voc.py` and replace the VOC classes with your custom classes.
- To adjust your hyper-parameters, refer to `./cfgs/exps/darknet19_exp1.py`.
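Conceptually, the packing step writes each raw image's bytes into LMDB under a simple key. A rough sketch of the idea, assuming integer-index keys and a fixed map size (`cursor_putting.py` is the authoritative script):

```python
import os
import lmdb

raw_dir = './db/raw/'
env = lmdb.open('./db/image_data/', map_size=10 * 1024 ** 3)  # 10 GB cap; adjust to your dataset
with env.begin(write=True) as txn:
    for i, name in enumerate(sorted(os.listdir(raw_dir))):
        with open(os.path.join(raw_dir, name), 'rb') as f:
            txn.put(str(i).encode(), f.read())  # integer-index keys are an assumption
env.close()
```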
Now, once everything is set up, you can train the model:
- If it's the first time, you'll need to activate transfer learning, and the model will be loaded from the pretrained weights:
```
python2.7 trainer.py -tl True
```
- Every time after that, when you stop the training and want to resume from the latest checkpoint, just run:
```
python2.7 trainer.py
```
Note : No need to worry about the number of checkpoints; after every few epochs, old ones will be pruned.
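Conceptually the pruning boils down to keeping only the newest files; a minimal sketch of the idea, assuming `.h5` checkpoints in one directory (the repo's actual naming and retention count may differ):

```python
import glob
import os

def prune_checkpoints(ckpt_dir, keep=5):
    # Sort checkpoints oldest-first by modification time,
    # then delete everything but the `keep` most recent ones.
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, '*.h5')),
                   key=os.path.getmtime)
    for old in ckpts[:-keep]:
        os.remove(old)
```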
Run:
```
tensorboard --logdir ./models/logs
```
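Logging goes through TensorboardX; as a generic usage sketch (the tag name and experiment subdirectory here are assumptions, not necessarily what `trainer.py` writes):

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter('./models/logs/exp1')  # one subdirectory per experiment
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder value; the trainer logs its real loss
    writer.add_scalar('train/loss', loss, step)
writer.close()
```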
This is a trainer API; after training is complete, you can use the .h5 file in the various available YOLOv2 detection APIs (especially longcw's) and it will work all the same, or you can contribute, or wait until a detection pipeline is added as well.
For contributors: good navigation goes a long way and speeds up the development of a project. The API is structured in the following way:
- All the configuration files are in the `cfgs/` directory; general functions and settings (directory locations) are saved in `config.py`, dataset-specific settings (anchors, class names) are saved in `config_voc.py`, and experiment-specific settings (hyper-parameters) are saved in `exps/`.
- `layer/` contains the custom modules written in Cython that are compiled by the `make.sh` script.
- `mat_to_json/` contains the script to convert and export the .mat annotation file into JSON.
- `db/` contains 3 folders, `raw`, `image_data` and `targets`, for raw images, LMDB data and annotations respectively; basically, all the dataset-related stuff.
- `models/` contains the model-related stuff: training checkpoints, the pretrained model and TensorBoard logs.
- `utils/` contains misc functions such as NMS, converting annotations to their grid-relative form, and saving and loading weights.
- `cursor_putting.py` converts the raw images into the LMDB database.
- `dataset.py` is a custom PyTorch dataset class with a custom collate function; it also holds some eval code.
- `darknet.py` has the model definition.
- `loss.py` calculates the loss.
- `trainer.py` is the main/central file that uses all of the above and trains the model.