
rainkissthesun/yolo-v2-pytorch

 
 


YOLO v2 PyTorch Implementation

I wrote this repo for learning purposes, aiming to reproduce YOLO v2 in PyTorch. Most of the ideas are adopted from the original paper, but some of the designs are my own. The Darknet source code is extremely difficult to decipher; with the help of Alexey's answers (links will be given in the reference section), I managed to understand what was modified between the different versions of YOLO.

As usual, I do not have the resources to pretrain the network myself, so I modified the architecture and focused mainly on implementing the ideas and the loss of YOLO.

Most hyperparameters are adapted from the original implementation of Darknet.

Compared with my YOLO v1 implementation, I believe this version is much nicer: the code is far more organized, and the mAP calculation is so much more efficient that mAP can now be computed every epoch at a reasonable time cost. The project structure is as follows.

.
├── kmeans.py             # script to calculate prior box on VOC train dataset
├── train_yolov2.py       # train yolo v2 Darknet19 (w/o pretrain)
├── train_resnet18.py     # train yolo v2 ResNet18 (w/ pretrain)
├── test_resnet18.py      # test yolo v2 ResNet18 (w/ pretrain)
├── train_resnet101.py    # train yolo v2 ResNet101 (w/ pretrain)
├── test_resnet101.py     # test yolo v2 ResNet101 (w/ pretrain)
├── utils                 # utils
│   ├── __init__.py
│   ├── data.py           # data pipeline, augmentation
│   ├── globalvar.py      # global variable
│   ├── metrics.py        # mAP calculation
│   ├── utils.py          # utils
│   ├── visualize.py      # visualization
│   └── winit.py          # weight init
└── yolo                  # YOLO model related
    ├── __init__.py
    ├── converter.py      # data converter, BBox <=> model output
    ├── loss.py           # YOLO loss module
    ├── model.py          # network
    ├── nms.py            # non-maximum suppression
    └── train.py          # trainer

Note: this repo no longer uses Jupyter Notebook, as it is heavy and hard to manipulate. Data visualization is now implemented through TensorBoard. The default log directory is ./logs

Tensorboard installation:

pip install tensorboard

Check tensorboard:

tensorboard --logdir ./logs

YOLO v2 Features

  • ✅: Implemented and used
  • ❌: Not implemented
  • *: Not available for other backbones
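| Tricks in Paper | Used | Additional Tricks | Used |
| --- | --- | --- | --- |
| BatchNorm | | Gradient Accumulation | |
| Hi-Res Classifier | | ResNet Backbone with Passthrough | |
| Convolutional | | Global Responsible in Loss | |
| Anchor Box | | | |
| New Network | ✅* | | |
| Dimension Prior | | | |
| Location Prediction | | | |
| Passthrough | | | |
| Multi-scale | | | |
| Hi-Res Detector | | | |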

YOLO v2 Loss

The loss function of YOLO v2 is not given explicitly in the paper. I tried my best to read the Darknet source code, and the good news is, here it is!

Here is some explanation.

Furthermore, the loss implementation in this repo solves the problem of multiple instances falling into a single cell by calculating a global IoU and running a global responsible-object selection algorithm.

[2022/05/25 Update] While implementing YOLO v3 and re-checking the YOLO v2 code today, I discovered that the coordinate loss I implemented is the YOLO v3 version: I multiplied the coordinate loss by (2 - truth.w * truth.h). I also discovered that the global responsible trick applied in this version is in fact not that efficient, as it miscalculates when the responsible cell and the ground truth are not in the same cell, and it also costs a large amount of GPU memory. With this version of the loss I cannot train the v3 network at all, since the multi-head design needs several times more memory and runs into OOM. I have nevertheless kept the current design here. If you want to see the newer design, please check my YOLO v3 implementation repo.
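The effect of the (2 - truth.w * truth.h) factor can be sketched in a few lines. This is only an illustration, not the code in yolo/loss.py: the function names are hypothetical, boxes are assumed to be (x, y, w, h) tuples normalized to [0, 1] by the image size, and a plain squared error stands in for the real coordinate loss.

```python
def coord_scale(truth_w, truth_h):
    """Box-size weight: close to 2 for tiny boxes, 1 for a box covering the whole image."""
    return 2.0 - truth_w * truth_h


def weighted_coord_loss(pred, truth):
    """Squared-error coordinate loss for one box pair, scaled by the truth box size.

    Both boxes are (x, y, w, h) tuples normalized to [0, 1] (an assumption of this sketch).
    """
    scale = coord_scale(truth[2], truth[3])
    return scale * sum((p - t) ** 2 for p, t in zip(pred, truth))


# The same coordinate error is penalized almost twice as much on a small box.
small = weighted_coord_loss((0.5, 0.5, 0.15, 0.1), (0.5, 0.5, 0.1, 0.1))
large = weighted_coord_loss((0.5, 0.5, 0.95, 0.9), (0.5, 0.5, 0.9, 0.9))
```

The weight compensates for the fact that a fixed absolute error matters more for a small box than for a large one.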

Performance

| Model | Size | Backbone | mAP@VOC2012-val | COCOmAP@VOC2012-val | FPS@RTX2070s |
| --- | --- | --- | --- | --- | --- |
| YOLOv1-ResNet18 (Ours) | 416x416 | ResNet18 | 48.10% | 23.18% | 97.88 |
| YOLOv1-ResNet50 (Ours) | 416x416 | ResNet50 | 49.87% | 23.95% | 58.40 |
| YOLOv2-ResNet18 (Ours) | 544x544 | ResNet18 | 56.26% | 27.12% | 76.83 |
| YOLOv2-ResNet101 (Ours) | 544x544 | ResNet101 | 64.40% | 35.06% | 28.92 |

| Model | Size | Backbone | mAP@VOC2012-test | FPS |
| --- | --- | --- | --- | --- |
| YOLOv1-ResNet18 (Ours) | 416x416 | ResNet18 | 44.54% | 97.88 |
| YOLOv1-ResNet50 (Ours) | 416x416 | ResNet50 | 47.28% | 58.40 |
| YOLOv2-ResNet18 (Ours) | 544x544 | ResNet18 | 56.11% | 76.83 |
| YOLOv2-ResNet101 (Ours) | 544x544 | ResNet101 | 64.06% | 28.92 |
| YOLOv1 | 416x416 | Darknet? | 57.9% | 45 |
| YOLOv2 544 | 544x544 | Darknet19 | 73.4% | 40 |

Leaderboard Link:

More comparison across categories:

| Model | mean | aero plane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO | 57.9 | 77.0 | 67.2 | 57.7 | 38.3 | 22.7 | 68.3 | 55.9 | 81.4 | 36.2 | 60.8 |
| YOLOv2 | 75.4 | 86.6 | 85.0 | 76.8 | 61.1 | 55.5 | 81.2 | 78.2 | 91.8 | 56.8 | 79.6 |
| YOLOv1-ResNet18 (Ours) | 44.5 | 64.3 | 54.2 | 47.4 | 26.8 | 16.6 | 55.4 | 44.3 | 66.5 | 23.1 | 38.1 |
| YOLOv1-ResNet50 (Ours) | 47.3 | 66.7 | 56.1 | 49.5 | 25.9 | 17.8 | 60.2 | 45.9 | 70.6 | 26.1 | 43.0 |
| YOLOv2-ResNet18 (Ours) | 56.1 | 74.3 | 66.4 | 59.4 | 37.0 | 34.4 | 65.1 | 63.3 | 74.4 | 38.5 | 53.3 |
| YOLOv2-ResNet101 (Ours) | 64.1 | 80.2 | 71.8 | 67.7 | 50.5 | 45.3 | 72.3 | 71.9 | 79.6 | 45.5 | 61.9 |

| Model | dining table | dog | horse | motor bike | person | potted plant | sheep | sofa | train | tv monitor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO | 48.5 | 77.2 | 72.3 | 71.3 | 63.5 | 28.9 | 52.2 | 54.8 | 73.9 | 50.8 |
| YOLOv2 | 61.7 | 89.7 | 86.0 | 85.0 | 84.2 | 51.2 | 79.4 | 62.9 | 84.9 | 71.0 |
| YOLOv1-ResNet18 (Ours) | 38.5 | 62.9 | 57.6 | 60.8 | 45.0 | 15.2 | 33.3 | 43.9 | 60.0 | 37.2 |
| YOLOv1-ResNet50 (Ours) | 41.1 | 67.5 | 59.2 | 62.4 | 47.6 | 17.6 | 35.6 | 45.7 | 64.6 | 42.4 |
| YOLOv2-ResNet18 (Ours) | 40.9 | 68.4 | 61.7 | 68.0 | 68.9 | 30.2 | 51.7 | 47.7 | 66.7 | 52.0 |
| YOLOv2-ResNet101 (Ours) | 47.6 | 77.1 | 66.6 | 75.1 | 75.4 | 42.4 | 63.3 | 55.6 | 73.7 | 58.0 |

2022/05/18 Update: I fixed a bug in the NMS (non-maximum suppression) module of the model, which raised the mAP by more than 7%. However, there is also sad news: the model selection done at the current stage was based on the previous buggy code. As a student with limited experimental resources, parameter tuning and model selection took me more than a week, so no newly trained weights will be released, and I will put my effort into YOLO v3 and later models.
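For readers unfamiliar with NMS, the standard greedy procedure looks roughly like the sketch below. It is a generic pure-Python version written for illustration, not the code in yolo/nms.py; the (x1, y1, x2, y2) box format, the function names and the 0.5 threshold are all assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```

In a detector this is usually run separately for each class, so that overlapping objects of different classes do not suppress each other.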

About Dimension Prior

YOLO v2 and YOLO v3 both use anchor boxes. The k-means algorithm for finding dimension priors is implemented in kmeans.py, and the graph above shows the 5 priors obtained with k = 5.

To run it yourself:

python kmeans.py
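The idea behind dimension priors is k-means over box widths and heights with 1 - IoU as the distance, as described in the YOLO v2 paper. The sketch below is a minimal pure-Python version, not the contents of kmeans.py: the function names are hypothetical, boxes are assumed to be (w, h) tuples normalized by image size, and updating each centroid with the mean of its cluster is one common choice rather than something the source confirms.

```python
import random


def wh_iou(a, b):
    """IoU of two (w, h) boxes aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_priors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) boxes with distance 1 - IoU and return k priors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # The smallest 1 - IoU distance is the largest IoU.
            best = max(range(k), key=lambda c: wh_iou(box, centroids[c]))
            clusters[best].append(box)
        new_centroids = [
            (sum(b[0] for b in cl) / len(cl), sum(b[1] for b in cl) / len(cl))
            if cl else centroids[c]  # keep the old centroid if a cluster is empty
            for c, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


toy_boxes = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
             (0.8, 0.8), (0.82, 0.8), (0.8, 0.82)]
priors = kmeans_priors(toy_boxes, k=2)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering error.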

Training

ResNet18

python train_resnet18.py

ResNet101

python train_resnet101.py

Note: as in my YOLO v1 implementation, add download=True to load_data_voc in the training scripts on the first run if the datasets are not ready.

Here are some diagrams.

Testing

Weights are already released. Download them and put them into ./model.

The provided trained weights are listed below.

| File Name | Size | Backbone | Final | Type | Note |
| --- | --- | --- | --- | --- | --- |
| resnet18-pretrained-sgd-epoch-160.pth | 544x544 | ResNet18 | | Model | |
| resnet101-pretrained-sgd-17x17-epoch-160.pth | 544x544 | ResNet101 | | Model | |
| resnet101-pretrained-sgd-multi-epoch-150-optim.pth | Multi-scale | ResNet101 | | Optim | Used for further training |
| resnet101-pretrained-sgd-multi-epoch-150-model.pth | Multi-scale | ResNet101 | | Model | Used for further training |

ResNet18

python test_resnet18.py

ResNet101

python test_resnet101.py

Webcam Demo

python webcam.py

For the VOC2012 test dataset, you can use voc2012test.py to generate test results. The evaluation scores are published in this README. If you want to run the test yourself, place the VOC2012 test data as shown below.

.
yolo-v2-pytorch             # project folder
├── ...                     # Other files
└── README.md
data
└── VOC2012test             # create dataset folder
    └── VOCdevkit
        └── VOC2012
            ├── Annotations
            ├── ImageSets
            └── JPEGImages

VOC2012 test dataset download link:

Troubleshooting

The following error may be caused by a lack of memory. Try making S or batch_size smaller, or the crop bigger.

Traceback (most recent call last):
  File "C:\Users\JeffersonQin\Desktop\yolo-v2-pytorch\train_resnet18.py", line 59, in <module>
    train(detector, train_iter, test_iter, num_epoch, multi_scale_epoch, output_scale_S, lr, optimizer, 'resnet18-pretrained-sgd', loss, 1, accum_batch_num, './model', './model/resnet18-pretrained-sgd-model-86.pth', './model/resnet18-pretrained-sgd-optim-86.pth', 86)
  File "C:\Users\JeffersonQin\Desktop\yolo-v2-pytorch\yolo\train.py", line 123, in train
    loss_val.sum().backward()
  File "C:\Users\JeffersonQin\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\JeffersonQin\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Example of setting S smaller:

G.set('S', 17)

Here you can optimize the crop strategy:

Tuning batch_size and accum_batch_num:
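As a rough illustration of why these two values trade off against each other, the sketch below assumes that accum_batch_num is the number of mini-batches whose gradients are accumulated before one optimizer step. It uses a one-parameter linear model in pure Python rather than the trainer in yolo/train.py; the function names, and the division by accum_batch_num, are assumptions of this sketch.

```python
def mse_grad(w, xs, ys):
    """d/dw of mean((w * x - y) ** 2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, batch_size, accum_batch_num):
    """Accumulate gradients over accum_batch_num mini-batches before one update."""
    grad = 0.0
    for i in range(accum_batch_num):
        lo, hi = i * batch_size, (i + 1) * batch_size
        # Dividing by accum_batch_num keeps the scale of a single large batch.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_batch_num
    return grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = mse_grad(0.5, xs, ys)                     # one batch of 4 samples
accum = accumulated_grad(0.5, xs, ys, 2, 2)      # two mini-batches of 2 samples
```

Under these assumptions, halving batch_size while doubling accum_batch_num gives the same gradient per update but only needs memory for the smaller mini-batch. Layers such as BatchNorm still see the smaller mini-batch, so the result is not identical in a real network.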

If you have any problems or questions, feel free to ask and open issues.

Thanks
