initial commit

DeNA · Dec 5, 2018 · commit 386c5d2
Showing 22 changed files with 1,705 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,47 @@
+Copyright (c) 2018 DeNA Co., Ltd.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal 
+in the Software without restriction, including without limitation the rights 
+to use, copy, modify, merge, publish, distribute, and/or sublicense 
+copies of the Software, and to permit persons to whom the Software is 
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all 
+copies or substantial portions of the Software; and
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+
+This software uses some portions from the following software under its license:
+
+chainercv
+
+The MIT License
+
+Copyright (c) 2017 Preferred Networks, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
diff --git a/README.md b/README.md
@@ -0,0 +1,166 @@
+# YOLOv3 in PyTorch
+PyTorch implementation of YOLOv3
+
+<p align="left"><img src="data/innsbruck_result.png" height="160"\>  <img src="data/mountain_result.png" height="160"\></p>
+
+## What's New
+- 18/11/27 [COCO AP results of darknet (training) are reproduced with the same training conditions](#performance)
+- 18/11/20 verified inference COCO AP[IoU=0.50:0.95] = 0.302 (paper: 0.310), val5k, 416x416  
+- 18/11/20 verified inference COCO AP[IoU=0.50]  = 0.544 (paper: 0.553), val5k, 416x416
+
+## Performance
+
+<table><tbody>
+<tr><th align="left" bgcolor=#f8f8f8> </th>     <td bgcolor=white> Original (darknet) </td><td bgcolor=white> Ours (pytorch) </td></tr>
+<tr><th align="left" bgcolor=#f8f8f8> COCO AP[IoU=0.50:0.95], inference</th> <td bgcolor=white> 0.310 </td><td bgcolor=white> 0.302 </td></tr>
+<tr><th align="left" bgcolor=#f8f8f8> COCO AP[IoU=0.50],      inference</th> <td bgcolor=white> 0.553 </td><td bgcolor=white> 0.544 </td></tr>
+<tr><th align="left" bgcolor=#f8f8f8> COCO AP[IoU=0.50:0.95], training</th> <td bgcolor=white> 0.310 </td><td bgcolor=white> to be updated</td></tr>
+<tr><th align="left" bgcolor=#f8f8f8> COCO AP[IoU=0.50],      training</th> <td bgcolor=white> 0.553 </td><td bgcolor=white> to be updated </td></tr>
+</tbody></table>
+
+We have verified that the darknet COCO val results are reproduced when random resizing is the only augmentation used:
+<p align="left"><img src="data/val_comparison.png" height="280"\></p>
+
+## Installation
+#### Requirements
+
+- Python 3.6+
+- NumPy (verified as operable: 1.15.2)
+- OpenCV
+- Matplotlib
+- PyTorch (verified as operable: v0.4.0)
+- Cython (verified as operable: v0.29.1)
+- [pycocotools](https://pypi.org/project/pycocotools/) (verified as operable: v2.0.0) 
+
+optional:
+- tensorboard (>1.7.0)
+- [tensorboardX](https://github.com/lanpa/tensorboardX)
+
+#### Docker Environment
+
+We provide a Dockerfile to build an environment that meets the above requirements.
+
+```bash
+# build docker image
+$ nvidia-docker build -t yolov3-in-pytorch-image --build-arg UID=`id -u` -f docker/Dockerfile .
+# create docker container and login bash
+$ nvidia-docker run -it -v `pwd`:/work --name yolov3-in-pytorch-container yolov3-in-pytorch-image
+docker@4d69df209f4a:/work$ python train.py --help
+```
+
+#### Download pretrained weights
+Download the pretrained weights from the author's project page:
+
+```bash
+$ mkdir weights
+$ cd weights/
+$ bash ../requirements/download_weights.sh
+```
+
+#### COCO 2017 dataset
+Download and unzip the COCO dataset with:
+
+```bash
+$ bash requirements/getcoco.sh
+```
+
+## Inference with Pretrained Weights
+
+To detect objects in the sample image, just run:
+```bash
+$ python demo.py --image data/mountain.png --detect_thresh 0.5 --weights_path weights/yolov3.weights
+```
+## Train
+
+```bash
+$ python train.py --help
+usage: train.py [-h] [--cfg CFG] [--weights_path WEIGHTS_PATH] [--n_cpu N_CPU]
+                [--checkpoint_interval CHECKPOINT_INTERVAL]
+                [--eval_interval EVAL_INTERVAL] [--checkpoint CHECKPOINT]
+                [--checkpoint_dir CHECKPOINT_DIR] [--use_cuda USE_CUDA]
+                [--debug] [--tfboard TFBOARD]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --cfg CFG             config file. see readme
+  --weights_path WEIGHTS_PATH
+                        darknet weights file
+  --n_cpu N_CPU         number of workers
+  --checkpoint_interval CHECKPOINT_INTERVAL
+                        interval between saving checkpoints
+  --eval_interval EVAL_INTERVAL
+                        interval between evaluations
+  --checkpoint CHECKPOINT
+                        pytorch checkpoint file path
+  --checkpoint_dir CHECKPOINT_DIR
+                        directory where checkpoint files are saved
+  --use_cuda USE_CUDA
+  --debug               debug mode where only one image is trained
+  --tfboard TFBOARD     tensorboard path for logging
+```
+Example:
+```bash
+$ python train.py --weights_path weights/darknet53.conv.74 --tfboard log
+```
+The training configuration is written in YAML files located in the `config` folder.
+The files use the following format:
+```yaml
+MODEL:
+  TYPE: YOLOv3
+  BACKBONE: darknet53
+TRAIN:
+  LR: 0.001
+  MOMENTUM: 0.9
+  DECAY: 0.0005
+  BURN_IN: 1000 # duration (iters) for learning rate burn-in
+  MAXITER: 500000
+  STEPS: (400000, 450000) # lr-drop iter points
+  BATCHSIZE: 4 
+  SUBDIVISION: 16 # num of minibatch inner-iterations
+  IMGSIZE: 608 # initial image size
+  CONFWEIGHT: 1 # not used
+  LOSSTYPE: l2 # loss type for w, h
+  IGNORETHRE: 0.7 # IoU threshold for learning conf
+  RANDRESIZE: True # enable random resizing
+TEST:
+  CONFTHRE: 0.8 # not used
+  NMSTHRE: 0.45 # same as official darknet
+  IMGSIZE: 416 # this can be changed to measure acc-speed tradeoff
+NUM_GPUS: 1
+
+```
+
+## Evaluate COCO AP
+
+```bash
+$ python train.py --cfg config/yolov3_eval.cfg --eval_interval 1 [--checkpoint ckpt_path] [--weights_path weights_path]
+```
+
+## TODOs
+- [x] Precision Evaluator (bbox, COCO metric)
+- [x] Modify the target builder
+- [x] Modify loss calculation
+- [x] Training Scheduler
+- [x] Weight initialization
+- [x] Augmentation : Resizing
+- [ ] Augmentation : Random Distortion
+- [ ] Augmentation : Jitter
+- [ ] Augmentation : Flip
+
+
+## Paper
+### YOLOv3: An Incremental Improvement
+_Joseph Redmon, Ali Farhadi_ <br>
+
+[[Paper]](https://pjreddie.com/media/files/papers/YOLOv3.pdf) [[Original Implementation]](https://github.com/pjreddie/darknet)
+[[Author's Project Page]](https://pjreddie.com/darknet/yolo/)  
+
+## Credit
+```
+@article{yolov3,
+  title={YOLOv3: An Incremental Improvement},
+  author={Redmon, Joseph and Farhadi, Ali},
+  journal = {arXiv},
+  year={2018}
+}
+```
diff --git a/config/yolov3_default.cfg b/config/yolov3_default.cfg
@@ -0,0 +1,22 @@
+MODEL:
+  TYPE: YOLOv3
+  BACKBONE: darknet53
+TRAIN:
+  LR: 0.001
+  MOMENTUM: 0.9
+  DECAY: 0.0005
+  BURN_IN: 1000
+  MAXITER: 500000
+  STEPS: (400000, 450000)
+  BATCHSIZE: 4
+  SUBDIVISION: 16
+  IMGSIZE: 608
+  CONFWEIGHT: 1
+  LOSSTYPE: l2
+  IGNORETHRE: 0.7
+  RANDRESIZE: True
+TEST:
+  CONFTHRE: 0.8
+  NMSTHRE: 0.45
+  IMGSIZE: 416
+NUM_GPUS: 1
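
Reading note on the schedule fields in this config: `BURN_IN` is described in the README as the learning-rate burn-in duration and `STEPS` as the lr-drop iteration points. A minimal sketch of one schedule consistent with those fields follows. The function name, the warm-up exponent of 4 and the drop factor of 0.1 are assumptions for illustration; none of them is shown in this part of the commit.

```python
def learning_rate(iteration, base_lr=0.001, burn_in=1000,
                  steps=(400000, 450000), power=4, gamma=0.1):
    """Return the learning rate at a given iteration (hypothetical helper)."""
    if iteration < burn_in:
        # Polynomial warm-up from 0 up to base_lr; the exponent is an assumption.
        return base_lr * (iteration / burn_in) ** power
    # Apply the drop factor once for every drop point already reached.
    drops = sum(1 for step in steps if iteration >= step)
    return base_lr * gamma ** drops


# If SUBDIVISION means gradient accumulation over inner iterations, the
# effective batch size per optimizer step is BATCHSIZE * SUBDIVISION.
effective_batch_size = 4 * 16
```

With the default values above, the rate ramps up over the first 1000 iterations, stays at 0.001 until iteration 400000, and is then reduced at each of the two drop points.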
diff --git a/config/yolov3_eval.cfg b/config/yolov3_eval.cfg
@@ -0,0 +1,22 @@
+MODEL:
+  TYPE: YOLOv3
+  BACKBONE: darknet53
+TRAIN:
+  LR: 0.00
+  MOMENTUM: 0.9
+  DECAY: 0.0005
+  BURN_IN: 0
+  MAXITER: 2
+  STEPS: (99, 999)
+  BATCHSIZE: 1
+  SUBDIVISION: 1
+  CONFWEIGHT: 1
+  LOSSTYPE: l2
+  IGNORETHRE: 0.7
+  IMGSIZE: 608
+  RANDRESIZE: False
+TEST:
+  CONFTHRE: 0.8
+  NMSTHRE: 0.45
+  IMGSIZE: 416
+NUM_GPUS: 1
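
Reading note on `STEPS`: YAML has no tuple syntax, so a YAML parser returns a value such as `(99, 999)` as a plain string rather than a sequence. How the training script handles this is not shown in this part of the commit. The sketch below is one way to normalize the value, using a hypothetical helper `parse_steps`.

```python
import ast


def parse_steps(value):
    """Turn a STEPS entry into a tuple of ints (hypothetical helper)."""
    if isinstance(value, str):
        # literal_eval only accepts Python literals, so it is safer than eval.
        value = ast.literal_eval(value)
    if isinstance(value, int):
        # A single drop point such as "(99)" evaluates to a bare int.
        value = (value,)
    return tuple(int(v) for v in value)
```

Writing the entry as a YAML list (`STEPS: [99, 999]`) would avoid the string round trip. The helper accepts both forms.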
diff --git a/data/innsbruck.png b/data/innsbruck.png
diff --git a/data/innsbruck_result.png b/data/innsbruck_result.png
diff --git a/data/mountain.png b/data/mountain.png
diff --git a/data/mountain_result.png b/data/mountain_result.png
diff --git a/data/val_comparison.png b/data/val_comparison.png
diff --git a/dataset/cocodataset.py b/dataset/cocodataset.py
@@ -0,0 +1,101 @@
+import os
+import numpy as np
+
+import torch
+from torch.utils.data import Dataset
+import cv2
+from pycocotools.coco import COCO
+
+from utils.utils import *
+
+
+class COCODataset(Dataset):
+    """
+    COCO dataset class.
+    """
+    def __init__(self, model_type, data_dir='COCO', json_file='instances_train2017.json',
+                 name='train2017', img_size=416, min_size=1, debug=False):
+        """
+        COCO dataset initialization. Annotation data are read into memory by COCO API.
+        Args:
+            model_type (str): model name specified in config file
+            data_dir (str): dataset root directory
+            json_file (str): COCO json file name
+            name (str): COCO data name (e.g. 'train2017' or 'val2017')
+            img_size (int): target image size after pre-processing
+            min_size (int): bounding boxes smaller than this are ignored
+            debug (bool): if True, only one data id is selected from the dataset
+        """
+        self.data_dir = data_dir
+        self.json_file = json_file
+        self.model_type = model_type
+        self.coco = COCO(os.path.join(self.data_dir, 'annotations', self.json_file))
+        self.ids = self.coco.getImgIds()
+        if debug:
+            self.ids = self.ids[1:2]
+            print("debug mode...", self.ids)
+        self.class_ids = sorted(self.coco.getCatIds())
+        self.name = name
+        self.max_labels = 50
+        self.img_size = img_size
+        self.min_size = min_size
+
+    def __len__(self):
+        return len(self.ids)
+
+    def __getitem__(self, index):
+        """
+        One image / label pair for the given index is picked up \
+        and pre-processed.
+        Args:
+            index (int): data index
+        Returns:
+            img (numpy.ndarray): pre-processed image
+            padded_labels (torch.Tensor): pre-processed label data. \
+                The shape is :math:`[self.max_labels, 5]`. \
+                each label consists of [class, xc, yc, w, h]:
+                    class (float): class index.
+                    xc, yc (float) : center of bbox whose values range from 0 to 1.
+                    w, h (float) : size of bbox whose values range from 0 to 1.
+            info_img : tuple of h, w, nh, nw, dx, dy.
+                h, w (int): original shape of the image
+                nh, nw (int): shape of the resized image without padding
+                dx, dy (int): pad size
+            id_ (int): COCO image id for the given index. Used for evaluation.
+        """
+        id_ = self.ids[index]
+
+        anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=None)
+        annotations = self.coco.loadAnns(anno_ids)
+
+        # load image and preprocess
+        img_file = os.path.join(self.data_dir, self.name,
+                                '{:012}'.format(id_) + '.jpg')
+        img = cv2.imread(img_file)
+
+        if self.json_file == 'instances_val5k.json' and img is None:
+            img_file = os.path.join(self.data_dir, 'train2017',
+                                    '{:012}'.format(id_) + '.jpg')
+            img = cv2.imread(img_file)
+        assert img is not None, 'failed to read image: ' + img_file
+
+        img, info_img = preprocess(img, self.img_size)
+
+        # load labels
+        labels = []
+        for anno in annotations:
+            if anno['bbox'][2] > self.min_size and anno['bbox'][3] > self.min_size:
+                labels.append([])
+                labels[-1].append(self.class_ids.index(anno['category_id']))
+                labels[-1].extend(anno['bbox'])
+
+        padded_labels = np.zeros((self.max_labels, 5))
+        if len(labels) > 0:
+            labels = np.stack(labels)
+            if 'YOLO' in self.model_type:
+                labels = label2yolobox(labels, info_img, self.img_size)
+            n_labels = min(len(labels), self.max_labels)
+            padded_labels[:n_labels] = labels[:n_labels]
+        padded_labels = torch.from_numpy(padded_labels)
+
+        return img, padded_labels, info_img, id_
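
Reading note on the label format: COCO stores each `bbox` as `[x, y, width, height]` in pixels with `(x, y)` at the top-left corner, while the `__getitem__` docstring promises `[xc, yc, w, h]` in the 0 to 1 range of the padded square image. `label2yolobox` is imported from `utils.utils` and is not shown in this part of the diff, so the sketch below is only an assumption about the arithmetic involved. The function name `coco_to_yolo_box` is hypothetical; `info_img` follows the `(h, w, nh, nw, dx, dy)` layout from the docstring.

```python
def coco_to_yolo_box(bbox, info_img, img_size):
    """Map one COCO pixel box to a normalized center box (hypothetical helper).

    bbox: (x, y, bw, bh), top-left corner and size in original pixels.
    info_img: (h, w, nh, nw, dx, dy) as described in the docstring above.
    img_size: side length of the padded square image.
    """
    h, w, nh, nw, dx, dy = info_img
    x, y, bw, bh = bbox
    # Scale factors from the original image to the resized, unpadded image.
    scale_x, scale_y = nw / w, nh / h
    # Box center in padded-image pixels, then normalized by the square side.
    xc = ((x + bw / 2) * scale_x + dx) / img_size
    yc = ((y + bh / 2) * scale_y + dy) / img_size
    return xc, yc, bw * scale_x / img_size, bh * scale_y / img_size


# A 640x480 image letterboxed into 416x416: resized to 416x312, padded 52 px
# on the top (and bottom). The box is centered in the original image.
box = coco_to_yolo_box((160, 120, 320, 240), (480, 640, 312, 416, 0, 52), 416)
```

In this example the centered box maps to a center of approximately (0.5, 0.5), which is a quick sanity check that the padding offsets are applied on the correct axis.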