A tiny improvement on SSD (Single Shot MultiBox Detector) that adds a feature map concatenation module and an FPN module to the head of SSD. This project is based on the original SSD project, which is implemented in Caffe.
In this repository, we propose to detect small objects more effectively by using the concatenation module and the FPN module.
I add three extra layers to the head of SSD, which are generated by the concatenation module. The concatenation module is shown below.
The feature-fused layer consists of 512 feature maps of size 2H×2W. The first 128 feature maps are generated by subsampling the 4H×4W feature layer with a 3×3 convolution kernel, followed by a ReLU activation. A batch normalization layer is used after subsampling, mainly because the features learned by the shallow layer and those learned by the higher layer have different distributions, and the gap between them makes learning and prediction difficult. The middle 256 feature maps are generated from the 2H×2W prediction layer by dimension reduction and feature combination through a 3×3 convolution kernel, followed by ReLU. The last 128 feature maps are obtained by upsampling the high-level H×W feature layer with a 2×2 convolution kernel, followed by ReLU. After the feature maps from the three layers of the feature pyramid are concatenated, a 3×3 convolution kernel is applied to the fused maps in order to reduce the remaining differences in distribution.
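The channel and size bookkeeping of the concatenation module can be checked with a small shape calculation. This is only a sketch, not the Caffe definition used in this project: the strides and paddings (stride 2 and padding 1 for the subsampling convolution, stride 2 for the 2×2 upsampling, treated here as a transposed convolution) are assumptions chosen so that all three branches end up at 2H×2W, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a standard convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output size of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel


def fused_shape(h, w):
    """Return (channels, height, width) of the feature-fused layer.

    h, w are the spatial size of the high-level H x W feature layer.
    Strides and paddings below are assumptions, not taken from the project.
    """
    # Shallow branch: 4H x 4W -> 2H x 2W, 3x3 conv, stride 2, padding 1 (assumed).
    shallow = (128, conv_out(4 * h, 3, 2, 1), conv_out(4 * w, 3, 2, 1))
    # Middle branch: 2H x 2W kept at 2H x 2W, 3x3 conv, stride 1, padding 1 (assumed).
    middle = (256, conv_out(2 * h, 3, 1, 1), conv_out(2 * w, 3, 1, 1))
    # Deep branch: H x W -> 2H x 2W, 2x2 transposed conv, stride 2 (assumed).
    deep = (128, deconv_out(h, 2, 2), deconv_out(w, 2, 2))

    branches = [shallow, middle, deep]
    # Concatenation along the channel axis requires identical spatial sizes.
    if len({b[1:] for b in branches}) != 1:
        raise ValueError(f"spatial sizes differ: {branches}")

    channels = sum(b[0] for b in branches)
    height, width = shallow[1:]
    # Final 3x3 conv on the fused maps, stride 1, padding 1 (assumed): size unchanged.
    return (channels, conv_out(height, 3, 1, 1), conv_out(width, 3, 1, 1))


if __name__ == "__main__":
    print(fused_shape(10, 10))
```

With H = W = 10, the three branches contribute 128, 256, and 128 channels at 20×20, which gives the 512 fused feature maps described above.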
The overall network is shown below.
[1] Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017), p. 4.