Docker Image
- tensorflow/tensorflow:2.4.0-gpu-jupyter
Library
- PyTorch : Stable (1.7.1) - Linux - Python - CUDA (11.0)
- Using a single GPU (not tested on CPU only)
- model.py : DeconvNet
- train.py : train model
- utils.py : calculate mIoU
- Training settings follow the paper, as below
- input : (3, 224, 224)
- batch size : 8
- learning rate : 0.01
- momentum : 0.9
- weight decay : 0.0005
- no learning rate scheduler, for convenience
- mIoU score may differ considerably from the paper due to the lack of 2-stage training
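The optimizer settings listed above can be sketched with a standard PyTorch SGD optimizer; the one-layer model here is only a placeholder for the repo's DeconvNet from model.py:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for DeconvNet (model.py).
model = nn.Conv2d(3, 21, kernel_size=1)

# Settings listed above: lr 0.01, momentum 0.9, weight decay 0.0005.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005
)
```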
- Improves FCN by overcoming its limitations
- Deep deconvolution network : unpooling + deconvolution
- unpooling captures example-specific structures
- deconvolution captures class-specific shapes
- Instance-wise training : handles objects at various scales effectively
- Mean intersection over union
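A minimal mIoU sketch in the spirit of utils.py (this is an illustrative implementation, not the repo's code): per-class intersection over union, averaged over classes present in either map.

```python
import torch

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:
            continue  # class absent in both maps; skip it
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

pred = torch.tensor([[0, 1], [1, 1]])
target = torch.tensor([[0, 1], [0, 1]])
# class 0: 1/2, class 1: 2/3 -> mean 7/12
score = mean_iou(pred, target, num_classes=2)
```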
- Convolution network : VGG16 with batch norm (dropout and the last fc layer removed)
- Deconvolution network : symmetric to the convolution network (pooling and convolution replaced with unpooling and deconvolution)
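The unpooling/deconvolution pairing above maps directly onto PyTorch modules: pooling must return its switch indices so the mirrored unpooling layer can place activations back at their original locations, and a transposed convolution then densifies the sparse map. A minimal sketch:

```python
import torch
import torch.nn as nn

# return_indices=True records the pooling "switches" for later unpooling.
pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
deconv = nn.ConvTranspose2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 224, 224)
pooled, indices = pool(x)           # (1, 64, 112, 112)
restored = unpool(pooled, indices)  # sparse map back at (1, 64, 224, 224)
dense = deconv(restored)            # learned filters densify the sparse map
```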
- 2-stage training
- learn easy samples first : draw the bounding box of each instance and extend it 1.2x so the instance is centered (other instances are regarded as background)
- learn hard samples next : get bounding boxes using Edge Boxes and extend them 1.2x
- Data pre-processing
- resize : (250, 250)
- random crop : (224, 224)
- random horizontal flip
- Objective : per-pixel multinomial logistic loss
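The per-pixel multinomial logistic loss is what PyTorch's cross-entropy loss computes when given a 4-D logit map and a 3-D map of class indices, averaging over all pixels. Shapes below assume 21 PASCAL VOC classes and the 224x224 input:

```python
import torch
import torch.nn as nn

# Cross-entropy over the class dimension, averaged over every pixel.
criterion = nn.CrossEntropyLoss()

logits = torch.randn(8, 21, 224, 224)         # (N, classes, H, W)
target = torch.randint(0, 21, (8, 224, 224))  # (N, H, W) class indices
loss = criterion(logits, target)              # scalar
```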
- Train Details
- minibatch SGD with momentum
- batch size : 64
- learning rate : 0.01
- momentum : 0.9
- weight decay : 0.0005
- Data pre-processing
- get 2000 object proposals using Edge Boxes
- select top 50 proposals based on objectness scores
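The selection step can be sketched as a simple top-k over objectness scores; the random boxes and scores below are placeholders for actual Edge Boxes output, which this sketch does not compute:

```python
import torch

# Hypothetical Edge Boxes output: 2000 (x, y, w, h) proposals with scores.
boxes = torch.rand(2000, 4)
scores = torch.rand(2000)

# Keep the 50 proposals with the highest objectness scores.
top_scores, top_idx = scores.topk(50)
top_boxes = boxes[top_idx]
```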
- Aggregate instance-wise segmentation maps
- compute the pixel-wise maximum over the instance-wise maps to obtain the image-level prediction
- Ensemble
- DeconvNet + FCN -> EDeconvNet (mean of class conditional probability maps)
- EDeconvNet + CRF (state-of-the-art)
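The EDeconvNet ensemble is just the mean of the two models' class-conditional probability maps; random softmax outputs stand in for the real DeconvNet and FCN predictions here, and the CRF refinement step is not shown:

```python
import torch

# Hypothetical per-pixel class probabilities from each model (21 VOC classes).
deconvnet_prob = torch.rand(21, 224, 224).softmax(dim=0)
fcn_prob = torch.rand(21, 224, 224).softmax(dim=0)

# EDeconvNet: mean of the two probability maps, then per-pixel argmax.
ensemble_prob = (deconvnet_prob + fcn_prob) / 2
prediction = ensemble_prob.argmax(dim=0)  # CRF refinement would follow
```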