This repo contains ICNet implemented in PaddlePaddle 2.2.0, based on the paper by Hengshuang Zhao et al. (ECCV'18) and the code by liminn. Training and evaluation are done on the Cityscapes dataset by default.
Python 3.7 or later with the following packages (install them with `pip3 install -r requirements.txt`):
- paddlepaddle-gpu==2.2.0rc0
- numpy==1.17.0
- Pillow==6.0.0
- PyYAML==5.1.2
- All the logs and related files for the alignment steps (forward, metric, loss and backward) can be found in `align/`. The diff logs can be viewed directly in `align/diff_txt`.
- Run `models/icnet.py` to check all the alignment steps and generate the relevant files. Then put the `.npy` files in `align/forward/`, `align/backward/` and `align/metric/` respectively, and run `align/check_log_diff.py`.
- The training log is in `log/icnet_resnet50-v1s_log.txt`.
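The alignment check ultimately compares arrays dumped by the reference implementation and by the Paddle port. The sketch below is not `align/check_log_diff.py`; it is a minimal stand-in, assuming each step's output is saved as a NumPy array, that reports the maximum absolute difference and whether it stays under a tolerance. The function name `compare_outputs` and the `1e-5` threshold are hypothetical choices.

```python
import numpy as np


def compare_outputs(ref, out, atol=1e-5):
    """Compare one pair of arrays from the reference and Paddle runs.

    Returns (max_abs_diff, passed). The default tolerance is an assumption.
    """
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    max_diff = float(np.max(np.abs(ref - out))) if ref.size else 0.0
    return max_diff, max_diff <= atol


if __name__ == "__main__":
    # Synthetic stand-ins for two dumped forward outputs (N, classes, H, W).
    rng = np.random.default_rng(0)
    ref_logits = rng.standard_normal((1, 19, 8, 16)).astype(np.float32)
    paddle_logits = ref_logits + 1e-7  # tiny numerical drift
    diff, ok = compare_outputs(ref_logits, paddle_logits)
    print(f"max abs diff = {diff:.2e}, passed = {ok}")
```

With real dumps, each pair would be loaded with `np.load` before calling `compare_outputs`. An absolute tolerance keeps the sketch simple; a relative tolerance may suit large activations or losses better.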
- Based on the Cityscapes dataset: trained only on the training set and tested on the validation set, using a single Tesla V100 card on the AI Studio platform. The input size in the test phase is 2048x1024x3.
Method | Pretrained model | mIoU (%) | GPU | Training time |
---|---|---|---|---|
ICNet (paper) | PSPNet50-half | 67.7 | TitanX | 80+ h |
ICNet (paddle) | Resnet50-paddle (4hrm) | 66.7 | Tesla V100 | 24 h |
ICNet (paddle) | Resnet50-v1s-paddle (4ug1) | 69.6 | Tesla V100 | 20 h |
- The evaluation log is in `log/icnet_resnet50_evaluate_log.txt`.
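The mIoU figures above average the per-class intersection over union. A minimal NumPy sketch of that metric, built from a confusion matrix, is shown below. It assumes predictions and labels are integer class-index maps; the `ignore_index=-1` default for void pixels is an assumption, not something this README specifies. For Cityscapes, `num_classes` would be 19.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes, ignore_index=-1):
    """Accumulate a num_classes x num_classes matrix (rows = ground truth, cols = prediction)."""
    pred = np.asarray(pred).ravel()
    label = np.asarray(label).ravel()
    valid = label != ignore_index  # drop ignored pixels (ignore value is an assumption)
    index = label[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); classes absent from both maps are skipped."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    union = tp + fp + fn
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return float(np.nanmean(iou)), iou


if __name__ == "__main__":
    label = np.array([[0, 0, 1], [1, 2, -1]])  # -1 marks an ignored pixel
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    conf = confusion_matrix(pred, label, num_classes=3)
    miou, per_class = mean_iou(conf)
    print(per_class, miou)  # IoU per class: 0.5, 2/3, 1.0; mIoU = 13/18
```

In an evaluation loop, the confusion matrices of all validation images would be summed before calling `mean_iou`, so that IoU is computed over the whole set rather than averaged per image.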
Demo results (input image and prediction):
- All the input images come from the Cityscapes validation set; see the `demo/` directory for more demo results.
Pretrained models:
- Resnet50-v1s-mxnet
- Resnet50-paddle (transformed from torch): code is 4hrm
- Resnet50-v1s-paddle (transformed from mxnet): code is 4ug1
First, make sure the pretrained model `resnet50_v1s.pdparams` exists in `ICNet-paddle/`.

Then, modify the configuration in the `configs/icnet.yaml` file:
```yaml
### Training
train:
  specific_gpu_num: "1"   # for example: "0", "1" or "0, 1"
  train_batch_size: 7     # adjust according to GPU resources
  cityscapes_root: "./data/Cityscapes/"
  ckpt_dir: "./ckpt/"     # checkpoints and the training log will be saved here
```
Then, run `python train.py`.
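Because `configs/icnet.yaml` is plain YAML, the `train:` block loads as a nested dictionary. The sketch below parses only the keys shown in this README with PyYAML (listed in `requirements.txt`). `parse_gpu_ids`, which turns a string such as `"0, 1"` into a list of device ids, is a hypothetical helper that only illustrates the expected format of `specific_gpu_num`; it is not how `train.py` necessarily reads it.

```python
import yaml

# Only the keys shown in this README; the real configs/icnet.yaml contains more.
CONFIG_TEXT = """
train:
  specific_gpu_num: "1"      # for example: "0", "1" or "0, 1"
  train_batch_size: 7        # adjust according to GPU resources
  cityscapes_root: "./data/Cityscapes/"
  ckpt_dir: "./ckpt/"
"""


def parse_gpu_ids(spec):
    """Hypothetical helper: turn a device string like "0, 1" into [0, 1]."""
    return [int(part) for part in spec.split(",") if part.strip()]


cfg = yaml.safe_load(CONFIG_TEXT)
train_cfg = cfg["train"]
print(parse_gpu_ids(train_cfg["specific_gpu_num"]), train_cfg["train_batch_size"])  # prints: [1] 7
```

Quoting `specific_gpu_num` in the YAML keeps it a string, so a value such as `"0, 1"` is not misread as a number.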
First, modify the configuration in the `configs/icnet.yaml` file:
```yaml
### Test
test:
  ckpt_path: "./ckpt/icnet_resnet50_126_0.701746612727642_best_model.pdparams"  # set this to the path of the model to evaluate
```
Then, download the model from:
- mIoU66.7%-paddle : code is uqcd
- mIoU69.6%-paddle : code is 58kq
Put the downloaded `.pdparams` file in `ckpt/`.
Finally, run `python evaluate.py`.
The structure of ICNet is mainly composed of `sub4`, `sub2`, `sub1` and `head`:
- `sub4`: basically a PSPNet; the biggest difference is a modified pyramid pooling module.
- `sub2`: the first three stages of convolutional layers of `sub4`; `sub2` and `sub4` share these three stages.
- `sub1`: three consecutive strided convolutional layers that quickly downsample the original large input image.
- `head`: the outputs of the three cascaded branches (`sub4`, `sub2` and `sub1`) are fused through the `CFF` module. Finally, a 1x1 convolution and interpolation produce the output.
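The cascade is easiest to follow through its feature-map sizes. The sketch below assumes the output strides given in the ICNet paper (1/8 for `sub1`, 1/16 for `sub2`, 1/32 for `sub4`) and a 2048x1024 Cityscapes input; the repo's exact strides, and the order of the 1x1 convolution and upsampling inside `head`, may differ. The names `BRANCH_STRIDES`, `branch_sizes` and `cascade_plan` are illustrative only.

```python
# Output stride of each branch relative to the full-resolution input
# (values from the ICNet paper; an assumption for this repo).
BRANCH_STRIDES = {"sub1": 8, "sub2": 16, "sub4": 32}


def branch_sizes(height, width):
    """Return the (H, W) of each branch's output feature map."""
    if height % 32 or width % 32:
        raise ValueError("this sketch assumes height and width divisible by 32")
    return {name: (height // s, width // s) for name, s in BRANCH_STRIDES.items()}


def cascade_plan(height, width):
    """Describe the coarse-to-fine fusion; each CFF step upsamples by 2."""
    s = branch_sizes(height, width)
    return [
        f"CFF 1: upsample sub4 {s['sub4']} to {s['sub2']}, fuse with sub2",
        f"CFF 2: upsample the fused map {s['sub2']} to {s['sub1']}, fuse with sub1",
        f"head: 1x1 conv + interpolation from {s['sub1']} to {(height, width)}",
    ]


for step in cascade_plan(1024, 2048):  # Cityscapes: H=1024, W=2048
    print(step)
```

For a 2048x1024 input this gives (H, W) = (128, 256) for `sub1`, (64, 128) for `sub2` and (32, 64) for `sub4`, which shows why the deepest branch stays affordable: it only processes a 1/4-resolution input.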
- During training, I ran into an issue: PaddlePaddle 2.1.2 does not support constructing optimizers that specify learning rates or other settings for individual sublayers. Upgrading to PaddlePaddle 2.2.0rc solved the problem.
- The pretrained Resnet50-v1s model matters: it performs 3-4% (mIoU) better than Resnet50. Since `Resnet50v1s.pth` is not available, I converted the available MXNet model `r50-v1s-pretrained-model.params` to `r50v1s-paddle.pdparams`.
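Per-sublayer learning rates, the feature that required PaddlePaddle 2.2.0, come down to splitting the model's parameters into groups. The pure-Python sketch below only builds those groups. It assumes parameter names start with the branch name (`sub4.`, `sub1.`, `head.`), a hypothetical naming scheme, and encodes the rates this README uses (0.01 for `sub4`, 10 times that for `sub1` and `head`) as multipliers of a base rate of 0.01. In Paddle 2.2 a list of such dicts can be passed as an optimizer's `parameters` argument, with names coming from `model.named_parameters()`; Paddle's optimizers appear to treat a group's `learning_rate` as a scale on the base rate, but check the optimizer documentation before relying on this.

```python
# Hypothetical branch prefixes -> learning-rate multiplier relative to BASE_LR.
BASE_LR = 0.01
LR_MULTIPLIERS = {"sub4": 1.0, "sub1": 10.0, "head": 10.0}


def build_param_groups(named_params, multipliers=LR_MULTIPLIERS):
    """Group (name, param) pairs by the first component of the name.

    Returns a list of {"params": [...], "learning_rate": multiplier} dicts.
    Parameters whose prefix is not listed go to a default group (multiplier 1.0).
    """
    groups = {prefix: [] for prefix in multipliers}
    default = []
    for name, param in named_params:
        prefix = name.split(".", 1)[0]
        # Unknown prefixes fall back to the shared default list.
        groups.get(prefix, default).append(param)
    result = [
        {"params": params, "learning_rate": multipliers[prefix]}
        for prefix, params in groups.items()
        if params
    ]
    if default:
        result.append({"params": default, "learning_rate": 1.0})
    return result


if __name__ == "__main__":
    # Stand-ins for model.named_parameters(); real values would be Paddle parameters.
    fake = [("sub4.layer1.weight", "w0"), ("sub1.conv.weight", "w1"),
            ("head.cls.weight", "w2"), ("sub1.bn.weight", "w3")]
    for group in build_param_groups(fake):
        print(BASE_LR * group["learning_rate"], group["params"])
```

Grouping by name prefix keeps the optimizer setup independent of how each branch is built internally, as long as the top-level attribute names stay stable.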
Data preprocessing: set the `crop_size` as close as possible to the input size of the prediction phase. Here are some experiments based on liminn-ICNet-pytorch:

- Set `base_size` to 520, i.e. resize the shorter side of the image to a random length between 520x0.5 and 520x2, and set `crop_size` to 480, i.e. randomly crop a 480x480 patch for training. The final best mIoU is 66.3% (Resnet50).
- Set `base_size` to 1024, i.e. resize the shorter side of the image to a random length between 1024x0.5 and 1024x2, and set `crop_size` to 960, i.e. randomly crop a 960x960 patch for training. The final best mIoU is 66.7% (Resnet50).
- Set `base_size` to 1024, i.e. resize the shorter side of the image to a random length between 1024x0.5 and 1024x2, and set `crop_size` to 960, i.e. randomly crop a 960x960 patch for training. The final best mIoU is 69.6% (Resnet50v1s).
- Because the target dataset is Cityscapes, whose images are 2048x1024, a large `crop_size` (960x960) is better. A larger `crop_size` is expected to bring a higher mIoU, but a very large one (such as 1024x1024) forces a smaller batch size and is very time-consuming.
- Set the learning rate of `sub4` to the original initial learning rate (0.01), because it has pretrained backbone weights.
- Set the learning rate of `sub1` and `head` to 10 times the initial learning rate (0.1), because they have no pretrained weights.
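The random scale-and-crop described above (shorter side resized to a random length between `base_size`x0.5 and `base_size`x2, then a random `crop_size`x`crop_size` patch) can be sketched with NumPy as follows. This is an illustrative version, not the repo's transform: it uses nearest-neighbour resizing for both image and label to stay dependency-free (real pipelines usually resize the image bilinearly), the padding label value `-1` is an assumption, and any other augmentations the repo may apply are omitted.

```python
import numpy as np


def resize_nearest(array, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = array.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return array[rows][:, cols]


def random_scale_crop(image, label, base_size, crop_size, rng, fill_label=-1):
    """Random shorter-side rescale, pad if needed, then a random square crop.

    image: (H, W, C) array; label: (H, W) signed integer array.
    fill_label is the (assumed) ignore value used for padded label pixels.
    """
    # Shorter side drawn uniformly from [0.5 * base_size, 2.0 * base_size].
    short = int(rng.integers(int(base_size * 0.5), int(base_size * 2.0) + 1))
    h, w = label.shape
    if h > w:
        out_w, out_h = short, int(h * short / w)
    else:
        out_h, out_w = short, int(w * short / h)
    image = resize_nearest(image, out_h, out_w)
    label = resize_nearest(label, out_h, out_w)

    # Pad bottom/right so both sides are at least crop_size.
    pad_h, pad_w = max(crop_size - out_h, 0), max(crop_size - out_w, 0)
    if pad_h or pad_w:
        image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=0)
        label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=fill_label)

    # Random crop_size x crop_size window.
    h, w = label.shape
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    window = (slice(top, top + crop_size), slice(left, left + crop_size))
    return image[window], label[window]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((1024, 2048, 3), dtype=np.uint8)  # dummy Cityscapes-sized input
    label = np.zeros((1024, 2048), dtype=np.int64)
    patch, target = random_scale_crop(image, label, base_size=1024, crop_size=960, rng=rng)
    print(patch.shape, target.shape)  # (960, 960, 3) (960, 960)
```

With `base_size=1024` the shorter side can shrink to 512, below the 960 crop, which is why the padding step is needed; the padded label pixels carry the ignore value so they do not contribute to the loss.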
- Further experiments in Paddle remain to be done, such as using a `crop_size` of 1024 to see how much data preprocessing can improve the model's performance.
- Switch the pretrained model to PSPNet50 to see whether the mIoU reaches the 67.7% reported in the paper.