
pooruss/For-align-ICNet-paddle


Description

This repo contains an ICNet implementation in PaddlePaddle 2.2.0, based on the paper by Hengshuang Zhao et al. (ECCV'18) and the code by liminn. Training and evaluation use the Cityscapes dataset by default.

Requirements

Python 3.7 or later with the following packages, installed via pip3 install -r requirements.txt:

  • paddlepaddle-gpu==2.2.0rc0
  • numpy==1.17.0
  • Pillow==6.0.0
  • PyYAML==5.1.2

Align

  • All the logs and related files are in align/, covering the forward, metric, loss, and backward steps. The diff logs are in align/diff_txt.
  • Run models/icnet.py to go through all the alignment steps and generate the relevant files. Then put the .npy files in align/forward/, align/backward/, and align/metric/ respectively, and run align/check_log_diff.py.
  • The training log is in log/icnet_resnet50-v1s_log.txt.
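The contents of align/check_log_diff.py are not reproduced in this README. As a rough illustration of what an alignment check does, here is a minimal sketch that compares two flat lists of values; in practice they would come from something like numpy.load(path).ravel().tolist(). The function names and the tolerance of 1e-6 are assumptions for illustration, not taken from the repository.

```python
def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference of two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch: %d vs %d" % (len(reference), len(candidate)))
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def is_aligned(reference, candidate, tol=1e-6):
    """True when every element differs by at most tol (tol is a hypothetical threshold)."""
    return max_abs_diff(reference, candidate) <= tol


# Stand-in values; real inputs would be the dumped forward/backward/metric arrays.
reference_out = [0.125, -0.5, 2.0]
paddle_out = [0.125, -0.5, 2.0000005]
print("max diff:", max_abs_diff(reference_out, paddle_out))
print("aligned:", is_aligned(reference_out, paddle_out))
```

A max-absolute-difference check is only one possible criterion; a relative tolerance may suit activations with a large dynamic range better.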

Performance

  • Trained on the Cityscapes training set only and tested on the validation set, using a single Tesla V100 card on the aistudio platform. The input size in the test phase is 2048x1024x3.

| Method | Pretrained model | mIoU (%) | GPU | Time (train) |
|---|---|---|---|---|
| ICNet (paper) | PSPNet50-half | 67.7 | TitanX | 80+ h |
| ICNet (paddle) | Resnet50-paddle(4hrm) | 66.7 | Tesla V100 | 24 h |
| ICNet (paddle) | Resnet50-v1s-paddle(4ug1) | 69.6 | Tesla V100 | 20 h |

  • The evaluation log is in log/icnet_resnet50_evaluate_log.txt.
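For readers unfamiliar with the metric, here is a minimal sketch of the standard mIoU definition: per-class intersection over union computed from a confusion matrix, then averaged over classes. It is not the repository's evaluation code. The ignore label 255 and the choice to skip classes that never occur are assumptions.

```python
def confusion_matrix(labels, preds, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs, skipping ignored pixels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        if t == ignore_index:  # assumed ignore label
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at least once."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label / prediction maps with three classes.
labels = [0, 0, 1, 1, 2, 255]
preds = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(confusion_matrix(labels, preds, 3)), 4))
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.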

Demo

image predict
- All the input images come from the Cityscapes validation set; see the demo/ directory for more demo results.

Usage

Preparation

Pretrained models:

Training

First, make sure the pretrained model resnet50_v1s.pdparams exists in ICNet-paddle/.

Then, modify the configuration in the configs/icnet.yaml file:

### Training
train:
  specific_gpu_num: "1"   # for example: "0", "1" or "0, 1"
  train_batch_size: 7    # adjust according to GPU resources
  cityscapes_root: "./data/Cityscapes/"
  ckpt_dir: "./ckpt/"     # checkpoints and the training log will be saved here

Then run: python train.py
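How train.py consumes specific_gpu_num is not shown in this README. One common pattern, assumed here purely for illustration, is to normalise the string and export it through the CUDA_VISIBLE_DEVICES environment variable before the framework initialises. The helper names below are hypothetical.

```python
import os


def parse_gpu_ids(spec):
    """Turn a string such as "0, 1" into a list of integer device ids."""
    return [int(part) for part in spec.split(",") if part.strip()]


def select_gpus(spec, environ=os.environ):
    """Write the normalised id list to CUDA_VISIBLE_DEVICES and return the ids."""
    ids = parse_gpu_ids(spec)
    environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return ids


# Use a plain dict here so the demo does not touch the real environment.
demo_env = {}
print(select_gpus("0, 1", demo_env), demo_env)
```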

Evaluation

First, modify the configuration in the configs/icnet.yaml file:

### Test
test:
  ckpt_path: "./ckpt/icnet_resnet50_126_0.701746612727642_best_model.pdparams"  # set the evaluate model path correctly

Then, download the model from:

and put the .pdparams file in ckpt/.

Finally, run: python evaluate.py

Discussion

Network Structure

The structure of ICNet is mainly composed of sub4, sub2, sub1 and head:

  • sub4: essentially a PSPNet; the biggest difference is a modified pyramid pooling module.

  • sub2: the first three stages of convolutional layers of sub4; sub2 and sub4 share these layers.

  • sub1: three consecutive strided convolutional layers that quickly downsample the original large-size input image.

  • head: the CFF (cascade feature fusion) module connects the outputs of the three cascaded branches (sub4, sub2, and sub1). Finally, a 1x1 convolution and interpolation produce the output.
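To make the downsampling in sub1 concrete, here is a small sketch that applies the standard convolution output-size formula three times. The 3x3 kernel, stride 2, and padding 1 are assumptions; the README only says the layers are strided.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of one convolution along a single spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def sub1_output_size(height, width, num_layers=3):
    """Spatial size after num_layers stride-2 convolutions (assumed 3x3, padding 1)."""
    for _ in range(num_layers):
        height, width = conv_out(height), conv_out(width)
    return height, width


# A full-resolution Cityscapes frame is 2048 wide and 1024 high.
print(sub1_output_size(1024, 2048))  # (128, 256)
```

Under these assumptions, sub1 reduces the input to 1/8 of its resolution in just three layers, which keeps the full-resolution branch cheap.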

Issues

  • During training I ran into an issue: PaddlePaddle 2.1.2 does not support constructing optimizers that specify learning rates or other parameters per sublayer. Updating to PaddlePaddle 2.2.0rc solved the problem.
  • The pretrained model Resnet50-v1s is important and outperforms Resnet50 by 3-4%. Since Resnet50v1s.pth is not accessible, I converted the available r50-v1s-pretrained-model.params to r50v1s-paddle.pdparams using the MXNet framework.
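The per-sublayer learning rates mentioned in the first issue can be pictured as parameter groups, each carrying a multiplier on the base learning rate. The sketch below is plain Python, not the repository's optimizer setup. The name prefixes (sub4, sub1, head) are hypothetical, and reading the per-group learning_rate as a scale on the base rate is my understanding of Paddle's parameter-group convention.

```python
BASE_LR = 0.01  # initial learning rate quoted in this README

# Hypothetical parameter-name prefixes mapped to learning-rate multipliers.
LR_SCALE = {"sub4": 1.0, "sub1": 10.0, "head": 10.0}


def build_param_groups(named_params, lr_scale=LR_SCALE):
    """Group (name, param) pairs by name prefix into a list of dicts."""
    groups = {p: {"params": [], "learning_rate": s} for p, s in lr_scale.items()}
    for name, param in named_params:
        prefix = name.split(".", 1)[0]
        if prefix not in groups:
            raise KeyError("no learning-rate rule for parameter %r" % name)
        groups[prefix]["params"].append(param)
    return [g for g in groups.values() if g["params"]]


def effective_lr(group, base_lr=BASE_LR):
    """Learning rate actually applied to a group: base rate times its multiplier."""
    return base_lr * group["learning_rate"]


# Strings stand in for real parameter tensors.
named = [("sub4.conv1.weight", "w0"), ("sub1.conv1.weight", "w1"), ("head.cls.weight", "w2")]
for g in build_param_groups(named):
    print(g["params"], effective_lr(g))
```

A list of dicts in this shape is roughly what would be passed as the parameters argument of a Paddle optimizer, with real parameter tensors in place of the strings.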

Tricks

Data preprocessing: set the crop_size as close as possible to the input size of the prediction phase. Here are some experiments based on liminn-ICNet-pytorch:

  • base_size 520 (the shorter side of the image is resized to a random length between 520x0.5 and 520x2) and crop_size 480 (a random 480x480 patch is cropped for training): the best mIoU is 66.3%. (Resnet50)
  • base_size 1024 (the shorter side of the image is resized to a random length between 1024x0.5 and 1024x2) and crop_size 960 (a random 960x960 patch is cropped for training): the best mIoU is 66.7%. (Resnet50)
  • base_size 1024 (the shorter side of the image is resized to a random length between 1024x0.5 and 1024x2) and crop_size 960 (a random 960x960 patch is cropped for training): the best mIoU is 69.6%. (Resnet50v1s)
  • Because the target dataset is Cityscapes, whose images are 2048x1024, a large crop_size (960x960) works better. A larger crop_size is expected to yield a higher mIoU, but a large crop_size (such as 1024x1024) forces a smaller batch size and makes training very time-consuming.
  • Set the learning rate of sub4 to the original initial learning rate (0.01), because it has pretrained backbone weights.
  • Set the learning rate of sub1 and head to 10 times the initial learning rate (0.1), because they have no pretrained weights.
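The base_size and crop_size settings in the experiments above can be sketched as the following geometry: pick a random short side in [0.5 x base_size, 2 x base_size], resize while keeping the aspect ratio, pad if the result is smaller than the crop, then pick a random crop window. This follows the description in this README only. The padding step and the function name are assumptions, and no actual image is touched.

```python
import random


def sample_scale_and_crop(width, height, base_size, crop_size, rng=random):
    """Return the resized size, the padding needed, and a random crop box."""
    short = rng.randint(int(base_size * 0.5), int(base_size * 2.0))
    if height < width:
        new_h, new_w = short, int(round(width * short / height))
    else:
        new_w, new_h = short, int(round(height * short / width))
    # Assumed: pad right/bottom when the resized image is smaller than the crop.
    pad_w = max(crop_size - new_w, 0)
    pad_h = max(crop_size - new_h, 0)
    x0 = rng.randint(0, new_w + pad_w - crop_size)
    y0 = rng.randint(0, new_h + pad_h - crop_size)
    return (new_w, new_h), (pad_w, pad_h), (x0, y0, x0 + crop_size, y0 + crop_size)


rng = random.Random(0)  # seeded only to make the demo repeatable
size, pad, box = sample_scale_and_crop(2048, 1024, base_size=1024, crop_size=960, rng=rng)
print("resized:", size, "padding:", pad, "crop box:", box)
```

With base_size 1024 the short side ranges from 512 to 2048, so a 960x960 crop sometimes needs padding and sometimes covers only part of a heavily upscaled frame.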

Further works

  • For the Paddle experiments, further work remains, such as training with crop_size 1024 to see how far data preprocessing can improve the model's performance.
  • Switch the pretrained model to PSPNet50 to see whether the mIoU reaches the 67.7% reported in the paper.

Reference
