This project provides a modular implementation of state-of-the-art semantic segmentation models based on the MXNet framework and the GluonCV toolkit. See MindSeg for a mirror implemented with HUAWEI MindSpore.

Bright Spots

  • An easy-to-use and extensible pipeline for the semantic segmentation task, including data pre-processing, model definition, network training and evaluation.

  • Parallel training on GPUs.

  • Multiple supported models.

    • Fully Convolutional Networks for Semantic Segmentation [FCN, CVPR2015, paper]
    • Attention to Scale: Scale-Aware Semantic Image Segmentation [Att2Scale, CVPR2016, paper]
    • Rethinking Atrous Convolution for Semantic Image Segmentation [DeepLabv3, arXiv2017, paper]
    • Ladder-Style DenseNets for Semantic Segmentation of Large Natural Images [LadderDensenet, ICCVW2017, paper]
    • Pyramid Scene Parsing Network [PSPNet, CVPR2017, paper]
    • BiSeNet: Bilateral segmentation network for real-time semantic segmentation [BiSeNet, ECCV2018, paper]
    • Encoder-decoder with atrous separable convolution for semantic image segmentation [DeepLabv3+, ECCV2018, paper]
    • DenseASPP for Semantic Segmentation in Street Scenes [DenseASPP, CVPR2018, paper]
    • Towards Bridging Semantic Gap to Improve Semantic Segmentation [SeENet, ICCV2019, paper]
    • ACFNet: Attentional Class Feature Network for Semantic Segmentation [ACFNet, ICCV2019, paper]
    • Dual Attention Network for Scene Segmentation [DANet, CVPR2019, paper]
    • In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images [SwiftNet, CVPR2019, paper]
    • Panoptic Feature Pyramid Networks [SemanticFPN, CVPR2019, paper]
    • Gated Fully Fusion for Semantic Segmentation [GFFNet, AAAI2020, paper]
    • Attention-guided Chained Context Aggregation for Semantic Segmentation [CANetv1, IMAVIS2021, paper]
    • EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation [EPRNet, TITS2021, paper]
    • AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [AttaNet, AAAI2021, paper]
    • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT, ICLR2021, paper]
    • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR, CVPR2021, paper]
    • FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [FaPN, ICCV2021, paper]
    • AlignSeg: Feature-Aligned Segmentation Networks [AlignSeg, TPAMI2021, paper]
    • Compensating for Local Ambiguity with Encoder-Decoder in Urban Scene Segmentation [CANetv2, TITS2022, paper]


We note that:

  • OS is the output stride of the backbone network.
  • * denotes multi-scale and flipping testing; results without it use single-scale inputs.
  • No bells and whistles are adopted, e.g. OHEM or multi-grid.
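Multi-scale and flipping testing, as marked by * above, can be sketched as follows. This is a minimal illustration and not the project's implementation: `predict` is a hypothetical callable standing in for a trained network, the scale set is an assumed example, and nearest-neighbour resizing is used only to keep the sketch dependency-free.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an array shaped (..., H, W)."""
    in_h, in_w = arr.shape[-2:]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return arr[..., rows[:, None], cols[None, :]]


def multi_scale_flip_predict(predict, image, scales=(0.5, 1.0, 1.5)):
    """Sum class scores over rescaled and horizontally flipped inputs.

    predict: hypothetical callable mapping a (C, H, W) image to
             (K, H, W) per-class scores at the same spatial size.
    scales:  assumed example scale factors, not the project's defaults.
    """
    h, w = image.shape[-2:]
    total = None
    for s in scales:
        sh = max(1, int(round(h * s)))
        sw = max(1, int(round(w * s)))
        scaled = resize_nearest(image, sh, sw)
        scores = predict(scaled)
        # Predict on the mirrored input, then mirror the scores back.
        flipped = predict(scaled[..., ::-1])[..., ::-1]
        merged = resize_nearest(scores + flipped, h, w)
        total = merged if total is None else total + merged
    # The class with the highest accumulated score wins at each pixel.
    return total.argmax(axis=0)
```

Single-scale testing corresponds to calling `predict` once on the original image and taking the argmax of its scores.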


| Model | Backbone | OS | #Params | TrainSet | EvalSet | mIoU | *mIoU |
|---|---|---|---|---|---|---|---|
| BiSeNet | ResNet18 | 32 | 13.2M | train_fine | val | 71.6 | 74.7 |
| BiSeNet | ResNet18 | 32 | 13.2M | trainval_fine | test | - | 74.8 |
| FCN | ResNet18 | 32 | 12.4M | train_fine | val | 64.9 | 68.1 |
| FCN | ResNet18 | 8 | 12.4M | train_fine | val | 68.3 | 69.9 |
| FCN | ResNet50 | 8 | 28.4M | train_fine | val | 71.7 | - |
| FCN | ResNet101 | 8 | 47.5M | train_fine | val | 74.5 | - |
| PSPNet | ResNet101 | 8 | 56.4M | train_fine | val | 78.2 | 79.5 |
| DeepLabv3 | ResNet101 | 8 | 58.9M | train_fine | val | 79.3 | 80.0 |
| DenseASPP | ResNet101 | 8 | 69.4M | train_fine | val | 78.7 | 79.8 |
| DANet | ResNet101 | 8 | 66.7M | train_fine | val | 79.7 | 80.9 |


| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| PSPNet | ResNet101 | 8 | train | val | 80.1 | 42.9 | 80.9 | 43.7 |

Pascal VOC 2012

| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| FCN | ResNet101 | 8 | train_aug | val | 94.4 | 74.6 | 94.5 | 75.0 |
| Att2Scale | ResNet101 | 8 | train_aug | val | 94.8 | 77.1 | - | - |
| PSPNet | ResNet101 | 8 | train_aug | val | 95.1 | 78.1 | 95.3 | 78.5 |
| DeepLabv3 | ResNet101 | 8 | train_aug | val | 95.5 | 80.1 | 95.6 | 80.4 |
| DeepLabv3+ | ResNet101 | 8 | train_aug | val | 95.5 | 79.9 | 95.6 | 80.1 |


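| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| FCN | ResNet101 | 8 | train | val | 69.2 | 39.7 | 70.2 | 41.0 |
| PSPNet | ResNet101 | 8 | train | val | 71.3 | 43.0 | 71.9 | 43.6 |
| DeepLabv3+ | ResNet101 | 8 | train | val | 73.5 | 46.0 | 74.3 | 47.2 |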


We use Python 3.6.2 and CUDA 10.1 in this project.

  1. Prerequisites

    pip install -r requirements.txt

    Note that we use wandb for logging and visualization. Refer to the wandb documentation for a QuickStart.

  2. Install the Detail API to use the Pascal Context dataset



  1. Configure hyper-parameters in ./mxnetseg/config.yml

  2. Run the ./mxnetseg/ script

    python --ctx 0 1 2 3 --wandb wandb-demo
  3. During training, the program will automatically create a sub-folder ./weights/{model_name} to save model checkpoints/parameters.


Simply run the ./mxnetseg/ script with the required arguments specified:

python --model FCNResNet --backbone resnet18 --checkpoint fcn_resnet18_Cityscapes_20191900_310600_best.params --ctx 0 --data Cityscapes --crop 768 --base 2048 --mode val --ms
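The flags in the command above could be parsed along these lines. This is a sketch rather than the project's actual parser: `build_parser` is a hypothetical name, and the types, defaults, and help strings are assumptions inferred from the example commands in this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="Evaluation CLI sketch")
    parser.add_argument("--model", type=str, required=True,
                        help="model name, e.g. FCNResNet")
    parser.add_argument("--backbone", type=str, default="resnet18",
                        help="backbone network (default is an assumption)")
    parser.add_argument("--checkpoint", type=str, default=None,
                        help="file name of the saved .params checkpoint")
    parser.add_argument("--ctx", type=int, nargs="+", default=[0],
                        help="GPU ids; several ids may be given")
    parser.add_argument("--data", type=str, default="Cityscapes",
                        help="dataset name (default is an assumption)")
    parser.add_argument("--crop", type=int, default=768,
                        help="assumed to be the crop size in pixels")
    parser.add_argument("--base", type=int, default=2048,
                        help="assumed to be the base image size in pixels")
    parser.add_argument("--mode", type=str, default="val",
                        choices=["val", "test", "testval"],
                        help="evaluation mode")
    parser.add_argument("--ms", action="store_true",
                        help="assumed to enable multi-scale and flipping testing")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
example_args = build_parser().parse_args(
    ["--model", "FCNResNet", "--backbone", "resnet18", "--ctx", "0",
     "--data", "Cityscapes", "--crop", "768", "--base", "2048",
     "--mode", "val", "--ms"]
)
```

Using `nargs="+"` for `--ctx` lets the same parser accept the multi-GPU form `--ctx 0 1 2 3` shown in the training command.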

About the mode:

  • val: to get mIoU and PA metrics on the validation set.
  • test: to get colored predictions on the test set.
  • testval: to get colored predictions on the validation set.
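For reference, the PA (pixel accuracy) and mIoU metrics reported in val mode are commonly computed from a confusion matrix. The sketch below uses the standard definitions with NumPy. It is not taken from this repository: the function names and the `ignore_label` value of -1 are assumptions for illustration.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes, ignore_label=-1):
    """Count pixels per (true class, predicted class) pair.

    ignore_label is a hypothetical marker for pixels excluded from scoring.
    Predictions are expected to lie in [0, num_classes).
    """
    mask = label != ignore_label
    idx = num_classes * label[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of scored pixels whose predicted class is correct."""
    return np.diag(cm).sum() / cm.sum()


def mean_iou(cm):
    """Mean of per-class intersection over union, skipping absent classes."""
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0
    return (inter[valid] / union[valid]).mean()
```

Accumulating one confusion matrix over the whole validation set and computing the metrics once at the end avoids the bias of averaging per-image scores.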


Please cite our papers if you find our code helpful in your research.

  title={Attention-guided chained context aggregation for semantic segmentation},
  author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu},
  journal={Image and Vision Computing},

  title={EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation},
  author={Tang, Quan and Liu, Fagui and Jiang, Jun and Zhang, Yu},
  journal={IEEE Transactions on Intelligent Transportation Systems},

  title={Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation}, 
  author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu and Zhu, Boyuan and Tang, Xuhao},
  journal={IEEE Transactions on Intelligent Transportation Systems},

