[Feature] Add multi-label semantic segmentation support (#3479)
* Add a conversion script for the UWMGI dataset * Modify Dataset and the Compose op to support reading multi-label data * Add visualization support for inference results in multi-label mode * Add support for semantic segmentation evaluation metrics in multi-label mode * Add support for passing the --use_multilabel argument in multi-label mode * Add an example config and documentation for multi-label semantic segmentation on the UWMGI dataset * Add auxiliary transform ops for multi-label semantic segmentation * Update the data augmentation strategy to speed up convergence * Add configs that use the auxiliary transform ops * Update the script to support converting `UWMGI` and mainstream COCO-style annotations into the format supported by the ppseg dataset API * Update the images and the commands for the conversion script
1 parent 1b9574e · commit 63f95e6 · Showing 18 changed files with 893 additions and 50 deletions.
@@ -0,0 +1,54 @@

batch_size: 8
iters: 160000

train_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: RandomHorizontalFlip
    - type: RandomVerticalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  train_path: data/UWMGI/train.txt
  mode: train

val_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  val_path: data/UWMGI/val.txt
  mode: val

optimizer:
  type: SGD
  momentum: 0.9
  weight_decay: 4.0e-5

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.001
  end_lr: 0
  power: 0.9

loss:
  types:
    - type: MixedLoss
      losses:
        - type: BCELoss
        - type: LovaszHingeLoss
      coef: [0.5, 0.5]
  coef: [1]
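As a quick illustration of what the two nested `coef` entries above do, the sketch below combines loss terms as weighted sums: the inner `coef: [0.5, 0.5]` weights BCELoss and LovaszHingeLoss inside MixedLoss, and the outer `coef: [1]` weights the single entry in `types`. This is a minimal sketch with made-up loss values, not PaddleSeg's actual loss implementation.
```python
def weighted_sum(values, coefs):
    """Combine individual loss terms as a weighted sum, mirroring the coef lists above."""
    return sum(c * v for c, v in zip(coefs, values))

# Hypothetical per-batch loss values, for illustration only.
bce_value, lovasz_value = 0.42, 0.31

mixed = weighted_sum([bce_value, lovasz_value], [0.5, 0.5])   # inner coef of MixedLoss
total = weighted_sum([mixed], [1])                            # outer coef over loss.types
print(f"mixed loss: {mixed:.3f}, total loss: {total:.3f}")
```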
@@ -0,0 +1,139 @@
English | [简体中文](README_cn.md)

# Multi-label semantic segmentation based on PaddleSeg

## 1. Introduction

Multi-label semantic segmentation is an image segmentation task that assigns each pixel of an image to multiple categories rather than just one. This better captures complex information in an image, such as overlap, occlusion, and boundaries between different objects. Multi-label semantic segmentation has many application scenarios, such as medical image analysis, remote sensing image interpretation, and autonomous driving.

<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/ea6bb360-75de-4e06-9910-44c7d2fdbe6c">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/e2781865-db7e-4f46-98b2-3ef731e8bef1">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/9e587935-fd6f-459e-b798-0164eb98f44d">
</p>

+ *The images above show inference results from a model trained on images from the [UWMGI](https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation/) dataset.*

## 2. Supported models and loss functions

| Model | Loss |
|:-------------------------------------------------------------------------------------------:|:------------------------:|
| DeepLabV3, DeepLabV3P, MobileSeg, <br/>PP-LiteSeg, PP-MobileSeg, UNet, <br/>UNet++, UNet+++ | BCELoss, LovaszHingeLoss |

+ *These are the models and loss functions confirmed to work; the actual supported range is larger.*

## 3. Example Tutorial

The following walkthrough uses the **[UWMGI](https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation/)** multi-label semantic segmentation dataset and the **[PP-MobileSeg](../pp_mobileseg/README.md)** model as an example.

### 3.1 Data Preparation

In single-label semantic segmentation, the annotation is a grayscale image of shape **(img_h, img_w)** whose pixel values are the category indices.

In multi-label semantic segmentation, the annotation is a grayscale image of shape **(img_h, num_classes x img_w)**: the binary mask of each category is concatenated sequentially along the horizontal direction.
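To make this layout concrete, here is a minimal sketch that builds a multi-label annotation by concatenating per-class binary masks along the width axis and then splits it back into per-class channels. It uses NumPy only; the array names and the 3-class example are illustrative and not part of PaddleSeg's API.
```python
import numpy as np

img_h, img_w, num_classes = 256, 256, 3

# One binary mask of shape (img_h, img_w) per class, values in {0, 1}.
class_masks = [np.random.randint(0, 2, (img_h, img_w), dtype=np.uint8)
               for _ in range(num_classes)]

# Multi-label annotation: per-class masks concatenated horizontally,
# giving shape (img_h, num_classes * img_w).
label = np.concatenate(class_masks, axis=1)
assert label.shape == (img_h, num_classes * img_w)

# Recover the per-class binary masks by splitting along the width axis.
recovered = np.split(label, num_classes, axis=1)
assert all((m == r).all() for m, r in zip(class_masks, recovered))
```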
Download the raw UWMGI dataset archive and convert it to the format supported by PaddleSeg's [Dataset](../../paddleseg/datasets/dataset.py) API with the provided script.
```shell
wget https://storage.googleapis.com/kaggle-competitions-data/kaggle-v2/27923/3495119/bundle/archive.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1693533809&Signature=ThCLjIYxSXfk85lCbZ5Cz2Ta4g8AjwJv0%2FgRpqpchlZLLYxk3XRnrZqappboha0moC7FuqllpwlLfCambQMbKoUjCLylVQqF0mEsn0IaJdYwprWYY%2F4FJDT2lG0HdQfAxJxlUPonXeZyZ4pZjOrrVEMprxuiIcM2kpGk35h7ry5ajkmdQbYmNQHFAJK2iO%2F4a8%2F543zhZRWsZZVbQJHid%2BjfO6ilLWiAGnMFpx4Sh2B01TUde9hBCwpxgJv55Gs0a4Z1KNsBRly6uqwgZFYfUBAejySx4RxFB7KEuRowDYuoaRT8NhSkzT2i7qqdZjgHxkFZJpRMUlDcf1RSJVkvEA%3D%3D&response-content-disposition=attachment%3B+filename%3Duw-madison-gi-tract-image-segmentation.zip
python tools/data/convert_multilabel.py \
    --dataset_type uwmgi \
    --zip_input ./uw-madison-gi-tract-image-segmentation.zip \
    --output ./data/UWMGI/ \
    --train_proportion 0.8 \
    --val_proportion 0.2
# optional
rm ./uw-madison-gi-tract-image-segmentation.zip
```

The structure of the UWMGI dataset after conversion is as follows:
```
UWMGI
|
|--images
|  |--train
|  |  |--*.jpg
|  |  |--...
|  |
|  |--val
|  |  |--*.jpg
|  |  |--...
|
|--annotations
|  |--train
|  |  |--*.jpg
|  |  |--...
|  |
|  |--val
|  |  |--*.jpg
|  |  |--...
|
|--train.txt
|
|--val.txt
```
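The exact contents of `train.txt` and `val.txt` are produced by the conversion script; assuming each line pairs an image path with its annotation path relative to the dataset root (the usual layout for PaddleSeg's `Dataset`), a quick sanity check could look like the sketch below. The two-column, space-separated format is an assumption made here, not something this document guarantees.
```python
from pathlib import Path

dataset_root = Path("data/UWMGI")

for split in ("train.txt", "val.txt"):
    problems = 0
    lines = (dataset_root / split).read_text().splitlines()
    for line in lines:
        parts = line.split()  # assumed format: "<image_path> <annotation_path>"
        if len(parts) != 2 or not all((dataset_root / p).exists() for p in parts):
            problems += 1
    print(f"{split}: {len(lines)} entries, {problems} problematic")
```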
The training and validation datasets produced by the split can be configured as follows:
```yaml
train_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: RandomHorizontalFlip
    - type: RandomVerticalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  train_path: data/UWMGI/train.txt
  mode: train

val_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  val_path: data/UWMGI/val.txt
  mode: val
```
### 3.2 Training
```shell
python tools/train.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --save_dir output/pp_mobileseg_tiny_uwmgi_256x256_160k \
    --num_workers 8 \
    --do_eval \
    --use_vdl \
    --save_interval 2000 \
    --use_multilabel
```
+ *When `--do_eval` is enabled, the `--use_multilabel` flag must also be added so that evaluation runs in multi-label mode.*
### 3.3 Evaluation
```shell
python tools/val.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --model_path output/pp_mobileseg_tiny_uwmgi_256x256_160k/best_model/model.pdparams \
    --use_multilabel
```
+ *The `--use_multilabel` flag must be added when evaluating the model so that evaluation runs in multi-label mode.*
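For intuition about what evaluation measures in multi-label mode, the sketch below computes a per-class IoU by treating each class channel as an independent binary mask. It is an illustration of the metric only, not PaddleSeg's evaluation code, and the `(num_classes, H, W)` layout is an assumption made for this example.
```python
import numpy as np

def per_class_iou(pred, gt, eps=1e-7):
    """pred, gt: binary arrays of shape (num_classes, H, W); returns IoU per class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum(axis=(1, 2))
    union = np.logical_or(pred, gt).sum(axis=(1, 2))
    return (intersection + eps) / (union + eps)

# Example with random 3-class binary masks.
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(3, 256, 256))
gt = rng.integers(0, 2, size=(3, 256, 256))
ious = per_class_iou(pred, gt)
print("per-class IoU:", ious.round(4), "mean IoU:", ious.mean().round(4))
```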
### 3.4 Inference
```shell
python tools/predict.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --model_path output/pp_mobileseg_tiny_uwmgi_256x256_160k/best_model/model.pdparams \
    --image_path data/UWMGI/images/val/case122_day18_slice_0089.jpg \
    --use_multilabel
```
+ *The `--use_multilabel` flag must also be added when running prediction so that visualization works in multi-label mode.*
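As a rough idea of how multi-label predictions can be visualized, the sketch below overlays each class's binary mask on the input image with its own color. It is only an illustration under the assumptions stated in the comments; the coloring produced by `--use_multilabel` in `tools/predict.py` may differ.
```python
import numpy as np
from PIL import Image

def overlay_multilabel(image, masks, colors, alpha=0.5):
    """image: (H, W, 3) uint8; masks: (num_classes, H, W) binary; colors: one RGB tuple per class."""
    out = image.astype(np.float32)
    for mask, color in zip(masks, colors):
        region = mask.astype(bool)
        out[region] = (1 - alpha) * out[region] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)

# Illustrative usage with a dummy image and random 3-class masks (colors chosen arbitrarily).
h, w = 256, 256
image = np.zeros((h, w, 3), dtype=np.uint8)
masks = np.random.randint(0, 2, size=(3, h, w))
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
Image.fromarray(overlay_multilabel(image, masks, colors)).save("overlay_preview.png")
```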
@@ -0,0 +1,139 @@
[English](README.md) | 简体中文

# Multi-label semantic segmentation based on PaddleSeg

## 1. Introduction

Multi-label semantic segmentation is an image segmentation task that assigns each pixel of an image to multiple categories rather than just one. This better captures complex information in an image, such as overlap, occlusion, and boundaries between different objects. Multi-label semantic segmentation has many application scenarios, such as medical image analysis, remote sensing image interpretation, and autonomous driving.

<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/ea6bb360-75de-4e06-9910-44c7d2fdbe6c">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/e2781865-db7e-4f46-98b2-3ef731e8bef1">
<img src="https://github.com/PaddlePaddle/PaddleSeg/assets/95759947/9e587935-fd6f-459e-b798-0164eb98f44d">
</p>

+ *The images above show inference results from a model trained on images from the [UWMGI](https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation/) dataset.*
## 2. Supported models and loss functions

| Model | Loss |
|:-------------------------------------------------------------------------------------------:|:------------------------:|
| DeepLabV3, DeepLabV3P, MobileSeg, <br/>PP-LiteSeg, PP-MobileSeg, UNet, <br/>UNet++, UNet+++ | BCELoss, LovaszHingeLoss |

+ *These are the models and loss functions confirmed to work; the actual supported range is larger.*

## 3. Example Tutorial

The following walkthrough uses the **[UWMGI](https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation/)** multi-label semantic segmentation dataset and the **[PP-MobileSeg](../pp_mobileseg/README.md)** model as an example.

### 3.1 Data Preparation

In single-label multi-class semantic segmentation, the annotation is a grayscale image of shape **(img_h, img_w)** whose pixel values are the category indices.

In multi-label semantic segmentation, the annotation is a grayscale image of shape **(img_h, num_classes x img_w)**: the binary mask of each category is concatenated sequentially along the horizontal direction.

Download the raw UWMGI dataset archive and convert it to the format supported by PaddleSeg's [Dataset](../../paddleseg/datasets/dataset.py) API with the provided script.
```shell
wget https://storage.googleapis.com/kaggle-competitions-data/kaggle-v2/27923/3495119/bundle/archive.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1693533809&Signature=ThCLjIYxSXfk85lCbZ5Cz2Ta4g8AjwJv0%2FgRpqpchlZLLYxk3XRnrZqappboha0moC7FuqllpwlLfCambQMbKoUjCLylVQqF0mEsn0IaJdYwprWYY%2F4FJDT2lG0HdQfAxJxlUPonXeZyZ4pZjOrrVEMprxuiIcM2kpGk35h7ry5ajkmdQbYmNQHFAJK2iO%2F4a8%2F543zhZRWsZZVbQJHid%2BjfO6ilLWiAGnMFpx4Sh2B01TUde9hBCwpxgJv55Gs0a4Z1KNsBRly6uqwgZFYfUBAejySx4RxFB7KEuRowDYuoaRT8NhSkzT2i7qqdZjgHxkFZJpRMUlDcf1RSJVkvEA%3D%3D&response-content-disposition=attachment%3B+filename%3Duw-madison-gi-tract-image-segmentation.zip
python tools/data/convert_multilabel.py \
    --dataset_type uwmgi \
    --zip_input ./uw-madison-gi-tract-image-segmentation.zip \
    --output ./data/UWMGI/ \
    --train_proportion 0.8 \
    --val_proportion 0.2
# optional
rm ./uw-madison-gi-tract-image-segmentation.zip
```

The structure of the UWMGI dataset after conversion is as follows:
```
UWMGI
|
|--images
|  |--train
|  |  |--*.jpg
|  |  |--...
|  |
|  |--val
|  |  |--*.jpg
|  |  |--...
|
|--annotations
|  |--train
|  |  |--*.jpg
|  |  |--...
|  |
|  |--val
|  |  |--*.jpg
|  |  |--...
|
|--train.txt
|
|--val.txt
```
The training and validation datasets produced by the split can be configured as follows:
```yaml
train_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: RandomHorizontalFlip
    - type: RandomVerticalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  train_path: data/UWMGI/train.txt
  mode: train

val_dataset:
  type: Dataset
  dataset_root: data/UWMGI
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
      mean: [0.0, 0.0, 0.0]
      std: [1.0, 1.0, 1.0]
  num_classes: 3
  val_path: data/UWMGI/val.txt
  mode: val
```
### 3.2 Training
```shell
python tools/train.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --save_dir output/pp_mobileseg_tiny_uwmgi_256x256_160k \
    --num_workers 8 \
    --do_eval \
    --use_vdl \
    --save_interval 2000 \
    --use_multilabel
```
+ *When `--do_eval` is enabled, the `--use_multilabel` flag must also be added so that evaluation runs in multi-label mode.*

### 3.3 Evaluation
```shell
python tools/val.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --model_path output/pp_mobileseg_tiny_uwmgi_256x256_160k/best_model/model.pdparams \
    --use_multilabel
```
+ *The `--use_multilabel` flag must be added when evaluating the model so that evaluation runs in multi-label mode.*

### 3.4 Inference
```shell
python tools/predict.py \
    --config configs/multilabelseg/pp_mobileseg_tiny_uwmgi_256x256_160k.yml \
    --model_path output/pp_mobileseg_tiny_uwmgi_256x256_160k/best_model/model.pdparams \
    --image_path data/UWMGI/images/val/case122_day18_slice_0089.jpg \
    --use_multilabel
```
+ *The `--use_multilabel` flag must also be added when running prediction so that visualization works in multi-label mode.*
configs/multilabelseg/deeplabv3_resnet50_os8_uwmgi_256x256_160k.yml (18 additions, 0 deletions)
@@ -0,0 +1,18 @@

_base_: '../_base_/uwmgi.yml'

batch_size: 8
iters: 160000

model:
  type: DeepLabV3
  num_classes: 3
  backbone:
    type: ResNet50_vd
    output_stride: 8
    multi_grid: [1, 2, 4]
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
  backbone_indices: [3]
  aspp_ratios: [1, 12, 24, 36]
  aspp_out_channels: 256
  align_corners: False
  pretrained: null