open-mmlab · MeowZheng · Oct 28, 2022 · Oct 18, 2022
diff --git a/docs/en/faq.md b/docs/en/faq.md
@@ -66,3 +66,105 @@ In the test script, we provide `show-dir` argument to control whether output the
 ```shell
 python tools/test.py {config} {checkpoint} --show-dir {/path/to/save/image} --opacity 1
 ```
+
+## How to handle binary segmentation tasks?
+
+MMSegmentation uses `num_classes` and `out_channels` to control the output of the last layer, `self.conv_seg` (more details can be found [here](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py)):
+
+```python
+def __init__(self,
+             ...,
+             ):
+  ...
+  if out_channels is None:
+      if num_classes == 2:
+          warnings.warn('For binary segmentation, we suggest using '
+                        '`out_channels = 1` to define the output '
+                        'channels of segmentor, and use `threshold` '
+                        'to convert seg_logits into a prediction '
+                        'by applying a threshold')
+      out_channels = num_classes
+
+  if out_channels != num_classes and out_channels != 1:
+      raise ValueError(
+          'out_channels should be equal to num_classes, '
+          'except binary segmentation set out_channels == 1 and '
+          f'num_classes == 2, but got out_channels={out_channels} '
+          f'and num_classes={num_classes}')
+
+  if out_channels == 1 and threshold is None:
+      threshold = 0.3
+      warnings.warn('threshold is not defined for binary, and defaults '
+                    'to 0.3')
+  self.num_classes = num_classes
+  self.out_channels = out_channels
+  self.threshold = threshold
+  ...
+  self.conv_seg = nn.Conv2d(channels, self.out_channels, kernel_size=1)
+```
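
The checks in the excerpt above can be tried in isolation. The following is a minimal sketch, assuming a standalone helper named `resolve_head_outputs` (a hypothetical name, not part of MMSegmentation) that mirrors only the `out_channels` / `threshold` resolution and returns the resolved pair:

```python
import warnings
from typing import Optional, Tuple


def resolve_head_outputs(
        num_classes: int,
        out_channels: Optional[int] = None,
        threshold: Optional[float] = None) -> Tuple[int, Optional[float]]:
    """Hypothetical helper mirroring the checks quoted above."""
    if out_channels is None:
        if num_classes == 2:
            warnings.warn('For binary segmentation, consider '
                          'out_channels=1 together with a threshold.')
        # Fall back to one output channel per class.
        out_channels = num_classes

    # Only out_channels == num_classes or out_channels == 1 pass this check.
    if out_channels != num_classes and out_channels != 1:
        raise ValueError(
            'out_channels should be equal to num_classes or 1, but got '
            f'out_channels={out_channels} and num_classes={num_classes}')

    # A single output channel needs a threshold to produce a prediction.
    if out_channels == 1 and threshold is None:
        threshold = 0.3
        warnings.warn('threshold is not defined for binary, and defaults '
                      'to 0.3')

    return out_channels, threshold


with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    print(resolve_head_outputs(2))       # (2, None)
    print(resolve_head_outputs(2, 1))    # (1, 0.3)
    print(resolve_head_outputs(19))      # (19, None)
```

Leaving `out_channels` unset with `num_classes=2` therefore keeps the two-channel path and only emits a warning, while `out_channels=1` switches to the thresholded path.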
+
+There are two ways to compute binary segmentation predictions:
+
+```python
+...
+if self.out_channels == 1:
+    seg_logit = F.sigmoid(seg_logit)
+else:
+    seg_logit = F.softmax(seg_logit, dim=1)
+
+...
+
+if self.out_channels == 1:
+    seg_pred = (seg_logit >
+                self.decode_head.threshold).to(seg_logit).squeeze(1)
+else:
+    seg_pred = seg_logit.argmax(dim=1)
+```
+
+- When `out_channels=2`, Cross Entropy Loss is used in training, and `F.softmax()` followed by `argmax()` gives the prediction for each pixel at inference.
+
+- When `out_channels=1`, Binary Cross Entropy Loss is used in training, and `F.sigmoid()` together with `threshold` gives the prediction for each pixel at inference. The `threshold` parameter (default 0.3) was added in [#2016](https://github.com/open-mmlab/mmsegmentation/pull/2016).
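
The two decision rules can be compared on a single pixel. The sketch below is an illustration in plain Python, not the library code: the scalar functions `predict_two_channel` and `predict_one_channel` are hypothetical names, whereas the real implementation operates on tensors. For two channels, softmax followed by argmax is equivalent to thresholding the foreground probability at 0.5, because `softmax([a, b])[1] == sigmoid(b - a)`; the one-channel path with the default `threshold=0.3` can therefore give a different answer for the same foreground probability.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_two_channel(logits: Sequence[float]) -> int:
    """out_channels=2: softmax over the two logits, then argmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs))


def predict_one_channel(logit: float, threshold: float = 0.3) -> int:
    """out_channels=1: sigmoid, then compare with the threshold."""
    return int(sigmoid(logit) > threshold)


# Both encodings below give a foreground probability of about 0.38.
print(predict_two_channel([0.0, -0.5]))   # 0
print(predict_one_channel(-0.5))          # 1
print(predict_one_channel(-0.5, 0.5))     # 0
```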
+
+More details about how the segmentation prediction is computed can be found in [encoder_decoder.py](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/segmentors/encoder_decoder.py).
+
+In summary, to implement binary segmentation, set the following parameters in the `decode_head` and `auxiliary_head` configs, using one of two solutions:
+
+- (1) `num_classes=2`, `out_channels=2`, and `use_sigmoid=False` in `CrossEntropyLoss`.
+
+- (2) `num_classes=2`, `out_channels=1`, and `use_sigmoid=True` in `CrossEntropyLoss`.
+
+For solution (2), below is an example that modifies [pspnet_unet_s5-d16.py](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/models/pspnet_unet_s5-d16.py):
+
+```python
+decode_head=dict(
+    type='PSPHead',
+    in_channels=64,
+    in_index=4,
+    num_classes=2,
+    out_channels=1,
+    loss_decode=dict(
+        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+auxiliary_head=dict(
+    type='FCNHead',
+    in_channels=128,
+    in_index=3,
+    num_classes=2,
+    out_channels=1,
+    loss_decode=dict(
+        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)),
+```
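
For comparison, solution (1) might look like the sketch below. It reuses the head types and channel numbers from the solution (2) example and changes only `out_channels` and `use_sigmoid`. Wrapping the two heads in a top-level `model` dict is an assumption made so the fragment is valid Python on its own, and the remaining head fields are omitted exactly as in the example above.

```python
# Hypothetical override for solution (1): two output channels with softmax.
model = dict(
    decode_head=dict(
        type='PSPHead',
        in_channels=64,
        in_index=4,
        num_classes=2,
        out_channels=2,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        num_classes=2,
        out_channels=2,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
```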
+
+## What is `reduce_zero_label` used for?
+
+When [loading annotations](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/datasets/pipelines/loading.py#L91) in MMSegmentation, `reduce_zero_label (bool)` determines whether all label values are reduced by 1:
+
+```python
+if self.reduce_zero_label:
+    # avoid using underflow conversion
+    gt_semantic_seg[gt_semantic_seg == 0] = 255
+    gt_semantic_seg = gt_semantic_seg - 1
+    gt_semantic_seg[gt_semantic_seg == 254] = 255
+```
+
+`reduce_zero_label` is usually used for datasets where 0 is the background label. If `reduce_zero_label=True`, pixels labeled 0 are remapped to 255 and are not involved in loss calculation.
+Note that in binary segmentation tasks it is unnecessary to use `reduce_zero_label=True`; use one of the solutions mentioned above instead.
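
The effect of the remapping can be checked on a handful of label values. This is a minimal sketch in plain Python, assuming a hypothetical list-based function named `reduce_zero_label`; the quoted pipeline code does the same thing on the annotation array. The value 255 is the one the snippet assigns to former label 0, and it matches the `ignore_index=255` default commonly used by MMSegmentation losses.

```python
from typing import List

IGNORE = 255  # value the snippet above assigns to pixels that were label 0


def reduce_zero_label(labels: List[int]) -> List[int]:
    """Shift labels down by 1; label 0 and an existing 255 both end as 255."""
    out = []
    for value in labels:
        if value == 0 or value == IGNORE:
            out.append(IGNORE)
        else:
            out.append(value - 1)
    return out


print(reduce_zero_label([0, 1, 2, 150, 255]))  # [255, 0, 1, 149, 255]
```

With only two classes, this shift would leave a single real label (0) plus the ignored value, which is why the binary solutions above are the better fit.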
diff --git a/docs/zh_cn/faq.md b/docs/zh_cn/faq.md
@@ -66,3 +66,104 @@
 ```shell
 python tools/test.py {config} {checkpoint} --show-dir {/path/to/save/image} --opacity 1
 ```
+
+## How to handle binary segmentation tasks?
+
+MMSegmentation uses `num_classes` and `out_channels` to control the output of the model's last layer, `self.conv_seg` (more details can be found [here](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py)):
+
+```python
+def __init__(self,
+             ...,
+             ):
+  ...
+  if out_channels is None:
+      if num_classes == 2:
+          warnings.warn('For binary segmentation, we suggest using '
+                        '`out_channels = 1` to define the output '
+                        'channels of segmentor, and use `threshold` '
+                        'to convert seg_logits into a prediction '
+                        'by applying a threshold')
+      out_channels = num_classes
+
+  if out_channels != num_classes and out_channels != 1:
+      raise ValueError(
+          'out_channels should be equal to num_classes, '
+          'except binary segmentation set out_channels == 1 and '
+          f'num_classes == 2, but got out_channels={out_channels} '
+          f'and num_classes={num_classes}')
+
+  if out_channels == 1 and threshold is None:
+      threshold = 0.3
+      warnings.warn('threshold is not defined for binary, and defaults '
+                    'to 0.3')
+  self.num_classes = num_classes
+  self.out_channels = out_channels
+  self.threshold = threshold
+  ...
+  self.conv_seg = nn.Conv2d(channels, self.out_channels, kernel_size=1)
+```
+
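+There are two ways to compute binary segmentation predictions:
+
+- When `out_channels=2`, Cross Entropy Loss is used as the loss function in training; at inference, `F.softmax()` normalizes the logits and `argmax()` gives the prediction for each pixel.
+
+- When `out_channels=1`, Binary Cross Entropy Loss is used as the loss function in training; at inference, `F.sigmoid()` and `threshold` give the prediction. The `threshold` parameter (default 0.3) was added in [#2016](https://github.com/open-mmlab/mmsegmentation/pull/2016).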
+
+```python
+...
+if self.out_channels == 1:
+    seg_logit = F.sigmoid(seg_logit)
+else:
+    seg_logit = F.softmax(seg_logit, dim=1)
+
+...
+
+if self.out_channels == 1:
+    seg_pred = (seg_logit >
+                self.decode_head.threshold).to(seg_logit).squeeze(1)
+else:
+    seg_pred = seg_logit.argmax(dim=1)
+```
+
+More details about how the semantic segmentation prediction is computed can be found in [encoder_decoder.py](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/segmentors/encoder_decoder.py).
+
+To implement either of the two methods above, modify the following settings in the `decode_head` and `auxiliary_head` configs:
+
+- (1) `num_classes=2`, `out_channels=2`, and `use_sigmoid=False` in `CrossEntropyLoss`.
+
+- (2) `num_classes=2`, `out_channels=1`, and `use_sigmoid=True` in `CrossEntropyLoss`.
+
+For solution (2), below is the corresponding modification of the example [pspnet_unet_s5-d16.py](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/models/pspnet_unet_s5-d16.py):
+
+```python
+decode_head=dict(
+    type='PSPHead',
+    in_channels=64,
+    in_index=4,
+    num_classes=2,
+    out_channels=1,
+    loss_decode=dict(
+        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+auxiliary_head=dict(
+    type='FCNHead',
+    in_channels=128,
+    in_index=3,
+    num_classes=2,
+    out_channels=1,
+    loss_decode=dict(
+        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)),
+```
+
+## What is `reduce_zero_label` used for?
+
+In MMSegmentation, when [loading annotations](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/datasets/pipelines/loading.py#L91), `reduce_zero_label (bool)` determines whether all label values are reduced by 1:
+
+```python
+if self.reduce_zero_label:
+    # avoid using underflow conversion
+    gt_semantic_seg[gt_semantic_seg == 0] = 255
+    gt_semantic_seg = gt_semantic_seg - 1
+    gt_semantic_seg[gt_semantic_seg == 254] = 255
+```
+
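+`reduce_zero_label` is often used for datasets where label 0 is the background. If `reduce_zero_label=True`, pixels labeled 0 do not take part in the loss calculation. Note that in binary segmentation tasks it is unnecessary to set `reduce_zero_label=True`; use one of the solutions mentioned above instead.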
diff --git a/mmseg/models/decode_heads/decode_head.py b/mmseg/models/decode_heads/decode_head.py
@@ -21,7 +21,7 @@ class BaseDecodeHead(BaseModule, metaclass=ABCMeta):
         num_classes (int): Number of classes.
         out_channels (int): Output channels of conv_seg.
         threshold (float): Threshold for binary segmentation in the case of
-            `num_classes==1`. Default: None.
+            `out_channels==1`. Default: None.
         dropout_ratio (float): Ratio of dropout layer. Default: 0.1.
         conv_cfg (dict|None): Config of conv layers. Default: None.
         norm_cfg (dict|None): Config of norm layers. Default: None.