
[Feature] Support multiple losses during training #818

Merged
merged 15 commits into open-mmlab:master from multi-loss on Sep 24, 2021

Conversation

@MengzhangLI (Contributor) commented Aug 24, 2021

Because this has been asked about frequently by the community, I directly cloned the related PR implementing multiple losses and opened a new PR.

Related PR: #244

Related Issues: #779, #727, #486 and so on.

Here are my results on UNet, with the UNet-S5-D16 backbone and the FCN decode head:

Note:
(1) CE means the loss function is cross entropy; DC means dice loss.
(2) CE (cross entropy) is the default loss function in MMSegmentation configs; I reproduced the training to verify the real difference between loss settings.
(3) loss_weight also matters. For instance, (0.5 : 1) below means the weights of the cross entropy loss (CE) and the dice loss (DC) are 0.5 and 1, respectively.
(4) I used --seed 0 but still observed some variance across training runs with the same config.

| Datasets | CE (from repo config) | CE (on my own) | CE + DC (1:1) | CE + DC (0.5:1) | CE + DC (1:0.5) | CE + DC (1:3) |
| --- | --- | --- | --- | --- | --- | --- |
| DRIVE | 78.67 | 78.42 | 79.18 | 78.94 | 79.58 | 79.51 |
| STARE | 81.02 | 81.29 | 82.08 | 81.44 | 82.01 | 82.39 |
| CHASE_DB1 | 80.24 | 80.25 | 80.53 | 80.58 | 80.46 | 80.26 |
| HRF | 79.45 | 79.24 | 80.66 | 80.72 | 80.60 | 80.79 |
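
For reference, the kind of config this PR enables looks roughly like the following (a sketch of the CE + DC (1:3) setting above; per the commit list at the bottom, each loss_name must carry the 'loss_' prefix):

    decode_head=dict(
        type='FCNHead',
        # ... other head settings unchanged ...
        loss_decode=[
            dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
        ])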

@codecov (bot) commented Aug 24, 2021

Codecov Report

Merging #818 (01bff41) into master (e235c1a) will increase coverage by 1.44%.
The diff coverage is 96.05%.

❗ Current head 01bff41 differs from pull request most recent head 0b3a22b. Consider uploading reports for the commit 0b3a22b to get more accurate results.

@@            Coverage Diff             @@
##           master     #818      +/-   ##
==========================================
+ Coverage   87.64%   89.09%   +1.44%     
==========================================
  Files         108      112       +4     
  Lines        5886     6081     +195     
  Branches      958      977      +19     
==========================================
+ Hits         5159     5418     +259     
+ Misses        535      468      -67     
- Partials      192      195       +3     
Flag Coverage Δ
unittests 89.09% <96.05%> (+1.46%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmseg/core/evaluation/metrics.py 90.42% <ø> (-0.20%) ⬇️
mmseg/datasets/pipelines/formating.py 63.82% <ø> (ø)
mmseg/models/backbones/cgnet.py 94.63% <ø> (ø)
mmseg/models/backbones/fast_scnn.py 97.08% <ø> (ø)
mmseg/models/backbones/mit.py 91.53% <ø> (ø)
mmseg/models/backbones/mobilenet_v2.py 71.08% <ø> (ø)
mmseg/models/backbones/resnest.py 83.72% <ø> (ø)
mmseg/models/backbones/resnet.py 99.28% <ø> (ø)
mmseg/models/backbones/resnext.py 100.00% <ø> (ø)
mmseg/models/backbones/unet.py 94.91% <ø> (ø)
... and 37 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e235c1a...0b3a22b.

@Junjun2016 (Collaborator)

Should add more unittests to improve the coverage.

(Seven review threads on mmseg/models/decode_heads/decode_head.py; all outdated/resolved.)
@caodroid commented Sep 4, 2021

Hi, support for multiple losses during training is much needed, and this is nice work. I also have a suggestion: here the multiple losses apply only to the final prediction. However, if the decode head produces more than one prediction through deep supervision, each with a different loss weight (like da_head.py, or the network in the figure, image not shown), this may not work. For this problem, I think the following code may work:

# Assumed context: these are methods of a BaseDecodeHead subclass.
# resize, accuracy and add_prefix come from mmseg (mmseg.ops.resize,
# mmseg.models.losses.accuracy, mmseg.core.add_prefix); force_fp32 comes
# from mmcv.runner. self.loss_names is a proposed attribute naming each
# entry of self.loss_decode.

@force_fp32(apply_to=('seg_logit', ))
def _losses(self, seg_logit, seg_label, loss_decode):
    """Compute the loss of a single prediction with a single loss module."""
    loss = dict()
    seg_logit = resize(
        input=seg_logit,
        size=seg_label.shape[2:],
        mode='bilinear',
        align_corners=self.align_corners)
    if self.sampler is not None:
        seg_weight = self.sampler.sample(seg_logit, seg_label)
    else:
        seg_weight = None
    seg_label = seg_label.squeeze(1)
    loss['loss_seg'] = loss_decode(
        seg_logit,
        seg_label,
        weight=seg_weight,
        ignore_index=self.ignore_index)
    loss['acc_seg'] = accuracy(seg_logit, seg_label)
    return loss

def losses(self, seg_logit, seg_label):
    """Compute segmentation losses for one or several predictions."""
    loss = dict()
    if isinstance(seg_logit, torch.Tensor):  # a single seg_logit
        for loss_name, loss_decode in zip(self.loss_names, self.loss_decode):
            loss.update(
                add_prefix(self._losses(seg_logit, seg_label, loss_decode),
                           loss_name))
    elif isinstance(seg_logit, (list, tuple)):  # multiple seg_logits
        for logit, loss_name, loss_decode in zip(seg_logit, self.loss_names,
                                                 self.loss_decode):
            loss.update(
                add_prefix(self._losses(logit, seg_label, loss_decode),
                           loss_name))
    return loss

@MengzhangLI added the 'WIP: Work in process' label on Sep 10, 2021
@openmmlab-bot (Collaborator)

Task linked: CU-k5tuzw mix loss

@Junjun2016 (Collaborator)

Should add more unittests to improve the code coverage.

@MengzhangLI (Contributor, Author)

> (quoting @caodroid's suggestion above)

Thanks for your advice. Deep supervision can be implemented with an auxiliary head.
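
For illustration, a minimal sketch of that pattern in an MMSegmentation config (in_channels, in_index, and the other numbers are placeholders):

    # Deep supervision via an auxiliary head: it supervises an intermediate
    # feature map with its own, typically down-weighted, loss.
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,  # channels of the supervised feature map (placeholder)
        in_index=2,       # which backbone output to supervise (placeholder)
        channels=64,
        num_classes=19,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))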

@caodroid commented Sep 23, 2021

> (quoting the exchange above)

Thanks for your reply. It's true that deep supervision can be implemented with an auxiliary head, but the auxiliary head can only supervise the backbone. My suggestion is to implement deep supervision on multiple predictions. For example, there are three output predictions in da_head; if I want to change the loss weight on one of the auxiliary output predictions, that is not achievable in the current version. This capability would be very useful for heads with multiple predictions.

def forward(self, inputs):
    x = self._transform_inputs(inputs)
    pam_feat = self.pam_in_conv(x)
    pam_feat = self.pam(pam_feat)
    pam_feat = self.pam_out_conv(pam_feat)
    pam_out = self.pam_cls_seg(pam_feat)

    cam_feat = self.cam_in_conv(x)
    cam_feat = self.cam(cam_feat)
    cam_feat = self.cam_out_conv(cam_feat)
    cam_out = self.cam_cls_seg(cam_feat)

    feat_sum = pam_feat + cam_feat
    pam_cam_out = self.cls_seg(feat_sum)

    return pam_cam_out, pam_out, cam_out
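
As a hedged sketch of what per-prediction weighting could look like (hypothetical, not part of this PR; WeightedDAHead and logit_loss_weights are invented names):

    from mmseg.core import add_prefix
    from mmseg.models.decode_heads.da_head import DAHead
    from mmseg.models.decode_heads.decode_head import BaseDecodeHead

    class WeightedDAHead(DAHead):
        """Hypothetical DAHead variant with a separate loss weight per output."""

        def __init__(self, logit_loss_weights=(1.0, 0.4, 0.4), **kwargs):
            super().__init__(**kwargs)
            self.logit_loss_weights = logit_loss_weights  # invented attribute

        def losses(self, seg_logit, seg_label):
            """seg_logit is (pam_cam_out, pam_out, cam_out) from forward()."""
            loss = dict()
            for name, logit, weight in zip(('pam_cam', 'pam', 'cam'), seg_logit,
                                           self.logit_loss_weights):
                # Reuse the single-logit loss computation from BaseDecodeHead,
                # then scale only loss terms; metrics like acc_seg stay unscaled.
                sub_loss = BaseDecodeHead.losses(self, logit, seg_label)
                sub_loss = {k: (v * weight if k.startswith('loss') else v)
                            for k, v in sub_loss.items()}
                loss.update(add_prefix(sub_loss, name))
            return loss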

@MengzhangLI (Contributor, Author)

> (quoting the exchange above)

Hi, thanks for your nice proposal; I will think about making this implementation more flexible in the future, just as you suggested above.

Right now there are already two ways to support it: (1) as in our UNet implementation, treat the whole encoder-decoder as the backbone, except for the final fully connected layer; (2) use a NECK between the BACKBONE and the HEAD to handle those feature maps (see the sketch below).
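
A tiny sketch of option (2), assuming the FPN neck that ships with MMSegmentation (channel numbers are placeholders):

    # Inside the model config, between backbone and decode_head:
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # backbone stage channels (placeholder)
        out_channels=256,
        num_outs=4),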

Thanks again for your warmhearted proposal.

Best,

@xvjiarui (Collaborator) left a comment

LGTM except for the missing unittest

(Review threads on mmseg/models/losses/dice_loss.py, mmseg/models/decode_heads/decode_head.py, and mmseg/models/losses/lovasz_loss.py; all resolved.)
@Junjun2016 (Collaborator)

Please fix the lint error.

@MengzhangLI removed the 'WIP: Work in process' label on Sep 24, 2021
@Junjun2016 merged commit 186a1fc into open-mmlab:master on Sep 24, 2021
@MengzhangLI deleted the multi-loss branch on February 1, 2022
bowenroom pushed a commit to bowenroom/mmsegmentation that referenced this pull request Feb 25, 2022
* multiple losses

* fix lint error

* fix typos

* fix typos

* Adding Attribute

* Fixing loss_ prefix

* Fixing loss_ prefix

* Fixing loss_ prefix

* Add Same

* loss_name must has 'loss_' prefix

* Fix unittest

* Fix unittest

* Fix unittest

* Update mmseg/models/decode_heads/decode_head.py

Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this pull request Dec 3, 2023
Labels: enhancement (New feature or request)