
Add DeepLabV3+ Support #2689

Closed

Conversation

@alihassanijr

I added DeepLabV3+ (ResNet backbone) to the segmentation models, since the "decoder" only needed a few changes, like using low-level features from layer1 in resnet. I've tested it on both ResNet50 and ResNet101 backbones, and it seems to work, but I haven't had the chance to train it fully and verify it can reproduce the results from the paper yet.
Any thoughts and comments on the changes are greatly appreciated.
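For context, the gist of the decoder change is roughly the following (a minimal sketch, not the exact code in this PR; class and argument names are illustrative):

import torch
from torch import nn
from torch.nn import functional as F

class DeepLabV3PlusDecoder(nn.Module):
    # Fuses the ASPP output with low-level features (e.g. from resnet layer1),
    # following the decoder described in the DeepLabV3+ paper.
    def __init__(self, aspp, low_level_channels=256, num_classes=21):
        super().__init__()
        self.aspp = aspp  # produces 256-channel features at output stride 16
        self.project = nn.Sequential(  # reduce low-level features to 48 channels
            nn.Conv2d(low_level_channels, 48, 1, bias=False),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(  # the paper uses a few 3x3 convs here
            nn.Conv2d(256 + 48, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1),
        )

    def forward(self, high_level, low_level):
        x = self.aspp(high_level)
        x = F.interpolate(x, size=low_level.shape[-2:], mode='bilinear', align_corners=False)
        return self.classifier(torch.cat([x, self.project(low_level)], dim=1))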

@oke-aditya
Contributor

oke-aditya commented Sep 19, 2020

Lint failed here

./torchvision/models/segmentation/_utils.py:49:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:50:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:51:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:52:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:53:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:54:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:55:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:56:36: E128 continuation line under-indented for visual indent
./torchvision/models/segmentation/_utils.py:62:1: W293 blank line contains whitespace
./torchvision/models/segmentation/_utils.py:72:1: W391 blank line at end of file
./torchvision/models/segmentation/_utils.py:72:1: W293 blank line contains whitespace
./torchvision/models/segmentation/segmentation.py:8:121: E501 line too long (141 > 120 characters)
./torchvision/models/segmentation/deeplabv3.py:12:1: E303 too many blank lines (3)

This has become a required CI check, I believe.

It's okay; most PRs fail linting at first, so don't worry.
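For reference, E128 just means a continuation line is not aligned with the opening bracket, e.g. (some_function here is only a placeholder):

# flagged by E128: continuation line under-indented for visual indent
result = some_function(first_argument,
    second_argument)

# fixed: continuation aligned with the opening parenthesis
result = some_function(first_argument,
                       second_argument)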

@alihassanijr
Author

My apologies, I've reformatted the files based on the log.

@codecov

codecov bot commented Sep 19, 2020

Codecov Report

Merging #2689 (cc0f598) into master (78159d6) will increase coverage by 0.01%.
The diff coverage is 98.24%.


@@            Coverage Diff             @@
##           master    #2689      +/-   ##
==========================================
+ Coverage   73.39%   73.40%   +0.01%     
==========================================
  Files          99       99              
  Lines        8825     8830       +5     
  Branches     1391     1392       +1     
==========================================
+ Hits         6477     6482       +5     
+ Misses       1929     1920       -9     
- Partials      419      428       +9     
Impacted Files Coverage Δ
torchvision/models/segmentation/deeplabv3.py 98.80% <97.82%> (-1.20%) ⬇️
torchvision/models/segmentation/_utils.py 80.76% <100.00%> (+0.76%) ⬆️
torchvision/models/segmentation/segmentation.py 71.42% <100.00%> (+3.98%) ⬆️
torchvision/ops/feature_pyramid_network.py 91.20% <0.00%> (-3.30%) ⬇️
torchvision/models/detection/retinanet.py 72.98% <0.00%> (-2.90%) ⬇️
torchvision/models/detection/anchor_utils.py 92.10% <0.00%> (-2.70%) ⬇️
torchvision/transforms/functional_pil.py 66.19% <0.00%> (-1.88%) ⬇️
torchvision/ops/deform_conv.py 70.96% <0.00%> (-1.34%) ⬇️
torchvision/models/detection/backbone_utils.py 94.28% <0.00%> (-1.27%) ⬇️
torchvision/ops/poolers.py 97.05% <0.00%> (-1.02%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78159d6...a1b40a7.

@alihassanijr
Author

I apologize for the few mistakes I found at the last minute. Everything seems to be stable now.

@vfdev-5
Collaborator

vfdev-5 commented Sep 21, 2020

@ali-nsua thanks for the PR!
Adding a new model to torchvision can be a bit difficult due to several points:

  • usefulness of the model (popularity)
  • pretrained weights
  • implementation details

Concerning DeepLabV3+, I may be wrong, but according to the paper the backbone of the model is a modified Xception. We probably won't get the paper's performance with ResNet.
Maybe we can start a discussion about introducing Xception to torchvision and then build DeepLabV3+ on top of it.
What do you think @fmassa ?

@alihassanijr
Author

Hi @vfdev-5
Thank you for your response. You are right, one of the main points of the paper is using Xception as the encoder backbone. However, they did mention that a ResNet101 could also be used.
I could try implementing Xception as well and adding it as an option for the backbone.
Thanks again, and I look forward to your and everyone else's feedback.

@fmassa
Member

fmassa commented Sep 22, 2020

Hi,

Thanks for the PR!

As @vfdev-5 mentioned, having pre-trained weights for the model (that reproduce the reported results within a reasonable tolerance) is a must before we can add it to torchvision.

From a quick look at the paper for ResNet101 (Table 3), I had the impression that the new decoder improved results by ~1.5 points, and that most of the reported mIoU improvements come from using Xception + test-time augmentation.

I would be willing to consider adding DeepLabV3+ with ResNet101 backbone if we manage to retrain it and match results (which might be a fairly involved task), i.e., get around 1.5 mIoU improvement on top of DeepLabV3 ResNet101 from torchvision.

About adding Xception to torchvision: that's a separate discussion and I would prefer to have it in a different issue. IIRC it wasn't very good at transfer learning (or on other tasks like detection), but I might be confusing it with something else (as it seemed to be successfully used here).

@alihassanijr
Author


Hi,
Thank you so much for your input. I'll try to retrain it using ResNet101 and see how it'll do. I'll add updates here.

@fmassa
Member

fmassa commented Sep 22, 2020

@ali-nsua when training the model, please try using the reference training scripts in https://github.com/pytorch/vision/tree/master/references/segmentation so that we have a single entry-point for training everything

@alihassanijr
Author

Sure thing.

@alihassanijr
Author

alihassanijr commented Sep 23, 2020

@fmassa
I noticed something when I was running the reference training script: the ToTensor transform was raising a warning about a numpy array not being writable. I looked around and fixed it on my clone by switching from np.asarray to np.array. I could push that to this PR as well if you think it would help, or maybe I'm doing something wrong.
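For illustration, the fix amounts to something like this (a minimal sketch; to_tensor is just a stand-in for wherever the conversion happens in the reference transforms):

import numpy as np
import torch

def to_tensor(pic):
    # np.asarray can return a read-only view of the PIL image buffer, which
    # makes torch.from_numpy warn about a non-writable array;
    # np.array makes a writable copy instead.
    img = np.array(pic, dtype=np.uint8)  # was: np.asarray(pic, dtype=np.uint8)
    return torch.from_numpy(img)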

Just a thought: it could be helpful to add an argument to the training script that downloads the segmentation datasets when they're not already cached locally, at least for the ones that are publicly available.

Just let me know if either of these is okay, and I can create a separate PR for it.

@alihassanijr
Author

alihassanijr commented Sep 27, 2020

@fmassa, @vfdev-5, @oke-aditya
Running the model on VOC actually helped me a great deal in debugging my implementation. However, with only a limited, underpowered GPU, I was only able to train it twice, reaching about 75.0 mIoU on the VOC val set. Of course, I had to modify the reference scripts in two key areas: (1) I set the classifier learning rate to 10 times the base lr, just like the aux classifier, and (2) I changed the crop size from 480 to 513, as reported in the paper.
I'll try tuning it again and will hopefully be back with better results, closer to the ones reported in the paper, next week.
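Concretely, the lr change amounts to roughly this in the reference train.py (a sketch from memory; model and args are the objects the script already builds, and exact names may differ):

import torch

params_to_optimize = [
    {"params": [p for p in model.backbone.parameters() if p.requires_grad]},
    # main classifier at 10x the base lr, matching the aux classifier below
    {"params": [p for p in model.classifier.parameters() if p.requires_grad], "lr": args.lr * 10},
]
if args.aux_loss:
    params = [p for p in model.aux_classifier.parameters() if p.requires_grad]
    params_to_optimize.append({"params": params, "lr": args.lr * 10})
optimizer = torch.optim.SGD(
    params_to_optimize, lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)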

P.S. I also fixed the output strides to 16 and 8 for training and evaluation respectively, since those proved to perform best in the paper.
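For reference, the output stride is controlled via dilation in the torchvision ResNet; roughly:

from torchvision.models import resnet

# output stride 16 (training): replace stride with dilation in the last stage only
backbone = resnet.resnet101(pretrained=True, replace_stride_with_dilation=[False, False, True])
# output stride 8 (evaluation): dilate the last two stages
backbone = resnet.resnet101(pretrained=True, replace_stride_with_dilation=[False, True, True])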

I would be very grateful for any comments or suggestions.

-- Slight problem: the pretrained torchvision models are all based on COCO, while the paper reports training on VOC. I guess I should try COCO as well, then evaluate on VOC?

@fmassa
Member

fmassa commented Sep 30, 2020

@ali-nsua thanks for all the work here!

However, with only a limited, underpowered GPU

That's the tricky part about contributing models, as they generally require a lot of computing power. Over the next 6 months or so we will have more bandwidth to contribute more models ourselves as well.

Of course, I had to modify the reference scripts in two key areas

The modifications of the reference scripts would need to be submitted as well somehow, so that we can ensure reproducibility, but let's leave this for another discussion.

I was only able to train it twice, reaching about 75.0 mIoU on the VOC val set.

Nice! From the paper it seems to be still ~3 points behind the reported numbers, but it's getting there!

Slight problem: the pretrained torchvision models are all based on COCO, while the paper reports training on VOC. I guess I should try COCO as well, then evaluate on VOC?

The pretrained models in torchvision are indeed based on COCO, and for consistency it would be good if we could keep the same things here. Also, many of the segmentation papers actually pre-train on COCO (and then finetune on VOC, see section 4.3 in the paper), so providing the models pre-trained on COCO actually has a lot of value as well.

@alihassanijr
Author

@fmassa Thank you for your comments. I couldn't agree more that pretraining on COCO would be the way to go, but I unfortunately do not have access to COCO, and I believe it is not publicly available. I will definitely find a way to train it once I find a copy.
About the paper: I could be wrong, but I think they pretrained the model on COCO only with the Xception backbone. With the ResNet101 backbone, they haven't mentioned anything other than the backbone being pretrained on ImageNet. However, I found that the COCO-pretrained DeepLabV3 which is already available does very well on the VOC val set without any specific fine-tuning. One can just run the reference script with --test-only --pretrained --dataset voc to check that.
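For example, something along these lines (flag names as in the reference scripts; deeplabv3_resnet101 is the torchvision model name):

python train.py --dataset voc --model deeplabv3_resnet101 --pretrained --test-only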

To sum up, I'll try to find a copy of COCO and train DeepLabV3+ on it, and see where we can get.

Thanks again for your time.

@oke-aditya
Contributor

Great work @ali-nsua
Here is the COCO dataset 👍

@giangnguyen2412

Hi, I wonder whether I can get the segmentation for only the top-1 prediction in an image? At the moment, it displays the segmentations for all classes present in the image.

@alihassanijr
Author

Hi,
Sorry for the delay, I was unavailable for a few days.
Sure, just try this:

import torch

with torch.no_grad():
    output = model(x)['out']  # output has shape (batch_size, n_classes, H, W)
output_predictions = output.argmax(1)  # per-pixel class map of shape (batch_size, H, W)
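(Here model is assumed to be one of the torchvision segmentation models in eval mode, e.g. torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval(), and x a normalized (batch_size, 3, H, W) input batch.)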

@alihassanijr
Author

Hi,
I'm terribly sorry, but due to limited compute resources I was only able to train the network once, and unfortunately it did not quite reach the expected results. Is there any time limit on PRs?

@oke-aditya
Contributor

oke-aditya commented Nov 11, 2020

I don't think so. E.g., #1697 was the last model added, and it took time as well.

P.S. You might need to sign the CLA; please have a look.

I guess DeepLabV3+ was added to Detectron2 recently (in the 0.3 release). You can have a look there as well 👍

@alihassanijr
Author

Thanks!
I just signed CLA.

I didn't understand that last part, unfortunately. Could you please elaborate?

@facebook-github-bot

Hi @ali-nsua!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@oke-aditya
Contributor

Here is DeepLabV3+ from detectron2.

Maybe it can help you in some way.

@voldemortX
Contributor

@ali-nsua Just happened to find this PR, some thoughts from my own experience:
Training on VOC is fast and single card trainable so its very suitable for getting PoCs. But if you want to achieve a ~1.5 improvement upon DeepLabV3 looking at an ablation study table, then probably training the already implemented DeepLabV3 from torchvision on VOC is needed, since different implementations (augmentation, size, testing scheme, learning rate schedule, training epochs, regularizations, OS, batch size) make very different performance on mean IoU, especially the PASCAL VOC dataset. And the ablation studies usually are not detailed and maybe imperfect in implementations.

To sum up, it is very hard to precisely reproduce the absolute results in this field, especially from an ablation study; relative improvements are much more sensible. So I guess the ~1.5 relative improvement is the good choice here:

I would be willing to consider adding DeepLabV3+ with ResNet101 backbone if we manage to retrain it and match results (which might be a fairly involved task), i.e., get around 1.5 mIoU improvement on top of DeepLabV3 ResNet101 from torchvision.

When I train the DeepLabV3 from torchvision on VOC with no testing tricks at 321x321, I already get 78.11 average performance, and 78.7 mIoU is also reported by mmsegmentation for 512x512 inputs. So, for instance, with these scripts the Table 3 results in the paper are not directly referenceable; you probably need a DeepLabV3 baseline from your own training script first, and then attain a ~1.5 improvement upon it. Alternatively, since your script is similar to the reference code, you could refer to a log of VOC performance from prior torchvision runs, but I can't seem to find results other than COCO yet.

One additional heads-up: it might be impossible to get that ~1.5 with ResNet-101 if you also look here. It seems the V3+ decoder does not work as well on ResNet as expected.

@alihassanijr
Author

@voldemortX Thank you for your comments. Unfortunately, I got caught up with a lot of work, and I didn't have any reliable compute available; I was just renting off the cloud.

Based on your comments, I guess I could try this again with ResNet-50 and try to find a set of hyperparameters, augmentations, and the like that would potentially work.

@voldemortX
Contributor

Totally understand the bandwidth problem. Good work with this PR! If I come across some V3+ results someday that could help you, I'll post them here.

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

