Add MobileNetV3 architecture for Segmentation #3276
Looks great, thanks!
I only have a couple of minor (non-blocking) comments. The only thing I would really like to see fixed before merge is to have the correct `python -m torch.distributed.launch ...` commands for reproducibility.
@@ -82,7 +84,7 @@ def __init__(self, cnf: InvertedResidualConfig, norm_layer: Callable[..., nn.Mod

         self.block = nn.Sequential(*layers)
         self.out_channels = cnf.out_channels
-        self.is_strided = cnf.stride > 1
+        self._is_cn = cnf.stride > 1
Out of curiosity, what does `cn` mean here?
It comes from the C0, C1, ..., C5, Cn names used in object detection. I use this attribute internally to find where the downsampling was supposed to happen; since that is not always done with strides, I had to rename it. If you have a better name for it, I'm happy to change it. I could not think of any.
Thanks for the explanation. Given that this is private, I'm fine with this name.
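To make the role of the `_is_cn` marker discussed above concrete, here is a minimal sketch of how such a flag could be used to locate the C-level boundaries of a backbone even when a stride has been replaced by dilation. The `Block` class, the `stage_indices` helper and the example block list are hypothetical stand-ins for illustration only; they are not code from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a backbone block; only the marker matters here."""
    name: str
    stride: int = 1
    dilation: int = 1
    _is_cn: bool = False  # True where a new C-level (C1 ... Cn) is supposed to start


def stage_indices(blocks: List[Block]) -> List[int]:
    """Return the first index, every index flagged `_is_cn`, and the last index."""
    marked = [i for i, b in enumerate(blocks) if getattr(b, "_is_cn", False)]
    return [0] + marked + [len(blocks) - 1]


# Assumed toy backbone: block "b3" keeps its marker although it uses
# dilation instead of a stride, so a stride-based check would miss it.
blocks = [
    Block("stem", stride=2),
    Block("b1", stride=2, _is_cn=True),
    Block("b2"),
    Block("b3", stride=1, dilation=2, _is_cn=True),
    Block("head"),
]

print(stage_indices(blocks))
```

The point of the sketch is that the marker records where downsampling was intended, independently of whether the block actually strides.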
    """
    # non-public config parameters
    reduce_divider = 2 if kwargs.pop('_reduced_tail', False) else 1
Is this feature used in any of the models? Otherwise we can just remove it
This is an implementation detail from the MobileNetV3 paper that is supposed to provide a further speed optimization for object detection and segmentation. Our training scripts don't use it because we do transfer learning from ImageNet, but if someone really wants to train from scratch and go smaller, this provides a way to do it.
On current master this is public (see the `reduced_tail` param), but here I decided to hide it before the release and make it an internal implementation detail for future models. I'm not quite convinced we will use it, but I want to provide an implementation very close to the paper.
Personally, I would prefer to keep it hidden for now and decide later whether we want it gone. Let me know.
Sounds good, I'm OK with keeping this private for now and maybe removing it in the future.
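As a rough illustration of the hidden-flag pattern discussed in this thread, the sketch below pops a private `_reduced_tail` keyword and uses the resulting divider to halve the channel widths of the final layers. The function name `build_tail_channels` and the channel values are assumptions chosen for the example, not values confirmed by this PR.

```python
def build_tail_channels(**kwargs):
    """Hypothetical helper: derive tail widths from a private keyword flag."""
    # Pop the private flag so it is not forwarded to any public constructor.
    reduce_divider = 2 if kwargs.pop("_reduced_tail", False) else 1

    # Illustrative channel widths (assumed for this sketch).
    tail_channels = 160 // reduce_divider
    last_channel = 1280 // reduce_divider

    # Return the remaining kwargs so the caller can pass them on unchanged.
    return tail_channels, last_channel, kwargs


print(build_tail_channels())
print(build_tail_channels(_reduced_tail=True, pretrained=False))
```

Using `kwargs.pop` with a default keeps the option out of the public signature while still letting callers who know about it opt in.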
Summary:
* Making _segm_resnet() generic and reusable.
* Adding fcn and deeplabv3 directly on mobilenetv3 backbone.
* Adding tests for segmentation models.
* Rename is_strided with _is_cn.
* Add dilation support on MobileNetV3 for Segmentation.
* Add Lite R-ASPP with MobileNetV3 backbone.
* Add pretrained model weights.
* Removing model fcn_mobilenet_v3_large.
* Adding docs and imports.
* Fixing typo and readme.

Reviewed By: datumbox
Differential Revision: D26156380
fbshipit-source-id: e62528b52728804a40da79c1311562a7f1c2afbd
Adding MobileNetV3 models for Semantic Segmentation (resolution 520):
Lite R-ASPP with Dilated MobileNetV3 Large Backbone
Heavily optimized for speed. Good for actual mobile usage.
Weight checkpoint:
Validate:
Accuracy metrics:
Speed Benchmark:
0.3278 sec per image on CPU
DeepLabV3 with Dilated MobileNetV3 Large Backbone
Offers good balance between speed and accuracy, significantly faster than the FCN model with a resnet50 backbone without sacrificing too much accuracy.
Weight checkpoint:
Validate:
Accuracy metrics:
Speed Benchmark:
0.5869 sec per image on CPU
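The PR does not say how the per-image CPU timings above were measured. One plausible way to obtain a comparable "seconds per image" figure is sketched below; the helper name `seconds_per_image`, the warm-up count and the dummy workload are all assumptions for illustration, and a real measurement would call the model's forward pass inside `run_once`.

```python
import time
from typing import Callable


def seconds_per_image(run_once: Callable[[], object],
                      n_images: int = 10,
                      warmup: int = 2) -> float:
    """Average wall-clock seconds per call of `run_once` after a short warm-up."""
    # Warm-up calls are excluded so one-off setup costs do not skew the average.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(n_images):
        run_once()
    return (time.perf_counter() - start) / n_images


# Dummy workload standing in for a single-image inference call.
elapsed = seconds_per_image(lambda: sum(range(10_000)), n_images=5, warmup=1)
print(f"{elapsed:.6f} sec per image")
```

Numbers obtained this way depend heavily on hardware, thread settings and input resolution, so they are only comparable when measured under the same conditions.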