Segformer backbone Mix Visual Transformer #632

Merged
merged 3 commits into from
Aug 8, 2022
25 changes: 24 additions & 1 deletion README.md
@@ -20,7 +20,7 @@ The main features of this library are:

- High level API (just two lines to create a neural network)
- 9 model architectures for binary and multi-class segmentation (including the legendary Unet)
- 113 available encoders (and 400+ encoders from [timm](https://github.com/rwightman/pytorch-image-models))
- 119 available encoders (and 400+ encoders from [timm](https://github.com/rwightman/pytorch-image-models))
- All encoders have pre-trained weights for faster and better convergence
- Popular metrics and losses for training routines

@@ -352,6 +352,29 @@ The following is a list of supported encoders in the SMP. Select the appropriate
</div>
</details>

<details>
<summary style="margin-left: 25px;">Mix Vision Transformer</summary>
<div style="margin-left: 25px;">

Backbone from SegFormer, pretrained on ImageNet. It can be combined with other decoders from this package, such as Unet and FPN.

Limitations:

- encoder is not supported by Linknet, Unet++
- encoder is not supported by FPN if encoder depth != 5

|Encoder |Weights |Params, M |
|--------------------------------|:------------------------------:|:------------------------------:|
|mit_b0 |imagenet |3M |
|mit_b1 |imagenet |13M |
|mit_b2 |imagenet |24M |
|mit_b3 |imagenet |44M |
|mit_b4 |imagenet |60M |
|mit_b5 |imagenet |81M |

</div>
</details>


\* `ssl`, `swsl` - semi-supervised and weakly-supervised learning on ImageNet ([repo](https://github.com/facebookresearch/semi-supervised-ImageNet1K-models)).

15 changes: 15 additions & 0 deletions docs/encoders.rst
@@ -324,3 +324,18 @@ VGG
+-------------+------------+-------------+
| vgg19\_bn   | imagenet   | 20M         |
+-------------+------------+-------------+


Mix Visual Transformer
~~~~~~~~~~~~~~~~~~~~~~

+-----------+----------+------------+
| Encoder   | Weights  | Params, M  |
+===========+==========+============+
| mit\_b0   | imagenet | 3M         |
| mit\_b1   | imagenet | 13M        |
| mit\_b2   | imagenet | 24M        |
| mit\_b3   | imagenet | 44M        |
| mit\_b4   | imagenet | 60M        |
| mit\_b5   | imagenet | 81M        |
+-----------+----------+------------+
4 changes: 4 additions & 0 deletions segmentation_models_pytorch/decoders/fpn/model.py
@@ -66,6 +66,10 @@ def __init__(
    ):
        super().__init__()

        # validate input params
        if encoder_name.startswith("mit_b") and encoder_depth != 5:
            raise ValueError("Encoder {} supports only encoder_depth=5".format(encoder_name))

        self.encoder = get_encoder(
            encoder_name,
            in_channels=in_channels,
3 changes: 3 additions & 0 deletions segmentation_models_pytorch/decoders/linknet/model.py
@@ -64,6 +64,9 @@ def __init__(
    ):
        super().__init__()

        if encoder_name.startswith("mit_b"):
            raise ValueError("Encoder `{}` is not supported for Linknet".format(encoder_name))

        self.encoder = get_encoder(
            encoder_name,
            in_channels=in_channels,
3 changes: 3 additions & 0 deletions segmentation_models_pytorch/decoders/unetplusplus/model.py
@@ -68,6 +68,9 @@ def __init__(
    ):
        super().__init__()

        if encoder_name.startswith("mit_b"):
            raise ValueError("UnetPlusPlus does not support encoder_name={}".format(encoder_name))

        self.encoder = get_encoder(
            encoder_name,
            in_channels=in_channels,
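
The three decoder guards above follow one pattern: reject `mit_b*` encoders before the encoder is constructed. A minimal standalone sketch of that logic is shown here. The helper `check_mit_support` and its string decoder names are hypothetical (the PR keeps each check inline in the model's `__init__`), and the error messages are paraphrased.

```python
def check_mit_support(decoder: str, encoder_name: str, encoder_depth: int = 5) -> None:
    """Raise ValueError for the mit_b* combinations this PR rejects.

    Hypothetical helper for illustration only; the PR places equivalent
    checks directly in the FPN, Linknet and UnetPlusPlus constructors.
    """
    if not encoder_name.startswith("mit_b"):
        return  # the guards only concern Mix Transformer encoders

    if decoder in ("Linknet", "UnetPlusPlus"):
        # These two decoders reject mit_b* encoders outright.
        raise ValueError(
            "Encoder `{}` is not supported for {}".format(encoder_name, decoder)
        )

    if decoder == "FPN" and encoder_depth != 5:
        # FPN accepts mit_b* encoders only at the default depth.
        raise ValueError(
            "Encoder {} supports only encoder_depth=5".format(encoder_name)
        )


# Accepted combinations return silently.
check_mit_support("FPN", "mit_b2", encoder_depth=5)
check_mit_support("Linknet", "resnet34")
```

Matching on the `mit_b` name prefix keeps the guard short, at the cost of tying the check to the naming scheme rather than to a property of the encoder class.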
2 changes: 2 additions & 0 deletions segmentation_models_pytorch/encoders/__init__.py
@@ -19,6 +19,7 @@
from .timm_sknet import timm_sknet_encoders
from .timm_mobilenetv3 import timm_mobilenetv3_encoders
from .timm_gernet import timm_gernet_encoders
from .mix_transformer import mix_transformer_encoders

from .timm_universal import TimmUniversalEncoder

@@ -42,6 +43,7 @@
encoders.update(timm_sknet_encoders)
encoders.update(timm_mobilenetv3_encoders)
encoders.update(timm_gernet_encoders)
encoders.update(mix_transformer_encoders)


def get_encoder(name, in_channels=3, depth=5, weights=None, output_stride=32, **kwargs):
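
The `encoders.update(...)` calls above build one flat name-to-entry registry out of per-family dictionaries, which is how the new `mit_b*` names become resolvable by name. The sketch below illustrates that pattern with plain dictionaries. The entry layout (`weights`, `params_m`) and the `lookup` helper are assumptions made for illustration, not the library's actual schema or API; the parameter counts are taken from the tables in this PR.

```python
# Hypothetical per-family registries (the value layout is an assumption).
vgg_encoders = {
    "vgg19_bn": {"weights": ["imagenet"], "params_m": 20},
}
mix_transformer_encoders = {
    "mit_b0": {"weights": ["imagenet"], "params_m": 3},
    "mit_b5": {"weights": ["imagenet"], "params_m": 81},
}

# Merge every family into one flat registry, as encoders/__init__.py does.
encoders = {}
encoders.update(vgg_encoders)
encoders.update(mix_transformer_encoders)


def lookup(name):
    """Return the registry entry for `name`, or raise a descriptive KeyError."""
    try:
        return encoders[name]
    except KeyError:
        raise KeyError(
            "Wrong encoder name `{}`, supported encoders: {}".format(
                name, sorted(encoders)
            )
        ) from None


print(lookup("mit_b5")["params_m"])
```

Because `dict.update` overwrites existing keys, a family registered later would silently replace an earlier entry with the same name, so encoder names need to be unique across families.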