[Feature] Segformer backbone re-implementation #594
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #594      +/-   ##
==========================================
+ Coverage   85.18%   85.28%   +0.09%
==========================================
  Files         105      107       +2
  Lines        5671     5817     +146
  Branches      923      951      +28
==========================================
+ Hits         4831     4961     +130
- Misses        662      673      +11
- Partials      178      183       +5
2. Remove redundant functions and unit tests;
mmseg/models/backbones/mit.py
Outdated
self.dwconv = DWConv(feedforward_channels)
self.act = build_activation_layer(act_cfg)
self.fc2 = Linear(feedforward_channels, in_channels)
self.drop = nn.Dropout(drop_rate)
We can replace the FC layers with 1x1 convs, so we can avoid the dimension transforms and integrate the depthwise conv into the MLP.
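A minimal sketch of that idea, assuming plain PyTorch modules (the class name and channel sizes are illustrative, not the final mmseg API): the two FC layers become 1x1 convs with a 3x3 depthwise conv between them, so the tensor stays in (N, C, H, W) and no flatten/transpose round-trip is needed.

```python
import torch
import torch.nn as nn

class MixFFNSketch(nn.Module):
    """Hedged sketch: FFN built from 1x1 convs plus a 3x3 depthwise conv."""

    def __init__(self, in_channels=64, feedforward_channels=256, ffn_drop=0.):
        super().__init__()
        # 1x1 convs replace the two Linear layers
        self.fc1 = nn.Conv2d(in_channels, feedforward_channels, kernel_size=1)
        # depthwise 3x3 conv integrated into the FFN (the "Mix" part)
        self.dwconv = nn.Conv2d(
            feedforward_channels,
            feedforward_channels,
            kernel_size=3,
            padding=1,
            groups=feedforward_channels)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(feedforward_channels, in_channels, kernel_size=1)
        self.drop = nn.Dropout(ffn_drop)

    def forward(self, x):
        # x: (N, C, H, W); no dimension transform is required
        out = self.drop(self.act(self.dwconv(self.fc1(x))))
        out = self.drop(self.fc2(out))
        return x + out

x = torch.randn(1, 64, 32, 32)
print(MixFFNSketch()(x).shape)  # torch.Size([1, 64, 32, 32])
```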
mmseg/models/backbones/mit.py
Outdated
return x


class Mlp(BaseModule):
Be careful that the variable or class name needs to be consistent with the paper.
Rename Mlp to MixFFN
mmseg/models/backbones/mit.py
Outdated
return x


class Attention(BaseModule):
Rename Attention to EfficientMultiheadAttention. We can also inherit from MultiheadAttention in MMCV and pass different query, key, and value in the forward pass according to different spatial reductions.
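A rough, runnable sketch of that suggestion; torch.nn.MultiheadAttention stands in here for MMCV's MultiheadAttention wrapper, and the class name, channel sizes, and hw_shape argument are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EfficientSelfAttentionSketch(nn.Module):
    """Hedged sketch: self-attention whose key/value tokens are spatially reduced."""

    def __init__(self, embed_dims=64, num_heads=1, sr_ratio=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dims, num_heads, batch_first=True)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # strided conv shrinks the key/value sequence by sr_ratio ** 2
            self.sr = nn.Conv2d(
                embed_dims, embed_dims, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(embed_dims)

    def forward(self, x, hw_shape):
        # x: (N, L, C) with L == H * W
        H, W = hw_shape
        kv = x
        if self.sr_ratio > 1:
            kv = x.transpose(1, 2).reshape(x.shape[0], -1, H, W)  # to (N, C, H, W)
            kv = self.sr(kv).flatten(2).transpose(1, 2)           # back to (N, L', C)
            kv = self.norm(kv)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return x + out

x = torch.randn(1, 56 * 56, 64)
print(EfficientSelfAttentionSketch()(x, (56, 56)).shape)  # torch.Size([1, 3136, 64])
```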
mmseg/models/backbones/mit.py
Outdated
self.sr_ratio = sr_ratio
if sr_ratio > 1:
    self.sr = ConvModule(
        in_channels=dim,
        out_channels=dim,
        kernel_size=sr_ratio,
        stride=sr_ratio)
    _, self.norm = build_norm_layer(norm_cfg, dim)
If we inherit from MultiheadAttention in MMCV, we only need to add these lines.
mmseg/models/backbones/mit.py
Outdated
return x


class Block(BaseModule):
Rename it.
mmseg/models/backbones/mit.py
Outdated
    proj_drop=drop_rate,
    sr_ratio=sr_ratio)
# NOTE: drop path for stochastic depth, we shall see if this is better
# than dropout here
Is this comment necessary?
mmseg/models/backbones/mit.py
Outdated
img_size = to_2tuple(img_size)
patch_size = to_2tuple(patch_size)

self.img_size = img_size
self.patch_size = patch_size
num_rows, num_cols = img_size[0] // patch_size[0], img_size[
    1] // patch_size[1]
self.num_patches = num_rows * num_cols
Are these lines necessary?
mmseg/models/backbones/mit.py
Outdated
num_rows, num_cols = img_size[0] // patch_size[0], img_size[
    1] // patch_size[1]
self.num_patches = num_rows * num_cols
self.proj = nn.Conv2d(
Use ConvModule
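For reference, a hedged example of the projection written with ConvModule from mmcv.cnn; the argument values below are illustrative, and norm_cfg/act_cfg are left as None so it behaves like a plain conv:

```python
import torch
from mmcv.cnn import ConvModule

# illustrative patch-embedding projection: 7x7 conv, stride 4, no norm/act
proj = ConvModule(
    in_channels=3,
    out_channels=64,
    kernel_size=7,
    stride=4,
    padding=3,
    norm_cfg=None,
    act_cfg=None)

x = torch.randn(1, 3, 224, 224)
print(proj(x).shape)  # torch.Size([1, 64, 56, 56])
```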
mmseg/models/backbones/mit.py
Outdated
self.pretrained = pretrained
self.depths = depths
# patch_embed
self.patch_embed1 = OverlapPatchEmbed(
Use ModuleList
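A small sketch of the ModuleList pattern, with a toy backbone standing in for the real stages (names, widths, and layers are illustrative assumptions):

```python
import torch
import torch.nn as nn

class TinyBackboneSketch(nn.Module):
    """Hedged sketch: stages kept in an nn.ModuleList instead of
    self.patch_embed1 ... self.patch_embed4, so forward() can just iterate."""

    def __init__(self, embed_dims=(64, 128, 320, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for dims in embed_dims:
            # each toy stage is a single strided conv standing in for
            # (patch embedding + transformer layers + norm)
            self.stages.append(
                nn.Conv2d(in_ch, dims, kernel_size=3, stride=2, padding=1))
            in_ch = dims

    def forward(self, x):
        outs = []
        for stage in self.stages:
            x = stage(x)
            outs.append(x)
        return outs

feats = TinyBackboneSketch()(torch.randn(1, 3, 64, 64))
print([f.shape for f in feats])
```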
mmseg/models/backbones/mit.py
Outdated
    strict=False,
    logger=logger)

def reset_drop_path(self, drop_path_rate):
Where is this used?
2. Add some unit tests for MixVisionTransformer;
mmseg/models/utils/shape_convert.py
Outdated
@@ -0,0 +1,10 @@
def nlc_to_nchw(tensor, H, W):
Docstring.
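One possible shape of these helpers with full docstrings, written as a hedged sketch rather than the final mmseg implementation:

```python
import torch

def nlc_to_nchw(x, hw_shape):
    """Convert a [N, L, C] tensor to [N, C, H, W].

    Args:
        x (Tensor): The input tensor of shape [N, L, C].
        hw_shape (Sequence[int]): The height and width of the feature map.

    Returns:
        Tensor: The output tensor of shape [N, C, H, W].
    """
    H, W = hw_shape
    assert x.dim() == 3 and x.shape[1] == H * W
    return x.transpose(1, 2).reshape(x.shape[0], -1, H, W).contiguous()


def nchw_to_nlc(x):
    """Flatten a [N, C, H, W] tensor to [N, L, C].

    Args:
        x (Tensor): The input tensor of shape [N, C, H, W].

    Returns:
        Tensor: The output tensor of shape [N, L, C], where L = H * W.
    """
    assert x.dim() == 4
    return x.flatten(2).transpose(1, 2).contiguous()


x = torch.randn(2, 32 * 32, 96)
assert nchw_to_nlc(nlc_to_nchw(x, (32, 32))).shape == x.shape
```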
mmseg/models/backbones/mit.py
Outdated
A PyTorch implement of : `SegFormer: Simple and Efficient Design for
Semantic Segmentation with Transformers` -
https://arxiv.org/pdf/2105.15203.pdf

    in_channels (int): Number of input channels. Default: 3.
    embed_dims (int): Embedding dimension. Default: 768.
    num_stags (int): The num of stages. Default: 4.
    num_layers (list[int]): The layer number of each transformer encode
        layer. Default: [3, 4, 6, 3].
    num_heads (list[int]): The attention heads of each transformer
        encode layer. Default: [1, 2, 4, 8].
    patch_sizes (list[int]): The patch_size of each overlapped patch embedding.
        Default: [7, 3, 3, 3].
    strides (list[int]): The stride of each overlapped patch embedding.
        Default: [4, 2, 2, 2].
    sr_ratios (list[int]): The spatial reduction rate of each transformer
        encode layer. Default: [8, 4, 2, 1].
    out_indices (list[int] | tuple[int] | int): Output from which stages.
        Default: (0, 1, 2, 3).
    mlp_ratio (int): ratio of mlp hidden dim to embedding dim.
        Default: 4.
    out_indices (list | tuple | int): Output from which stages.
        Default: -1.
    qkv_bias (bool): Enable bias for qkv if True. Default: True.
    drop_rate (float): Probability of an element to be zeroed.
        Default 0.0
    attn_drop_rate (float): The drop out rate for attention layer.
        Default 0.0
    drop_path_rate (float): stochastic depth rate. Default 0.0
    norm_cfg (dict): Config dict for normalization layer.
        Default: dict(type='LN')
    act_cfg (dict): The activation config for FFNs.
        Defalut: dict(type='GELU').
    pretrain_style (str): Choose to use official or mmcls pretrain weights.
        Default: official.
    pretrained (str, optional): model pretrained path. Default: None.
    init_cfg (dict or list[dict], optional): Initialization config dict.
        Default: None.
"""
The format of the docstring is not correct.
mmseg/models/backbones/mit.py
Outdated
    dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate),
    act_cfg=act_cfg)

def forward(self, x, H, W):
Suggested change:
- def forward(self, x, H, W):
+ def forward(self, x, hw_shape):
mmseg/models/backbones/mit.py
Outdated
conv1x1 = partial(
    ConvModule,
    kernel_size=1,
    stride=1,
    bias=True,
    norm_cfg=None,
    act_cfg=None)
No need to use partial.
mmseg/models/backbones/mit.py
Outdated
from ..utils import PatchEmbed, mit_convert, nchw_to_nlc, nlc_to_nchw


class PEConv(BaseModule):
Why do we need to use PEConv to wrap conv?
mmseg/models/backbones/mit.py
Outdated
ffn_drop=0.,
pe_index=1,
dropout_layer=None,
add_identity=True,
Is the add_identity argument necessary?
mmseg/models/backbones/mit.py
Outdated
# first position of MixFFN
if pe_index == 0:
    layers.append(PEConv(in_channels))
for idx in range(num_fcs - 1):
    container = []
    container.append(
        conv1x1(
            in_channels=in_channels,
            out_channels=feedforward_channels))
    # middle position of MixFFN
    if pe_index == idx + 1:
        container.append(PEConv(feedforward_channels))
    container.append(self.activate)
    container.append(nn.Dropout(ffn_drop))
    layers.append(Sequential(*container))
layers.append(
    conv1x1(
        in_channels=feedforward_channels, out_channels=in_channels))
# Last position of MixFFN
if pe_index == num_fcs:
    layers.append(PEConv(feedforward_channels))
layers.append(nn.Dropout(ffn_drop))
self.layers = Sequential(*layers)
self.dropout_layer = build_dropout(
    dropout_layer) if dropout_layer else torch.nn.Identity()
self.add_identity = add_identity
This part is too long.
Consider simplifying it.
mmseg/models/backbones/mit.py
Outdated
num_fcs=2,
act_cfg=dict(type='GELU'),
ffn_drop=0.,
pe_index=1,
If it is too complicated, we may remove pe_index.
Just fix the num_fcs to 2, and insert the PE in the middle.
mmseg/models/utils/shape_convert.py
Outdated
    hw_shape (Sequence[int]): The height and width of output feature map.
"""
Add a Returns section to the docstring.
mmseg/models/utils/shape_convert.py
Outdated
"""Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor. | ||
|
||
Args: | ||
x (Tensor): The input tensor for convertion. |
Add a Returns section to the docstring.
mmseg/models/utils/embed.py
Outdated
    (0, self.patch_size[1] - W % self.patch_size[1], 0, 0))

# TODO: Process overlapping op
if not self.overlapping:
Overlapping is not precise for this.
We may consider something like auto_pad or pad_to_patch_size.
We also need to make it an argument and add a docstring for that.
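A hedged sketch of what a pad_to_patch_size-style helper could look like; the name and signature are illustrative, not the final API:

```python
import torch
import torch.nn.functional as F

def pad_to_patch_size(x, patch_size):
    """Pad a [N, C, H, W] tensor so H and W are multiples of patch_size."""
    ph, pw = (patch_size, patch_size) if isinstance(patch_size, int) else patch_size
    H, W = x.shape[-2:]
    pad_h = (ph - H % ph) % ph
    pad_w = (pw - W % pw) % pw
    if pad_h or pad_w:
        # F.pad pads the last two dims as (left, right, top, bottom)
        x = F.pad(x, (0, pad_w, 0, pad_h))
    return x

x = torch.randn(1, 3, 30, 45)
print(pad_to_patch_size(x, 16).shape)  # torch.Size([1, 3, 32, 48])
```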
mmseg/models/utils/embed.py
Outdated
pad_to_patch_size (bool, optional): Whether to pad feature map shape
    to multiple patch size. Default: False.
We may make it True by default.
* [Feature]Segformer re-implementation
* Using act_cfg and norm_cfg to control activation and normalization
* Split this PR into several little PRs
* Fix lint error
* Remove SegFormerHead
* parameters init refactor
* 1. Refactor segformer backbone parameters init; 2. Remove redundant functions and unit tests;
* Remove redundant codes
* 1. Remove redundant codes; 2. Modify module name;
* Refactor the backbone of segformer using mmcv.cnn.bricks.transformer.py
* Fix some code logic bugs.
* Add mit_convert.py to match pretrain keys of segformer.
* Resolve some comments.
* 1. Add some assert to ensure right params; 2. Support flexible peconv position;
* Add pe_index assert and fix unit test.
* 1. Add doc string for MixVisionTransformer; 2. Add some unit tests for MixVisionTransformer;
* Use hw_shape to pass shape of feature map.
* 1. Fix doc string of MixVisionTransformer; 2. Simplify MixFFN; 3. Modify H, W to hw_shape;
* Add more unit tests.
* Add doc string for shape conversion functions.
* Add some unit tests to improve code coverage.
* Fix Segformer backbone pretrain weights match bug.
* Resolve the shape conversion functions doc string.
* Add pad_to_patch_size arg.
* Modify default value of pad_to_patch_size arg.
- [ ] Add head SegFormerHead;
- [ ] Add dataset transform pipeline AlignedResize;
- [ ] Add some config for segformer;