
[WIP][Feature] Add ViT-Adapter Model #2762

Merged (17 commits) on Mar 17, 2023
16 changes: 16 additions & 0 deletions configs/vit_adapter/README.md
@@ -0,0 +1,16 @@
# Vision Transformer Adapter for Dense Predictions

## Reference

> Chen, Zhe, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. "Vision Transformer Adapter for Dense Predictions." arXiv preprint arXiv:2205.08534 (2022).

## Prerequisites

Download [ms_deform_attn.zip](https://paddleseg.bj.bcebos.com/dygraph/customized_ops/ms_deform_attn.zip), then follow the README bundled in the archive to build and install the ms_deform_attn library.

## Performance

### ADE20K

| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|-|-|-|-|-|-|-|-|
|UPerNetViTAdapter|ViT-Adapter-Tiny|512x512|160000|41.90%|-|-|[model](https://paddleseg.bj.bcebos.com/dygraph/ade20k/upernet_vit_adapter_tiny_ade20k_512x512_160k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/ade20k/upernet_vit_adapter_tiny_ade20k_512x512_160k/train_log.txt) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=88173046bd09f61da5f48db66baddd7d)|
@@ -0,0 +1,66 @@
_base_: '../_base_/ade20k.yml'

batch_size: 4  # total batch size is 16
iters: 160000

train_dataset:
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
    - type: RandomPaddingCrop
      crop_size: [512, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]

val_dataset:
  transforms:
    - type: Resize
      target_size: [2048, 512]
      keep_ratio: True
      size_divisor: 32
    - type: Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]

test_config:
  is_slide: True
  crop_size: [512, 512]
  stride: [341, 341]

optimizer:
  _inherited_: False
  type: AdamW
  weight_decay: 0.01

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 6.0e-5
  end_lr: 0
  power: 1.0
  warmup_iters: 1500
  warmup_start_lr: 1.0e-6

loss:
  types:
    - type: CrossEntropyLoss
      avg_non_ignore: False
  coef: [1, 0.4]

model:
  type: UPerNetViTAdapter
  backbone:
    type: ViTAdapter_Tiny
    pretrained: https://paddleseg.bj.bcebos.com/dygraph/backbone/deit_tiny_patch16_224.zip
  backbone_indices: [0, 1, 2, 3]
  channels: 512
  pool_scales: [1, 2, 3, 6]
  dropout_ratio: 0.1
  aux_loss: True
  aux_channels: 256
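To make the `lr_scheduler` stanza concrete, here is a hedged sketch (my own reimplementation, not Paddle's `PolynomialDecay` code) of the schedule it describes: a linear warmup from `warmup_start_lr` to `learning_rate` over the first 1500 iterations, then a polynomial decay (linear here, since `power: 1.0`) down to `end_lr` over the 160000 training iterations:

```python
def lr_at(step, base_lr=6.0e-5, end_lr=0.0, power=1.0, total_iters=160000,
          warmup_iters=1500, warmup_start_lr=1.0e-6):
    """Approximate the config's warmup + polynomial-decay schedule.

    Defaults mirror the lr_scheduler stanza above; the exact boundary
    behavior of Paddle's scheduler may differ slightly.
    """
    if step < warmup_iters:
        # linear warmup from warmup_start_lr up to base_lr
        frac = step / warmup_iters
        return warmup_start_lr + (base_lr - warmup_start_lr) * frac
    # polynomial decay from base_lr down to end_lr
    frac = min(step, total_iters) / total_iters
    return (base_lr - end_lr) * (1 - frac) ** power + end_lr

print(lr_at(0))       # 1e-06 (warmup start)
print(lr_at(80000))   # 3e-05 (half-way through the linear decay)
print(lr_at(160000))  # 0.0   (end_lr)
```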
1 change: 1 addition & 0 deletions paddleseg/models/__init__.py
@@ -67,6 +67,7 @@
from .mscale_ocrnet import MscaleOCRNet
from .topformer import TopFormer
from .rtformer import RTFormer
from .upernet_vit_adapter import UPerNetViTAdapter
from .lpsnet import LPSNet
from .maskformer import MaskFormer
from .segnext import SegNeXt
3 changes: 2 additions & 1 deletion paddleseg/models/backbones/__init__.py
@@ -27,5 +27,6 @@
from .cae import *
from .top_transformer import *
from .uhrnet import *
from .vit_adapter import *
from .hrformer import *
from .mscan import *