add DBNet++ ResNet50 icdar2015 8p training #661

Merged 1 commit on Jan 29, 2024
21 changes: 11 additions & 10 deletions configs/det/dbnet/README_CN.md
@@ -66,16 +66,17 @@ DBNet and DBNet++ on ICDAR2015, MSRA-TD500, SCUT-CTW1500, Total-Text and MLT2017
### ICDAR2015
<div align="center">

-| **Model** | **Environment** | **Backbone** | **Pretraining dataset** | **Recall** | **Precision** | **F-score** | **Training time** | **Throughput** | **Config** | **Weights download** |
-|---------------------|----------------|---------------|------------|------------|---------------|-------------|--------------|-----------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| DBNet | D910x1-MS2.0-G | MobileNetV3 | ImageNet | 76.26% | 78.22% | 77.28% | 10 s/epoch | 100 img/s | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
-| DBNet | D910x8-MS2.3-G | MobileNetV3 | ImageNet | 76.22% | 77.98% | 77.09% | 1.1 s/epoch | 960 img/s | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon |
-| DBNet | D910x1-MS2.0-G | ResNet-18 | ImageNet | 80.12% | 83.41% | 81.73% | 9.3 s/epoch | 108 img/s | [yaml](db_r18_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir) |
-| DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
-| DBNet | D910x8-MS2.2-G | ResNet-50 | ImageNet | 82.62% | 88.54% | 85.48% | 2.3 s/epoch | 435 img/s | [yaml](db_r50_icdar15_8p.yaml) | Coming soon |
-| | | | | | | | | | | |
-| DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |
-| DBNet++ | D910x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
+| **Model** | **Environment** | **Backbone** | **Pretraining dataset** | **Recall** | **Precision** | **F-score** | **Training time** | **Throughput** | **Config** | **Weights download** |
+|---------------------|-----------------|---------------|------------|------------|---------------|-------------|--------------|------------|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| DBNet | D910x1-MS2.0-G | MobileNetV3 | ImageNet | 76.26% | 78.22% | 77.28% | 10 s/epoch | 100 img/s | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
+| DBNet | D910x8-MS2.3-G | MobileNetV3 | ImageNet | 76.22% | 77.98% | 77.09% | 1.1 s/epoch | 960 img/s | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon |
+| DBNet | D910x1-MS2.0-G | ResNet-18 | ImageNet | 80.12% | 83.41% | 81.73% | 9.3 s/epoch | 108 img/s | [yaml](db_r18_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir) |
+| DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
+| DBNet | D910x8-MS2.2-G | ResNet-50 | ImageNet | 82.62% | 88.54% | 85.48% | 2.3 s/epoch | 435 img/s | [yaml](db_r50_icdar15_8p.yaml) | Coming soon |
+| | | | | | | | | | | |
+| DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |
+| DBNet++ | D910x8-MS2.2-G | ResNet-50 | SynthText | 85.41% | 89.55% | 87.43% | 1.78 s/epoch | 432 img/s | [yaml](db++_r50_icdar15_8p.yaml) | Coming soon |
+| DBNet++ | D910*x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
</div>

> The linked DBNet MindIR files were exported with input shape `(1,3,736,1280)`; the linked DBNet++ MindIR files were exported with input shape `(1,3,1152,2048)`.
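
The F-score column in the ICDAR2015 table can be sanity-checked against the Recall and Precision columns. The sketch below assumes the reported F-score is the usual harmonic mean of the two; the function name `f_score` is made up for illustration and is not part of MindOCR.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Recall and precision from the DBNet++ ResNet-50 8-device row of the table.
print(round(f_score(85.41, 89.55), 2))  # 87.43
```

The result matches the 87.43% listed for the new `db++_r50_icdar15_8p.yaml` row.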
159 changes: 159 additions & 0 deletions configs/det/dbnet/db++_r50_icdar15_8p.yaml
@@ -0,0 +1,159 @@
system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: True
  amp_level: 'O0'
  seed: 42
  log_interval: 10
  val_while_train: True
  val_start_epoch: 600
  drop_overflow_update: False

model:
  type: det
  transform: null
  backbone:
    name: det_resnet50
    pretrained: False
  neck:
    name: DBFPN
    out_channels: 256
    bias: False
    use_asf: True # Adaptive Scale Fusion
    channel_attention: True # Use channel attention in ASF
  head:
    name: DBHead
    k: 50
    bias: False
    adaptive: True
  pretrained: https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt

postprocess:
  name: DBPostprocess
  box_type: quad # whether to output a polygon or a box
  binary_thresh: 0.3 # binarization threshold
  box_thresh: 0.7 # box score threshold
  max_candidates: 1000
  expand_ratio: 1.5 # coefficient for expanding predictions
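
To give a feel for what `expand_ratio` controls, here is a minimal sketch of the offset distance used to grow a shrunk text region back out, following the area-times-ratio-over-perimeter rule from the DB method. Whether `DBPostprocess` computes exactly this is an assumption, and the helper names are hypothetical.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def expand_distance(points, expand_ratio=1.5):
    """Offset distance = area * expand_ratio / perimeter (assumed rule)."""
    return polygon_area(points) * expand_ratio / polygon_perimeter(points)


# A 10x10 square: area 100, perimeter 40.
box = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(expand_distance(box))  # 3.75
```

Under this rule a larger `expand_ratio` pushes every edge of the detected region further outward, so boxes become looser.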

metric:
  name: DetMetric
  main_indicator: f-score

loss:
  name: DBLoss
  eps: 1.0e-6
  l1_scale: 10
  bce_scale: 5
  bce_replace: bceloss
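
The two scale factors in the `loss` section weight individual loss terms. The sketch below assumes a DB-style weighted sum in which a BCE term on the probability map is scaled by `bce_scale`, an L1 term on the threshold map by `l1_scale`, and a Dice term on the approximate binary map is left unscaled. The function and argument names are hypothetical, and the exact combination inside `DBLoss` is not confirmed by this diff.

```python
def db_total_loss(bce_loss, l1_loss, dice_loss, bce_scale=5.0, l1_scale=10.0):
    """Weighted sum of three per-batch scalar losses (assumed composition)."""
    return bce_scale * bce_loss + l1_scale * l1_loss + dice_loss


# Toy per-term values, chosen only to show the relative weighting.
total = db_total_loss(bce_loss=0.2, l1_loss=0.05, dice_loss=0.3)
print(total)
```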

scheduler:
  scheduler: polynomial_decay
  lr: 0.007
  num_epochs: 1200
  decay_rate: 0.9
  warmup_epochs: 3
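
The `scheduler` values can be read as a polynomial learning-rate decay with a short warmup. The sketch below is one plausible interpretation, not MindOCR's scheduler: it assumes `decay_rate` is the polynomial power, that warmup is linear, that the rate is updated once per epoch, and that the final rate is 0. The name `poly_lr` and the `end_lr` parameter are made up for illustration.

```python
def poly_lr(epoch, base_lr=0.007, num_epochs=1200, power=0.9,
            warmup_epochs=3, end_lr=0.0):
    """Learning rate at a 0-based epoch under linear warmup + polynomial decay."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    decay_epochs = max(1, num_epochs - warmup_epochs)
    progress = (epoch - warmup_epochs) / decay_epochs
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr


for epoch in (0, 2, 3, 600, 1199):
    print(epoch, poly_lr(epoch))
```

With a power below 1 the rate stays close to `lr` for most of training and falls off steeply only near the end.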

optimizer:
  opt: momentum
  filter_bias_and_bn: false
  momentum: 0.9
  weight_decay: 1.0e-4

# only used for mixed precision training
loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2
  scale_window: 1000
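
As the comment notes, the `loss_scaler` block only matters for mixed precision training, and this config sets `amp_level: 'O0'`. For readers unfamiliar with the three parameters, here is a minimal sketch of the general dynamic loss scaling idea: shrink the scale on overflow, grow it after `scale_window` consecutive clean steps. The class is hypothetical and does not reproduce MindSpore's implementation; the floor of 1.0 is an added assumption.

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler driven by an overflow flag per step."""

    def __init__(self, loss_scale=512.0, scale_factor=2.0, scale_window=1000):
        self.scale = float(loss_scale)
        self.factor = float(scale_factor)
        self.window = int(scale_window)
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow: bool) -> float:
        if overflow:
            # Back off and restart the clean-step counter.
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.window:
                self.scale *= self.factor
                self.good_steps = 0
        return self.scale


scaler = DynamicLossScaler()
print(scaler.update(overflow=True))   # 256.0
for _ in range(1000):
    scaler.update(overflow=False)
print(scaler.scale)                   # 512.0
```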

train:
  ema: True
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: True
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets
    data_dir: ic15/det/train/ch4_training_images
    label_file: ic15/det/train/det_gt.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - RandomColorAdjust:
          brightness: 0.1255 # 32.0 / 255
          saturation: 0.5
      - RandomHorizontalFlip:
          p: 0.5
      - RandomRotate:
          degrees: [ -10, 10 ]
          expand_canvas: False
          p: 1.0
      - RandomScale:
          scale_range: [ 0.5, 3.0 ]
          p: 1.0
      - RandomCropWithBBox:
          max_tries: 10
          min_crop_ratio: 0.1
          crop_size: [ 640, 640 ]
          p: 1.0
      - ValidatePolygons:
      - ShrinkBinaryMap:
          min_text_size: 8
          shrink_ratio: 0.4
      - BorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask' ] #'img_path']
    net_input_column_index: [ 0 ] # input indices for network forward func in output_columns
    label_column_index: [ 1, 2, 3, 4 ] # input indices marked as label

  loader:
    shuffle: True
    batch_size: 32
    drop_remainder: True
    num_workers: 8
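
The `output_columns`, `net_input_column_index` and `label_column_index` settings in the `train` section describe how each batch is split between the network and the loss function. A minimal sketch of that indexing, using column names in place of real tensors (the helper `split_batch` is hypothetical, not a MindOCR function):

```python
def split_batch(batch, net_input_column_index, label_column_index):
    """Pick network inputs and labels out of a batch by column position."""
    inputs = [batch[i] for i in net_input_column_index]
    labels = [batch[i] for i in label_column_index]
    return inputs, labels


# Column names stand in for the per-column arrays the dataloader would yield.
output_columns = ["image", "binary_map", "mask", "thresh_map", "thresh_mask"]
inputs, labels = split_batch(output_columns, [0], [1, 2, 3, 4])
print(inputs)  # ['image']
print(labels)  # ['binary_map', 'mask', 'thresh_map', 'thresh_mask']
```

The indices are positions in `output_columns`, so reordering that list requires updating both index lists.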

eval:
  ckpt_load_path: tmp_det/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets
    data_dir: ic15/det/test/ch4_test_images
    label_file: ic15/det/test/det_gt.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - DetResize:
          target_size: [ 1152, 2048 ]
          keep_ratio: True
          padding: True
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the labels for evaluation
    output_columns: [ 'image', 'polys', 'ignore_tags', 'shape_list']
    net_input_column_index: [ 0 ] # input indices for network forward func in output_columns
    label_column_index: [ 1, 2 ] # input indices marked as label

  loader:
    shuffle: False
    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
    drop_remainder: False
    num_workers: 2
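
The evaluation pipeline resizes each image to `target_size: [ 1152, 2048 ]` with `keep_ratio: True` and `padding: True`. The sketch below shows one common way to do that arithmetic: scale by the smaller of the two ratios, then pad the remainder. It is an assumption about the general technique, not a description of how `DetResize` rounds or where it places the padding, and `resize_keep_ratio` is a hypothetical helper.

```python
def resize_keep_ratio(height, width, target_size=(1152, 2048)):
    """Return ((new_h, new_w), (pad_h, pad_w)) for an aspect-preserving resize."""
    target_h, target_w = target_size
    scale = min(target_h / height, target_w / width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Whatever the resized image does not cover is filled by padding.
    return (new_h, new_w), (target_h - new_h, target_w - new_w)


print(resize_keep_ratio(720, 1280))    # ((1152, 2048), (0, 0))
print(resize_keep_ratio(1000, 1000))   # ((1152, 1152), (0, 896))
```

A 720x1280 input scales by exactly 1.6 and needs no padding, while a square input fills the height and is padded along the width.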