Commit 770473d

export convert tools
2 parents 6e68c2d + 0516cca commit 770473d

116 files changed: +2972 / -1586 lines

README.md

Lines changed: 10 additions & 2 deletions
@@ -1,14 +1,19 @@
+<!--start-->
 <div align="center" markdown>

 # MindOCR

+</div>
+<!--end-->
+
+<div align="center" markdown>
+
 [![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

-
 English | [中文](README_CN.md)

 [📝Introduction](#introduction) |
@@ -22,6 +27,7 @@ English | [中文](README_CN.md)

 </div>

+<!--start-->
 ## Introduction
 MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en), which integrates a series of mainstream text detection and recognition algorithms/models and provides easy-to-use training and inference tools. It can accelerate the development and deployment of SoTA text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, in real-world applications, and help fulfill the need for image-text understanding.

@@ -151,7 +157,8 @@ You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Thi
 - Inference with MindSpore Lite
 - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
 - [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
-- [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
+- [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md)
+- [Model Conversion](docs/en/inference/convert_tutorial.md)
 - Developer Guides
 - [Customize Dataset](mindocr/data/README.md)
 - [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -385,3 +392,4 @@ If you find this project useful in your research, please consider citing:
 year={2023}
 }
 ```
+<!--end-->

README_CN.md

Lines changed: 4 additions & 1 deletion
@@ -1,14 +1,16 @@
+<!--start-->
 <div align="center" markdown>

 # MindOCR
+<!--end-->

 [![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

-
+<!--start-->
 [English](README.md) | 中文

 [📝简介](#简介) |
@@ -384,3 +386,4 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支
 year={2023}
 }
 ```
+<!--end-->
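Both READMEs now wrap selected regions in `<!--start-->` and `<!--end-->` HTML comments; in README.md the title `<div>` is also closed before the first `<!--end-->`, presumably so that each marked region is a balanced HTML fragment. The commit does not show which tool reads these markers (a documentation build that includes only the marked parts is one plausible consumer). As a minimal sketch under that assumption, the hypothetical helper `extract_marked_regions` below collects every marked region from a Markdown string using only Python's `re` module.

```python
import re

# Non-greedy match between the two marker comments; DOTALL lets "." span newlines.
_REGION = re.compile(r"<!--start-->\n?(.*?)<!--end-->", re.DOTALL)


def extract_marked_regions(markdown: str) -> list[str]:
    """Return the text of each <!--start--> ... <!--end--> region, in order.

    Hypothetical helper: the real consumer of these markers is not part of the commit.
    """
    return [m.group(1) for m in _REGION.finditer(markdown)]


readme = """<!--start-->
<div align="center" markdown>

# MindOCR

</div>
<!--end-->

<div align="center" markdown>
badges here
</div>
<!--start-->
## Introduction
MindOCR is an open-source toolbox.
<!--end-->
"""

for region in extract_marked_regions(readme):
    print(repr(region))
```

Text outside the markers (here the badge `<div>`) is skipped, which matches how the commit moves the badges out of the first marked region.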

configs/det/dbnet/README.md

Lines changed: 2 additions & 0 deletions
@@ -87,8 +87,10 @@ DBNet and DBNet++ were trained on the ICDAR2015, MSRA-TD500, SCUT-CTW1500, Total
 | **Model** | **Context** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe** | **Download** |
 |---------------------|----------------|---------------|----------------|------------|---------------|-------------|--------------|----------------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | DBNet | D910x1-MS2.0-G | MobileNetV3 | ImageNet | 76.31% | 78.27% | 77.28% | 10 s/epoch | 100 img/s | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
+| DBNet | D910x8-MS2.3-G | MobileNetV3 | ImageNet | 76.22% | 77.98% | 77.09% | 1.1 s/epoch | 960 img/s | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon |
 | DBNet | D910x1-MS2.0-G | ResNet-18 | ImageNet | 80.12% | 83.41% | 81.73% | 9.3 s/epoch | 108 img/s | [yaml](db_r18_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir) |
 | DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
+| DBNet | D910x8-MS2.2-G | ResNet-50 | ImageNet | 82.62% | 88.54% | 85.48% | 2.3 s/epoch | 435 img/s | [yaml](db_r50_icdar15_8p.yaml) | Coming soon |
 | | | | | | | | | | | |
 | DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |
 | DBNet++ | D910x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
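The F-score column is the harmonic mean of Recall and Precision, so the two 8-card rows added here can be sanity-checked from their other columns. Similarly, assuming the standard 1,000-image ICDAR2015 training split, Train T. should be close to 1,000 images divided by Throughput. The short sketch below runs both checks; the constant `ICDAR15_TRAIN_IMAGES` is an assumption, not a value taken from the table.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (all values in percent)."""
    return 2 * recall * precision / (recall + precision)


def epoch_seconds(num_images: int, throughput: float) -> float:
    """Approximate seconds per epoch: images per epoch / images per second."""
    return num_images / throughput


ICDAR15_TRAIN_IMAGES = 1000  # assumption: size of the ICDAR2015 training split

# (recall %, precision %, reported F-score %, reported s/epoch, throughput img/s)
new_rows = {
    "DBNet MobileNetV3 D910x8-MS2.3-G": (76.22, 77.98, 77.09, 1.1, 960),
    "DBNet ResNet-50 D910x8-MS2.2-G": (82.62, 88.54, 85.48, 2.3, 435),
}

for name, (rec, prec, f_rep, t_rep, tput) in new_rows.items():
    f = f_score(rec, prec)
    t = epoch_seconds(ICDAR15_TRAIN_IMAGES, tput)
    print(f"{name}: F={f:.2f}% (reported {f_rep}%), epoch ~{t:.2f} s (reported {t_rep} s)")
```

Both F-scores reproduce the reported values to two decimals; the per-epoch estimates land near the reported times, with the small gap plausibly coming from per-epoch overhead.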

configs/det/dbnet/README_CN.md

Lines changed: 2 additions & 0 deletions
@@ -69,8 +69,10 @@ DBNet和DBNet++在ICDAR2015,MSRA-TD500,SCUT-CTW1500,Total-Text和MLT2017
 | **模型** | **环境配置** | **骨干网络** | **预训练数据集** | **Recall** | **Precision** | **F-score** | **训练时间** | **吞吐量** | **配置文件** | **模型权重下载** |
 |---------------------|----------------|---------------|------------|------------|---------------|-------------|--------------|-----------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | DBNet | D910x1-MS2.0-G | MobileNetV3 | ImageNet | 76.26% | 78.22% | 77.28% | 10 s/epoch | 100 img/s | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
+| DBNet | D910x8-MS2.3-G | MobileNetV3 | ImageNet | 76.22% | 77.98% | 77.09% | 1.1 s/epoch | 960 img/s | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon |
 | DBNet | D910x1-MS2.0-G | ResNet-18 | ImageNet | 80.12% | 83.41% | 81.73% | 9.3 s/epoch | 108 img/s | [yaml](db_r18_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir) |
 | DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
+| DBNet | D910x8-MS2.2-G | ResNet-50 | ImageNet | 82.62% | 88.54% | 85.48% | 2.3 s/epoch | 435 img/s | [yaml](db_r50_icdar15_8p.yaml) | Coming soon |
 | | | | | | | | | | | |
 | DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |
 | DBNet++ | D910x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |

configs/det/dbnet/db_mobilenetv3_icdar15.yaml

Lines changed: 2 additions & 0 deletions
@@ -78,6 +78,7 @@ train:
     data_dir: ic15/det/train/ch4_training_images
     label_file: ic15/det/train/det_gt.txt
     sample_ratio: 1.0
+    use_minddata: True
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
@@ -135,6 +136,7 @@ eval:
     data_dir: ic15/det/test/ch4_test_images
     label_file: ic15/det/test/det_gt.txt
     sample_ratio: 1.0
+    use_minddata: True
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+system:
+  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
+  distribute: True
+  amp_level: 'O0'
+  seed: 42
+  log_interval: 10
+  val_while_train: True
+  val_start_epoch: 500
+  drop_overflow_update: False
+
+model:
+  type: det
+  transform: null
+  backbone:
+    name: det_mobilenet_v3
+    architecture: large
+    alpha: 0.5
+    out_stages: [5, 8, 14, 20]
+    bottleneck_params:
+      se_version: SqueezeExciteV2
+      always_expand: True
+    pretrained: https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_050_no_scale_se_v2_expand-3c4047ac.ckpt
+  neck:
+    name: DBFPN
+    out_channels: 256
+    bias: False
+  head:
+    name: DBHead
+    k: 50
+    bias: False
+    adaptive: True
+
+postprocess:
+  name: DBPostprocess
+  box_type: quad # whether to output a polygon or a box
+  binary_thresh: 0.3 # binarization threshold
+  box_thresh: 0.6 # box score threshold
+  max_candidates: 1000
+  expand_ratio: 1.5 # coefficient for expanding predictions
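`expand_ratio` controls how far each shrunk text region predicted by the network is grown back before it is reported. In the DBNet paper, the dilation offset is D' = A' × r' / L', where A' and L' are the area and perimeter of the shrunk polygon and r' is the expansion ratio (1.5 in the paper, matching this config). The sketch below computes that offset with the shoelace formula; that `DBPostprocess` uses exactly this formula is an assumption, and the polygon offsetting itself (usually done with a clipping library) is omitted.

```python
import math


def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a simple polygon given as (x, y) pairs."""
    area2 = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def unclip_distance(points, expand_ratio=1.5):
    """Dilation offset D' = A' * r' / L' for a shrunk text region (DBNet paper)."""
    area, perimeter = polygon_area_perimeter(points)
    return area * expand_ratio / perimeter


# A 100 x 20 px shrunk box: A' = 2000, L' = 240, so D' = 2000 * 1.5 / 240 = 12.5 px.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box))  # 12.5
```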
+
+metric:
+  name: DetMetric
+  main_indicator: f-score
+
+loss:
+  name: DBLoss
+  eps: 1.0e-6
+  l1_scale: 10
+  bce_scale: 5
+  bce_replace: bceloss
+
+scheduler:
+  scheduler: polynomial_decay
+  lr: 0.02
+  num_epochs: 2000
+  decay_rate: 0.9
+  warmup_epochs: 3
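The `scheduler` block requests polynomial decay from `lr: 0.02` over 2000 epochs, with 3 warm-up epochs. A common form of polynomial decay is lr(t) = lr₀ · (1 − t/T)^p. The sketch below assumes that `decay_rate` plays the role of the power p, that warm-up rises linearly from 0, and that decay runs from the end of warm-up to a final rate of 0; MindOCR's scheduler may instead work per step or use a non-zero end rate, so treat the numbers as illustrative.

```python
def polynomial_decay_lr(epoch, base_lr=0.02, num_epochs=2000, power=0.9,
                        warmup_epochs=3, end_lr=0.0):
    """Learning rate at a (possibly fractional) epoch: linear warmup, then polynomial decay.

    Assumptions: `power` corresponds to `decay_rate` in the config, warmup starts
    from 0, and the decay spans the epochs between the end of warmup and `num_epochs`.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = min((epoch - warmup_epochs) / (num_epochs - warmup_epochs), 1.0)
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr


for e in (0, 1.5, 3, 500, 1000, 1500, 2000):
    print(e, round(polynomial_decay_lr(e), 6))
```

The rate peaks at 0.02 when warm-up ends (epoch 3) and falls monotonically to 0 at epoch 2000.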
+
+optimizer:
+  opt: momentum
+  filter_bias_and_bn: false
+  momentum: 0.9
+  weight_decay: 1.0e-4
+
+# only used for mixed precision training
+loss_scaler:
+  type: dynamic
+  loss_scale: 512
+  scale_factor: 2
+  scale_window: 1000
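With `type: dynamic`, the loss scale starts at `loss_scale: 512`; under the usual dynamic loss-scaling rule it is divided by `scale_factor` whenever a step overflows and multiplied by it after `scale_window` consecutive clean steps. Per the comment above, this block only matters for mixed precision, and this config sets `amp_level: 'O0'`. The minimal plain-Python model below illustrates that standard rule; it is not MindSpore's own loss-scale implementation, and the minimum scale of 1 is an assumption.

```python
class DynamicLossScaler:
    """Minimal model of dynamic loss scaling (illustrative, not MindSpore's implementation)."""

    def __init__(self, loss_scale=512.0, scale_factor=2.0, scale_window=1000, min_scale=1.0):
        self.loss_scale = loss_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale  # assumption: floor so the scale never collapses to 0
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow: bool) -> float:
        """Adjust the scale after one training step and return the new value."""
        if overflow:
            # Gradients overflowed: shrink the scale and restart the clean-step counter.
            self.loss_scale = max(self.loss_scale / self.scale_factor, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.scale_window:
                # A full window of clean steps: try a larger scale again.
                self.loss_scale *= self.scale_factor
                self.good_steps = 0
        return self.loss_scale


scaler = DynamicLossScaler()
scaler.update(overflow=True)        # 512 -> 256
for _ in range(1000):
    scaler.update(overflow=False)   # 1000 clean steps: 256 -> 512
print(scaler.loss_scale)  # 512.0
```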
+
+train:
+  ckpt_save_dir: './tmp_det'
+  dataset_sink_mode: True
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: ic15/det/train/ch4_training_images
+    label_file: ic15/det/train/det_gt.txt
+    sample_ratio: 1.0
+    use_minddata: True
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - RandomColorAdjust:
+          brightness: 0.1255 # 32.0 / 255
+          saturation: 0.5
+      - RandomHorizontalFlip:
+          p: 0.5
+      - RandomRotate:
+          degrees: [ -10, 10 ]
+          expand_canvas: False
+          p: 1.0
+      - RandomScale:
+          scale_range: [ 0.5, 3.0 ]
+          p: 1.0
+      - RandomCropWithBBox:
+          max_tries: 10
+          min_crop_ratio: 0.1
+          crop_size: [ 640, 640 ]
+          p: 1.0
+      - ValidatePolygons:
+      - ShrinkBinaryMap:
+          min_text_size: 8
+          shrink_ratio: 0.4
+      - BorderMap:
+          shrink_ratio: 0.4
+          thresh_min: 0.3
+          thresh_max: 0.7
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
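`ShrinkBinaryMap` and `BorderMap` build the DBNet training targets. In the DBNet paper, each ground-truth polygon is shrunk by an offset D = A(1 − r²)/L, where A and L are its area and perimeter and r is the shrink ratio (here `shrink_ratio: 0.4`), and threshold-map values are rescaled into [`thresh_min`, `thresh_max`] = [0.3, 0.7]. The sketch below computes both quantities for an axis-aligned box. It assumes these transforms follow the paper; the rule in `is_too_small` for `min_text_size` is a hypothetical reading of that parameter, and the polygon offsetting and distance map are omitted.

```python
def shrink_offset_rect(width, height, shrink_ratio=0.4):
    """DBNet shrink offset D = A * (1 - r**2) / L for an axis-aligned width x height box."""
    area = width * height
    perimeter = 2 * (width + height)
    return area * (1 - shrink_ratio ** 2) / perimeter


def rescale_threshold(value, thresh_min=0.3, thresh_max=0.7):
    """Map a normalized border value in [0, 1] into [thresh_min, thresh_max]."""
    return value * (thresh_max - thresh_min) + thresh_min


def is_too_small(width, height, min_text_size=8):
    """Hypothetical rule: boxes whose shorter side is below min_text_size are ignored."""
    return min(width, height) < min_text_size


# A 100 x 20 px word box: A = 2000, L = 240, D = 2000 * 0.84 / 240 = 7.0 px per side.
print(round(shrink_offset_rect(100, 20), 4))
print(round(rescale_threshold(0.0), 4), round(rescale_threshold(1.0), 4))  # 0.3 0.7
print(is_too_small(100, 6))  # True
```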
+    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
+    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask']
+    # output_columns: ['image'] # for debug op performance
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2, 3, 4] # input indices marked as label
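`output_columns` fixes the order of the items each sample yields, and the two index lists select which of them feed the network and which go to the loss. The hypothetical helper `split_batch` below illustrates that split on a plain Python list; it mirrors what the config comments describe rather than MindOCR's actual trainer code.

```python
OUTPUT_COLUMNS = ["image", "binary_map", "mask", "thresh_map", "thresh_mask"]
NET_INPUT_COLUMN_INDEX = [0]
LABEL_COLUMN_INDEX = [1, 2, 3, 4]


def split_batch(batch, net_input_index, label_index):
    """Split one loader output (ordered as OUTPUT_COLUMNS) into network inputs and labels."""
    inputs = [batch[i] for i in net_input_index]
    labels = [batch[i] for i in label_index]
    return inputs, labels


# Stand-in batch: the column names themselves, so the split is easy to read.
inputs, labels = split_batch(OUTPUT_COLUMNS, NET_INPUT_COLUMN_INDEX, LABEL_COLUMN_INDEX)
print(inputs)  # ['image']
print(labels)  # ['binary_map', 'mask', 'thresh_map', 'thresh_mask']
```

A training step would then, roughly, compute something like `loss_fn(network(*inputs), *labels)`; both names are placeholders here.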
+
+  loader:
+    shuffle: True
+    batch_size: 8
+    drop_remainder: True
+    num_workers: 10
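With `distribute: True`, `batch_size: 8` is normally the per-device batch, so the global batch grows with the device count. Assuming the 8 devices of the D910x8 rows in the DBNet README and the 1,000-image ICDAR2015 training split, the arithmetic below shows how many optimizer steps one epoch takes when `drop_remainder: True` discards the last partial batch. `steps_per_epoch` is a hypothetical helper.

```python
def steps_per_epoch(num_images, per_device_batch, num_devices, drop_remainder=True):
    """Optimizer steps per epoch for data-parallel training."""
    global_batch = per_device_batch * num_devices
    full, rest = divmod(num_images, global_batch)
    # With drop_remainder, a trailing partial batch is discarded instead of run.
    return full if drop_remainder or rest == 0 else full + 1


# Assumptions: 8 devices, 1000 training images.
print(8 * 8)                        # global batch: 64
print(steps_per_epoch(1000, 8, 8))  # 15 steps (960 images used, 40 dropped)
```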
+
+eval:
+  ckpt_load_path: tmp_det/best.ckpt
+  dataset_sink_mode: False
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: ic15/det/test/ch4_test_images
+    label_file: ic15/det/test/det_gt.txt
+    sample_ratio: 1.0
+    use_minddata: True
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - DetResize: # GridResize 32
+          target_size: [ 736, 1280 ]
+          keep_ratio: False
+          limit_type: none
+          divisor: 32
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
+    # the order of the dataloader list, matching the network input and the labels for evaluation
+    output_columns: [ 'image', 'polys', 'ignore_tags', 'shape_list' ]
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2] # input indices marked as label
+
+  loader:
+    shuffle: False
+    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
+    drop_remainder: False
+    num_workers: 3
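In the evaluation pipeline, `DetResize` with `target_size: [ 736, 1280 ]` and `keep_ratio: False` stretches every test image to 736 × 1280 (height × width, both multiples of `divisor: 32`). Predicted boxes then live in the resized frame and must be scaled back before they are compared with the ground truth. The sketch below assumes `shape_list` holds `[src_h, src_w, ratio_h, ratio_w]`, a layout common in DBNet-style pipelines; check MindOCR's `DetResize` for the exact convention.

```python
def resize_ratios(src_h, src_w, target_h=736, target_w=1280):
    """Per-axis scale factors for a resize that ignores the aspect ratio (keep_ratio: False)."""
    return target_h / src_h, target_w / src_w


def to_original_coords(points, ratio_h, ratio_w):
    """Map (x, y) points predicted on the resized image back to source-image pixels."""
    return [(x / ratio_w, y / ratio_h) for x, y in points]


# ICDAR2015 test images are 720 x 1280 (h x w), so only the height is stretched.
src_h, src_w = 720, 1280
ratio_h, ratio_w = resize_ratios(src_h, src_w)
shape_list = [src_h, src_w, ratio_h, ratio_w]  # assumed layout: [src_h, src_w, ratio_h, ratio_w]

box_resized = [(100.0, 368.0), (300.0, 368.0), (300.0, 736.0), (100.0, 736.0)]
_, _, rh, rw = shape_list
print(to_original_coords(box_resized, rh, rw))  # y values map back to ~360 and ~720
```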

configs/det/dbnet/db_r50_icdar15.yaml

Lines changed: 10 additions & 10 deletions
@@ -158,35 +158,35 @@ eval:

 predict:
   ckpt_load_path: tmp_det/best.ckpt
+  output_save_dir: ./output
   dataset_sink_mode: False
   dataset:
     type: PredictDataset
     dataset_root: path/to/dataset_root
     data_dir: ic15/det/test/ch4_test_images
-    # label_file: test.txt
     sample_ratio: 1.0
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
           to_float32: False
-      # - DetLabelEncode:
-      - DetResize: # GridResize 32
-          target_size: [ 736, 1280 ]
-          keep_ratio: False
-          limit_type: none
-          divisor: 32
+          keep_ori: True
+      - DetResize:
+          keep_ratio: True
+          padding: False
+          limit_side_len: 960
+          limit_type: max
       - NormalizeImage:
           bgr_to_rgb: False
           is_hwc: True
           mean: imagenet
           std: imagenet
       - ToCHWImage:
     # the order of the dataloader list, matching the network input and the labels for evaluation
-    output_columns: [ 'img_path', 'image', 'raw_img_shape' ] # shape in h, w order
-    # num_keys_of_labels: 2 # num labels
+    output_columns: ["image", "img_path", "shape_list", "image_ori"]
+    net_input_column_index: [ 0 ] # input indices for network forward func in output_columns

   loader:
     shuffle: False
-    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
+    batch_size: 1
     drop_remainder: False
     num_workers: 2
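The predict pipeline now resizes with `keep_ratio: True`, `limit_type: max` and `limit_side_len: 960`, which conventionally means: scale the image down so its longer side is at most 960 px, keep the aspect ratio, and leave smaller images unscaled. The sketch below assumes that rule plus rounding each side to the nearest multiple of 32 (the `divisor` used in the eval pipeline), with a minimum of 32. MindOCR's exact rounding, and how `padding: False` interacts with it, may differ. `limit_max_side` is a hypothetical helper.

```python
def limit_max_side(h, w, limit_side_len=960, divisor=32):
    """Target (h, w) for keep-ratio resizing with limit_type 'max' (assumed rule)."""
    ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    # Assumption: round each side to the nearest multiple of `divisor`, at least `divisor`.
    new_h = max(int(round(h * ratio / divisor)) * divisor, divisor)
    new_w = max(int(round(w * ratio / divisor)) * divisor, divisor)
    return new_h, new_w


print(limit_max_side(720, 1280))  # (544, 960): longer side capped at 960
print(limit_max_side(320, 480))   # (320, 480): already small enough, unchanged
```

Because `keep_ori: True` keeps the undistorted image as `image_ori`, the per-axis ratios from such a resize (here 544/720 and 960/1280) are presumably what `shape_list` carries so that predicted boxes can be drawn back onto the original image.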
