mindspore-lab
diff --git a/README.md b/README.md
Lines changed: 10 additions & 2 deletions
diff --git a/README_CN.md b/README_CN.md
Lines changed: 4 additions & 1 deletion
diff --git a/configs/det/dbnet/README.md b/configs/det/dbnet/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/configs/det/dbnet/README_CN.md b/configs/det/dbnet/README_CN.md
Lines changed: 2 additions & 0 deletions
diff --git a/configs/det/dbnet/db_mobilenetv3_icdar15.yaml b/configs/det/dbnet/db_mobilenetv3_icdar15.yaml
Lines changed: 2 additions & 0 deletions
diff --git a/configs/det/dbnet/db_mobilenetv3_icdar15_8p.yaml b/configs/det/dbnet/db_mobilenetv3_icdar15_8p.yaml
Lines changed: 165 additions & 0 deletions
diff --git a/configs/det/dbnet/db_r50_icdar15.yaml b/configs/det/dbnet/db_r50_icdar15.yaml
Lines changed: 10 additions & 10 deletions
@@ -1,14 +1,19 @@
+<!--start-->
 <div align="center" markdown>
 
 # MindOCR
 
+</div>
+<!--end-->
+
+<div align="center" markdown>
+
 [![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-
 English | [中文](README_CN.md)
 
 [📝Introduction](#introduction) |
@@ -22,6 +27,7 @@ English | [中文](README_CN.md)
 
 </div>
 
+<!--start-->
 ## Introduction
 MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en), which integrates a series of mainstream text detection and recognition algorithms/models and provides easy-to-use training and inference tools. It can accelerate the process of developing and deploying SoTA text detection and recognition models in real-world applications, such as DBNet/DBNet++ and CRNN/SVTR, and help fulfill the need of image-text understanding.
 
@@ -151,7 +157,8 @@ You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Thi
 - Inference with MindSpore Lite
     - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
     - [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
-    - [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
+    - [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md)
+    - [Model Conversion](docs/en/inference/convert_tutorial.md)
 - Developer Guides
     - [Customize Dataset](mindocr/data/README.md)
     - [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -385,3 +392,4 @@ If you find this project useful in your research, please consider citing:
     year={2023}
 }
 ```
+<!--end-->
@@ -1,14 +1,16 @@
+<!--start-->
 <div align="center" markdown>
 
 # MindOCR
+<!--end-->
 
 [![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-
+<!--start-->
 [English](README.md) | 中文
 
 [📝简介](#简介) |
@@ -384,3 +386,4 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ，以支
     year={2023}
 }
 ```
+<!--end-->
@@ -87,8 +87,10 @@ DBNet and DBNet++ were trained on the ICDAR2015, MSRA-TD500, SCUT-CTW1500, Total
 | **Model**           | **Context**    | **Backbone**  | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe**                          | **Download**                                                                                                                                                                                              |
 |---------------------|----------------|---------------|----------------|------------|---------------|-------------|--------------|----------------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | DBNet               | D910x1-MS2.0-G | MobileNetV3   | ImageNet       | 76.31%     | 78.27%        | 77.28%      | 10 s/epoch   | 100 img/s      | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
+| DBNet               | D910x8-MS2.3-G | MobileNetV3   | ImageNet       | 76.22%     | 77.98%        | 77.09%      | 1.1 s/epoch  | 960 img/s      | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon                                                                                                                                                                                             |
 | DBNet               | D910x1-MS2.0-G | ResNet-18     | ImageNet       | 80.12%     | 83.41%        | 81.73%      | 9.3 s/epoch  | 108 img/s      | [yaml](db_r18_icdar15.yaml)         | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir)       |
 | DBNet               | D910x1-MS2.0-G | ResNet-50     | ImageNet       | 83.53%     | 86.62%        | 85.05%      | 13.3 s/epoch | 75.2 img/s       | [yaml](db_r50_icdar15.yaml)         | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir)       |
+| DBNet               | D910x8-MS2.2-G | ResNet-50     | ImageNet       | 82.62%     | 88.54%        | 85.48%      | 2.3 s/epoch  | 435 img/s  | [yaml](db_r50_icdar15_8p.yaml)      | Coming soon                                                                                                                                                                                                     |
 |                     |                |               |                |            |               |             |              |                |                                     |                                                                                                                                                                                                           |
 | DBNet++             | D910x1-MS2.0-G | ResNet-50     | SynthText  | 85.70%     | 87.81%        | 86.74%      | 17.7 s/epoch | 56 img/s  | [yaml](db++_r50_icdar15.yaml)       | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir)   |
 | DBNet++             | D910x1-MS2.2-G | ResNet-50     | SynthText  | 86.81%     | 86.85%        | 86.86%      | 12.7 s/epoch | 78.2 img/s  | [yaml](db++_r50_icdar15_910.yaml)       | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
 
@@ -69,8 +69,10 @@ DBNet和DBNet++在ICDAR2015，MSRA-TD500，SCUT-CTW1500，Total-Text和MLT2017
 | **模型**              | **环境配置**       | **骨干网络**      | **预训练数据集** | **Recall** | **Precision** | **F-score** | **训练时间**     | **吞吐量**   | **配置文件**                            | **模型权重下载**                                                                                                                                                                                                |
 |---------------------|----------------|---------------|------------|------------|---------------|-------------|--------------|-----------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | DBNet               | D910x1-MS2.0-G | MobileNetV3   | ImageNet       | 76.26%     | 78.22%        | 77.28%      | 10 s/epoch   | 100 img/s      | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
+| DBNet               | D910x8-MS2.3-G | MobileNetV3   | ImageNet       | 76.22%     | 77.98%        | 77.09%      | 1.1 s/epoch  | 960 img/s      | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon                                                                                                                                                                                             |
 | DBNet               | D910x1-MS2.0-G | ResNet-18     | ImageNet       | 80.12%     | 83.41%        | 81.73%      | 9.3 s/epoch  | 108 img/s      | [yaml](db_r18_icdar15.yaml)         | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18-0c0c4cfa-cf46eb8b.mindir)       |
 | DBNet               | D910x1-MS2.0-G | ResNet-50     | ImageNet       | 83.53%     | 86.62%        | 85.05%      | 13.3 s/epoch | 75.2 img/s       | [yaml](db_r50_icdar15.yaml)         | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir)       |
+| DBNet               | D910x8-MS2.2-G | ResNet-50     | ImageNet       | 82.62%     | 88.54%        | 85.48%      | 2.3 s/epoch  | 435 img/s  | [yaml](db_r50_icdar15_8p.yaml)      | Coming soon                                                                                                                                                                                                     |
 |                     |                |               |            |            |               |             |              |           |                                     |                                                                                                                                                                                                           |
 | DBNet++             | D910x1-MS2.0-G | ResNet-50     | SynthText  | 85.70%     | 87.81%        | 86.74%      | 17.7 s/epoch | 56 img/s  | [yaml](db++_r50_icdar15.yaml)       | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir)   |
 | DBNet++             | D910x1-MS2.2-G | ResNet-50     | SynthText  | 86.81%     | 86.85%        | 86.86%      | 12.7 s/epoch | 78.2 img/s  | [yaml](db++_r50_icdar15_910.yaml)       | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir)   |
 
@@ -78,6 +78,7 @@ train:
     data_dir: ic15/det/train/ch4_training_images
     label_file: ic15/det/train/det_gt.txt
     sample_ratio: 1.0
+    use_minddata: True
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
@@ -135,6 +136,7 @@ eval:
     data_dir: ic15/det/test/ch4_test_images
     label_file: ic15/det/test/det_gt.txt
     sample_ratio: 1.0
+    use_minddata: True
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
 
@@ -0,0 +1,165 @@
+system:
+  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
+  distribute: True
+  amp_level: 'O0'
+  seed: 42
+  log_interval: 10
+  val_while_train: True
+  val_start_epoch: 500
+  drop_overflow_update: False
+
+model:
+  type: det
+  transform: null
+  backbone:
+    name: det_mobilenet_v3
+    architecture: large
+    alpha: 0.5
+    out_stages: [5, 8, 14, 20]
+    bottleneck_params:
+      se_version: SqueezeExciteV2
+      always_expand:  True
+    pretrained: https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_050_no_scale_se_v2_expand-3c4047ac.ckpt
+  neck:
+    name: DBFPN
+    out_channels: 256
+    bias: False
+  head:
+    name: DBHead
+    k: 50
+    bias: False
+    adaptive: True
+
+postprocess:
+  name: DBPostprocess
+  box_type: quad   # whether to output a polygon or a box
+  binary_thresh: 0.3      # binarization threshold
+  box_thresh: 0.6         # box score threshold
+  max_candidates: 1000
+  expand_ratio: 1.5       # coefficient for expanding predictions
+
+metric:
+  name: DetMetric
+  main_indicator: f-score
+
+loss:
+  name: DBLoss
+  eps: 1.0e-6
+  l1_scale: 10
+  bce_scale: 5
+  bce_replace: bceloss
+
+scheduler:
+  scheduler: polynomial_decay
+  lr: 0.02
+  num_epochs: 2000
+  decay_rate: 0.9
+  warmup_epochs: 3
+
+optimizer:
+  opt: momentum
+  filter_bias_and_bn: false
+  momentum: 0.9
+  weight_decay: 1.0e-4
+
+# only used for mixed precision training
+loss_scaler:
+  type: dynamic
+  loss_scale: 512
+  scale_factor: 2
+  scale_window: 1000
+
+train:
+  ckpt_save_dir: './tmp_det'
+  dataset_sink_mode: True
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: ic15/det/train/ch4_training_images
+    label_file: ic15/det/train/det_gt.txt
+    sample_ratio: 1.0
+    use_minddata: True
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - RandomColorAdjust:
+          brightness: 0.1255  # 32.0 / 255
+          saturation: 0.5
+      - RandomHorizontalFlip:
+          p: 0.5
+      - RandomRotate:
+          degrees: [ -10, 10 ]
+          expand_canvas: False
+          p: 1.0
+      - RandomScale:
+          scale_range: [ 0.5, 3.0 ]
+          p: 1.0
+      - RandomCropWithBBox:
+          max_tries: 10
+          min_crop_ratio: 0.1
+          crop_size: [ 640, 640 ]
+          p: 1.0
+      - ValidatePolygons:
+      - ShrinkBinaryMap:
+          min_text_size: 8
+          shrink_ratio: 0.4
+      - BorderMap:
+          shrink_ratio: 0.4
+          thresh_min: 0.3
+          thresh_max: 0.7
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
+    #  the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
+    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask']
+#    output_columns: ['image'] # for debug op performance
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2, 3, 4] # input indices marked as label
+
+  loader:
+    shuffle: True
+    batch_size: 8
+    drop_remainder: True
+    num_workers: 10
+
+eval:
+  ckpt_load_path: tmp_det/best.ckpt
+  dataset_sink_mode: False
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: ic15/det/test/ch4_test_images
+    label_file: ic15/det/test/det_gt.txt
+    sample_ratio: 1.0
+    use_minddata: True
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - DetResize:  # GridResize 32
+          target_size: [ 736, 1280 ]
+          keep_ratio: False
+          limit_type: none
+          divisor: 32
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
+    #  the order of the dataloader list, matching the network input and the labels for evaluation
+    output_columns: [ 'image', 'polys', 'ignore_tags', 'shape_list' ]
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2] # input indices marked as label
+
+  loader:
+    shuffle: False
+    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
+    drop_remainder: False
+    num_workers: 3
@@ -158,35 +158,35 @@ eval:
 
 predict:
   ckpt_load_path: tmp_det/best.ckpt
+  output_save_dir: ./output
   dataset_sink_mode: False
   dataset:
     type: PredictDataset
     dataset_root: path/to/dataset_root
     data_dir: ic15/det/test/ch4_test_images
-#    label_file: test.txt
     sample_ratio: 1.0
     transform_pipeline:
       - DecodeImage:
           img_mode: RGB
           to_float32: False
-#      - DetLabelEncode:
-      - DetResize:  # GridResize 32
-          target_size: [ 736, 1280 ]
-          keep_ratio: False
-          limit_type: none
-          divisor: 32
+          keep_ori: True
+      - DetResize:
+          keep_ratio: True
+          padding: False
+          limit_side_len: 960
+          limit_type: max
       - NormalizeImage:
           bgr_to_rgb: False
           is_hwc: True
           mean: imagenet
           std: imagenet
       - ToCHWImage:
     #  the order of the dataloader list, matching the network input and the labels for evaluation
-    output_columns: [ 'img_path', 'image', 'raw_img_shape' ]  # shape in h, w order
-#    num_keys_of_labels: 2 # num labels
+    output_columns: ["image", "img_path", "shape_list", "image_ori"]
+    net_input_column_index: [ 0 ] # input indices for network forward func in output_columns
 
   loader:
     shuffle: False
-    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
+    batch_size: 1
     drop_remainder: False
     num_workers: 2
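
Note on the column settings used in the configs above: `output_columns` names what the dataloader emits, while `net_input_column_index` and `label_column_index` pick entries from that list by position. The sketch below illustrates that relationship only; `split_columns` is a hypothetical helper written for this note, not a MindOCR function, and the placeholder strings stand in for real tensors.

```python
# Hypothetical helper (not part of MindOCR): split one dataloader batch into
# network inputs and labels using the index lists from the YAML config.
def split_columns(batch, output_columns, net_input_column_index, label_column_index):
    if len(batch) != len(output_columns):
        raise ValueError("batch length must match output_columns")
    for idx in list(net_input_column_index) + list(label_column_index):
        if not 0 <= idx < len(output_columns):
            raise IndexError(f"column index {idx} is out of range")
    # Map each selected position back to its column name.
    inputs = {output_columns[i]: batch[i] for i in net_input_column_index}
    labels = {output_columns[i]: batch[i] for i in label_column_index}
    return inputs, labels


# Values copied from the train section of db_mobilenetv3_icdar15_8p.yaml.
train_columns = ["image", "binary_map", "mask", "thresh_map", "thresh_mask"]
batch = [f"<{name}>" for name in train_columns]  # placeholders for real arrays
inputs, labels = split_columns(batch, train_columns, [0], [1, 2, 3, 4])

# Values copied from the eval section: 'shape_list' is neither input nor label.
eval_columns = ["image", "polys", "ignore_tags", "shape_list"]
eval_batch = [f"<{name}>" for name in eval_columns]
eval_inputs, eval_labels = split_columns(eval_batch, eval_columns, [0], [1, 2])

print(list(inputs), list(labels))
print(list(eval_inputs), list(eval_labels))
```

In the eval config, `shape_list` is selected by neither index list, so under this reading it travels with the batch as auxiliary data rather than as a network input or a label.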