mindspore-lab · SamitHuang · May 10, 2023 · May 5, 2023
diff --git a/configs/det/dbnet/README.md b/configs/det/dbnet/README.md
@@ -58,6 +58,15 @@ DBNet may generate inaccurate or discrete bounding boxes.
 
 ## 2. Results
 
+### SynthText
+
+<div align="center">
+
+| **Model**         | **Context**    | **Backbone** | **Pretrained** |  **Train Loss**|  **Train T.** | **Throughput** | **Recipe**                  | **Download**                 |
+|-------------------|----------------|--------------|----------------|-------------|------------|---------------|-------------|--------------|
+| DBNet (ours)      | D910x1-MS2.0-G | ResNet-50    | ImageNet       |     2.25      |10470 s/epoch  | 82.02 img/s      | [yaml](db_r50_synthtext.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt)  |
+</div>
+
 ### ICDAR2015
 
 <div align="center">
@@ -72,6 +81,17 @@ DBNet may generate inaccurate or discrete bounding boxes.
 | DBNet++ (PaddleOCR) | -              | ResNet-50_DCN | SynthText      | 82.66%     | 90.89%        | 86.58%      | -            | -              | -                             | -                                                                                                                                                                                                       |
 </div>
 
+### MSRA-TD500
+
+<div align="center">
+
+| **Model**         | **Context**    | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe**                  | **Download**                                                                                                                                                                                         |
+|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|----------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| DBNet (ours)      | D910x1-MS2.0-G | ResNet-50    | SynthText       | 82.47%     | 87.75%        | 85.03%      | 13.3 s/epoch  | 51.1 img/s      | [yaml](db_r50_td500.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_td500-0d12b5e8.ckpt)  |
+</div>
+
+> The MSRA-TD500 dataset has 300 training images and 200 test images. Following the paper [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947), we trained with an extra 400 training images from HUST-TR400. The full [dataset](https://paddleocr.bj.bcebos.com/dataset/TD_TR.tar) can be downloaded for training.
+
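As a quick sanity check on the two result tables (our own arithmetic, not a separately measured figure), throughput multiplied by epoch time approximates the number of images seen per epoch. For SynthText this gives about 858.7k images, consistent with the size of the public SynthText release; for MSRA-TD500 it gives about 680, close to the 700 TD500 + HUST-TR400 training images mentioned in the note, with the gap plausibly due to rounding or timing overhead in the reported numbers.

```python
# Sanity check on the result tables: throughput (img/s) x epoch time (s) ~= images per epoch.
# The numbers are copied from the SynthText and MSRA-TD500 rows above.
REPORTED = {
    "SynthText": (82.02, 10470.0),  # (throughput in img/s, train time in s/epoch)
    "MSRA-TD500": (51.1, 13.3),
}


def images_per_epoch(throughput: float, epoch_time_s: float) -> float:
    """Estimate how many images one epoch covers from the reported timing."""
    return throughput * epoch_time_s


for name, (throughput, epoch_time_s) in REPORTED.items():
    print(f"{name}: ~{images_per_epoch(throughput, epoch_time_s):,.0f} images/epoch")
```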
 
 #### Notes
 - Context: Training context denoted as {device}x{pieces}-{MS version}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode with ms_function). For example, D910x8-G means training on 8 Ascend 910 NPUs in graph mode.
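The context notation can be read mechanically. A minimal sketch of parsing it with a regular expression; `parse_context` and `Context` are hypothetical helpers (not MindOCR APIs), and the MindSpore version part is treated as optional because the notes' example `D910x8-G` omits it while the tables use `D910x1-MS2.0-G`:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for the "{device}x{pieces}-{MS version}-{MS mode}" notation,
# e.g. "D910x1-MS2.0-G"; the MindSpore version part is treated as optional.
CONTEXT_RE = re.compile(
    r"^(?P<device>[A-Z]+\d+)"               # device, e.g. D910 (Ascend 910)
    r"x(?P<pieces>\d+)"                     # number of devices
    r"(?:-MS(?P<version>\d+(?:\.\d+)*))?"   # optional MindSpore version, e.g. MS2.0
    r"-(?P<mode>[GF])$"                     # G: graph mode, F: PyNative mode with ms_function
)


class Context(NamedTuple):
    device: str
    pieces: int
    ms_version: Optional[str]
    mode: str


def parse_context(text: str) -> Context:
    """Split a context string into its fields; raise ValueError if it does not match."""
    match = CONTEXT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized context string: {text!r}")
    mode = "graph" if match["mode"] == "G" else "pynative"
    return Context(match["device"], int(match["pieces"]), match["version"], mode)


print(parse_context("D910x1-MS2.0-G"))
print(parse_context("D910x8-G"))
```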
@@ -87,8 +107,33 @@ Please refer to the [installation instruction](https://github.com/mindspore-lab/
 
 ### 3.2 Dataset preparation
 
-Please download [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) dataset, and convert the labels to the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
+#### 3.2.1 SynthText dataset
 
+Please download the [SynthText](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c) dataset. The directory structure of the extracted data should be as follows:
+
+``` text
+.
+├── SynthText
+│   ├── 1
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   ├── 2
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   ├── ...
+│   ├── 200
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   └── gt.mat
+
+```
+
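All SynthText annotations live in `gt.mat`. A sketch of reading it, assuming the commonly used field names `imnames`, `wordBB` and `txt`, with `wordBB` stored per image as a 2 x 4 x N array (or 2 x 4 when an image holds a single word), and assuming NumPy and SciPy are installed; `wordbb_to_polygons` and `iter_synthtext` are hypothetical helpers, not MindOCR APIs:

```python
import numpy as np


def wordbb_to_polygons(word_bb) -> np.ndarray:
    """Convert one image's SynthText ``wordBB`` entry into an (N, 4, 2) array of word quads.

    Assumes ``wordBB`` is laid out as (2, 4, N): x/y coordinate, corner index, word index.
    Images that contain a single word may be stored as (2, 4), so a word axis is added back.
    """
    bb = np.asarray(word_bb, dtype=np.float32)
    if bb.ndim == 2:
        bb = bb[:, :, np.newaxis]
    return bb.transpose(2, 1, 0)  # (word, corner, xy)


def iter_synthtext(gt_path: str):
    """Hypothetical reader: yields (relative image path, word quads, words) from gt.mat."""
    from scipy.io import loadmat  # only needed when reading the real gt.mat

    gt = loadmat(gt_path)
    for name, word_bb, txt in zip(gt["imnames"][0], gt["wordBB"][0], gt["txt"][0]):
        # Each txt entry holds text instances; splitting on whitespace gives one word per box.
        words = [word for instance in txt for word in instance.split()]
        yield str(name[0]), wordbb_to_polygons(word_bb), words
```

For example, `for path, quads, words in iter_synthtext("SynthText/gt.mat")` should yield image paths relative to the `SynthText` directory together with one quad per word.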
+#### 3.2.2 ICDAR2015 dataset
+
+Please download the [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) dataset and convert the labels to the desired format by referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
 
 The prepared dataset file structure should be:
 
@@ -108,6 +153,30 @@ The prepared dataset file struture should be:
     └── train_det_gt.txt
 ```
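For reference, the detection label files in the tree above hold one image per line. A minimal sketch of that conversion for a single ICDAR2015 ground-truth file, assuming the `x1,y1,...,x4,y4,transcription` gt format and a `<image name>\t<JSON list of {"transcription", "points"}>` label line; `icdar15_gt_to_label_line` is a hypothetical helper, and the official converter in `tools/dataset_converters` remains the reference:

```python
import json
from typing import List


def icdar15_gt_to_label_line(img_name: str, gt_lines: List[str]) -> str:
    """Build one label-file line: '<img_name>\\t<JSON list of boxes>'."""
    boxes = []
    for raw in gt_lines:
        # ICDAR2015 gt files may start with a UTF-8 BOM, which str.strip() does not remove
        line = raw.strip().lstrip("\ufeff")
        if not line:
            continue
        # 8 coordinates followed by the transcription, which may itself contain commas
        parts = line.split(",", 8)
        coords = [int(v) for v in parts[:8]]
        points = [[coords[i], coords[i + 1]] for i in range(0, 8, 2)]
        boxes.append({"transcription": parts[8], "points": points})  # "###" marks don't-care text
    return f"{img_name}\t{json.dumps(boxes, ensure_ascii=False)}"
```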
 
+#### 3.2.3 MSRA-TD500 dataset
+
+Please download the [MSRA-TD500](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)) dataset and convert the labels to the desired format by referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
+
+The prepared dataset file structure should be:
+
+```txt
+MSRA-TD500
+ ├── test
+ │   ├── IMG_0059.gt
+ │   ├── IMG_0059.JPG
+ │   ├── IMG_0080.gt
+ │   ├── IMG_0080.JPG
+ │   ├── ...
+ │   └── test_det_gt.txt
+ └── train
+     ├── IMG_0030.gt
+     ├── IMG_0030.JPG
+     ├── IMG_0063.gt
+     ├── IMG_0063.JPG
+     ├── ...
+     └── train_det_gt.txt
+```
+
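The MSRA-TD500 `.gt` files store rotated rectangles rather than quadrilaterals. A sketch of turning one line into a 4-point polygon, assuming the layout documented with the dataset (`index difficult x y w h theta`, with (x, y) the top-left corner of the unrotated box and theta in radians about the box centre); `td500_line_to_polygon` is hypothetical, and the rotation direction is an assumption worth checking visually on a few images:

```python
import math
from typing import List, Tuple


def td500_line_to_polygon(line: str) -> Tuple[List[List[float]], bool]:
    """Parse one MSRA-TD500 .gt line into (4-point polygon, is_difficult).

    Assumes the layout 'index difficult x y w h theta', where (x, y) is the
    top-left corner of the unrotated box and theta (radians) rotates it about its centre.
    """
    _, difficult, x, y, w, h, theta = line.split()[:7]
    x, y, w, h, theta = (float(v) for v in (x, y, w, h, theta))
    cx, cy = x + w / 2, y + h / 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    polygon = []
    # Corners of the unrotated box relative to its centre, clockwise from top-left
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        polygon.append([cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t])
    return polygon, difficult == "1"
```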
 ### 3.3 Update yaml config file
 
 Update `configs/det/dbnet/db_r50_icdar15.yaml` configuration file with data paths,

diff --git a/configs/det/dbnet/README_CN.md b/configs/det/dbnet/README_CN.md
@@ -43,6 +43,16 @@ DBNet++ performs better at detecting text of different sizes, especially text of
 
 ## 2. Results
 
+### SynthText
+
+<div align="center">
+
+| **Model**         | **Context**    | **Backbone** | **Pretrained** | **Train Loss** | **Train T.** | **Throughput** | **Recipe**                  | **Download**                 |
+|-----------------|----------------|--------------|----------------|---------|---------|---------------|-------------|--------------|
+| DBNet      | D910x1-MS2.0-G | ResNet-50    | ImageNet       |   2.25    |10470 s/epoch  | 82.02 img/s      | [yaml](db_r50_synthtext.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt)  |
+</div>
+
+
 ### ICDAR2015
 <div align="center">
 
@@ -58,6 +68,18 @@ DBNet++ performs better at detecting text of different sizes, especially text of
 </div>
 
 
+### MSRA-TD500
+
+<div align="center">
+
+| **模型**         | **环境配置**    | **骨干网络** | **预训练数据集** | **Recall** | **Precision** | **F-score** | **训练时间** | **吞吐量** | **配置文件**                  | **模型权重下载**                                                                                                                                                                                         |
+|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|----------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| DBNet (ours)      | D910x1-MS2.0-G | ResNet-50    | SynthText       | 82.47%     | 87.75%        | 85.03%      | 13.3 s/epoch  | 51.1 img/s      | [yaml](db_r50_td500.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_td500-0d12b5e8.ckpt)  |
+</div>
+
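+> The MSRA-TD500 dataset has 300 training images and 200 test images. Following the paper [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947), we additionally used 400 training images from the HUST-TR400 dataset to train this checkpoint. The full [dataset](https://paddleocr.bj.bcebos.com/dataset/TD_TR.tar) can be downloaded here for training.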
+
+
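 #### Notes
 - Context: the training context is denoted as {device}x{number of devices}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode).
 - The training time of DBNet is strongly affected by the data processing pipeline and by the runtime environment.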
@@ -70,6 +92,32 @@ DBNet++ performs better at detecting text of different sizes, especially text of
 
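 ### 3.2 Dataset preparation
 
+#### 3.2.1 SynthText dataset
+
+Please download the SynthText dataset from [this link](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c). The directory structure of the extracted data should be as follows: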
+
+``` text
+.
+├── SynthText
+│   ├── 1
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   ├── 2
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   ├── ...
+│   ├── 200
+│   │   ├── img_1.jpg
+│   │   ├── img_2.jpg
+│   │   └── ...
+│   └── gt.mat
+
+```
+
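+#### 3.2.2 ICDAR2015 dataset
+
 Please download the ICDAR2015 dataset from [this link](https://rrc.cvc.uab.es/?ch=4&com=downloads), then convert the annotations to the required format by referring to [dataset conversion](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).
 
 After data preparation, the directory structure should be as follows: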
@@ -90,6 +138,30 @@ DBNet++ performs better at detecting text of different sizes, especially text of
     └── train_det_gt.txt
 ```
 
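+#### 3.2.3 MSRA-TD500 dataset
+
+Please download the MSRA-TD500 dataset from [this link](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)), then convert the annotations to the required format by referring to [dataset conversion](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).
+
+After data preparation, the directory structure should be as follows: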
+
+```txt
+MSRA-TD500
+ ├── test
+ │   ├── IMG_0059.gt
+ │   ├── IMG_0059.JPG
+ │   ├── IMG_0080.gt
+ │   ├── IMG_0080.JPG
+ │   ├── ...
+ │   └── test_det_gt.txt
+ └── train
+     ├── IMG_0030.gt
+     ├── IMG_0030.JPG
+     ├── IMG_0063.gt
+     ├── IMG_0063.JPG
+     ├── ...
+     └── train_det_gt.txt
+```
+
 ### 3.3 Configuration
 
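 Update the following file paths in the configuration file `configs/det/dbnet/db_r50_icdar15.yaml`. `dataset_root` is joined with `data_dir` and with `label_file` respectively to form the full dataset directory and label file paths.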
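To make the path rule concrete, a minimal sketch using the training values from `db_r50_td500.yaml` below; the helper name is hypothetical, and plain `os.path.join` is assumed to match how the data loader combines the fields:

```python
import os

# Values copied from the train.dataset section of db_r50_td500.yaml in this PR.
train_dataset_cfg = {
    "dataset_root": "/data/ocr_datasets",
    "data_dir": "TD500_TR400/data",
    "label_file": "TD500_TR400/data/train_gt_all_labels.txt",
}


def resolve_dataset_paths(cfg: dict) -> tuple:
    """Join dataset_root with data_dir and with label_file to get the full paths."""
    root = cfg["dataset_root"]
    return os.path.join(root, cfg["data_dir"]), os.path.join(root, cfg["label_file"])


image_dir, label_path = resolve_dataset_paths(train_dataset_cfg)
print(image_dir)   # on POSIX: /data/ocr_datasets/TD500_TR400/data
print(label_path)  # on POSIX: /data/ocr_datasets/TD500_TR400/data/train_gt_all_labels.txt
```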

diff --git a/configs/det/dbnet/db_r50_synthtext.yaml b/configs/det/dbnet/db_r50_synthtext.yaml
@@ -1,6 +1,6 @@
 system:
   mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
-  distribute: True
+  distribute: False
   amp_level: 'O0'
   seed: 42
   log_interval: 100
@@ -31,9 +31,10 @@ loss:
   bce_replace: bceloss
 
 scheduler:
-  scheduler: constant
-  lr: 1.0e-4
+  scheduler: polynomial_decay
+  lr: 0.007
   num_epochs: 2
+  decay_rate: 0.9
   warmup_epochs: 0
 
 optimizer:
@@ -73,7 +74,7 @@ train:
       - RandomCropWithBBox:
           max_tries: 10
           min_crop_ratio: 0.1
-          crop_size: [ 512, 512 ]  # following 'Synthetic Data for Text Localisation in Natural Images'
+          crop_size: [ 640, 640 ]
           p: 1.0
       - ValidatePolygons:
       - ShrinkBinaryMap:
@@ -97,6 +98,6 @@ train:
 
   loader:
     shuffle: True
-    batch_size: 20
+    batch_size: 16
     drop_remainder: True
     num_workers: 8
diff --git a/configs/det/dbnet/db_r50_td500.yaml b/configs/det/dbnet/db_r50_td500.yaml
@@ -0,0 +1,156 @@
+system:
+  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
+  distribute: False
+  amp_level: 'O0'
+  seed: 42
+  log_interval: 10
+  val_while_train: True
+  drop_overflow_update: False
+  val_interval: 5
+
+model:
+  type: det
+  transform: null
+  backbone:
+    name: det_resnet50
+    pretrained: True
+  neck:
+    name: DBFPN
+    out_channels: 256
+    bias: False
+  head:
+    name: DBHead
+    k: 50
+    bias: False
+    adaptive: True
+  pretrained: https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt
+
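In the `DBHead` block, `k: 50` is presumably the amplifying factor k of the approximate binarization from the DBNet paper linked in the MSRA-TD500 note, B = 1 / (1 + e^(-k(P - T))), with P the probability map and T the threshold map. A scalar sketch of that formula; the head applies it element-wise to feature maps, and the mapping of the config key to the paper's k is our reading rather than something this PR states:

```python
import math
from typing import List


def approx_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Differentiable binarization: B = 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# Probabilities around a threshold of 0.5; with k = 50 the output approaches
# a hard step at P = T while remaining differentiable.
probs: List[float] = [0.3, 0.45, 0.5, 0.55, 0.7]
print([round(approx_binarize(p, 0.5), 3) for p in probs])
```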
+postprocess:
+  name: DBPostprocess
+  output_polygon: False   # whether to output a polygon or a box
+  binary_thresh: 0.3      # binarization threshold
+  box_thresh: 0.6         # box score threshold
+  max_candidates: 1000
+  expand_ratio: 1.5       # coefficient for expanding predictions
+
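`expand_ratio: 1.5` here and `shrink_ratio: 0.4` in the `ShrinkBinaryMap` training step appear to correspond to the DBNet paper's polygon offsets: ground-truth polygons are shrunk by D = A(1 - r^2)/L, and detected regions are expanded back by D' = A' r'/L', where A is the polygon area and L its perimeter. A sketch of just the offset arithmetic, assuming that mapping; performing the actual offset is typically delegated to a polygon-clipping library such as pyclipper, which is not shown:

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def area_and_perimeter(poly: Sequence[Point]) -> Tuple[float, float]:
    """Return (area, perimeter) of a simple polygon; area via the shoelace formula."""
    twice_area, perimeter = 0.0, 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        twice_area += x1 * y2 - x2 * y1
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return abs(twice_area) / 2.0, perimeter


def shrink_offset(poly: Sequence[Point], shrink_ratio: float = 0.4) -> float:
    """Inward offset for the probability-map target: D = A * (1 - r^2) / L."""
    area, perimeter = area_and_perimeter(poly)
    return area * (1.0 - shrink_ratio ** 2) / perimeter


def unclip_offset(poly: Sequence[Point], expand_ratio: float = 1.5) -> float:
    """Outward offset applied to a predicted shrunk region: D' = A' * r' / L'."""
    area, perimeter = area_and_perimeter(poly)
    return area * expand_ratio / perimeter


word_box = [(0, 0), (200, 0), (200, 40), (0, 40)]  # a 200 x 40 text box
print(shrink_offset(word_box), unclip_offset(word_box))
```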
+metric:
+  name: DetMetric
+  main_indicator: f-score
+
+loss:
+  name: L1BalancedCELoss
+  eps: 1.0e-6
+  l1_scale: 10
+  bce_scale: 5
+  bce_replace: bceloss
+
+scheduler:
+  scheduler: polynomial_decay
+  lr: 0.007
+  num_epochs: 1200
+  decay_rate: 0.9
+  warmup_epochs: 3
+
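Both configs switch to `scheduler: polynomial_decay` with `lr: 0.007` and `decay_rate: 0.9`. The sketch below assumes `decay_rate` acts as the polynomial power, that the rate decays to 0 by `num_epochs`, and that warm-up is linear from 0; MindOCR's scheduler works per step and may differ in these details, so treat this as an illustration of the shape of the curve only:

```python
def poly_decay_lr(epoch: float, base_lr: float = 0.007, num_epochs: int = 1200,
                  warmup_epochs: int = 3, power: float = 0.9, end_lr: float = 0.0) -> float:
    """Assumed schedule: linear warm-up, then polynomial decay to end_lr at num_epochs.

    Defaults mirror db_r50_td500.yaml (lr, num_epochs, warmup_epochs, decay_rate as power).
    """
    if warmup_epochs > 0 and epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs  # linear warm-up from 0
    progress = min((epoch - warmup_epochs) / (num_epochs - warmup_epochs), 1.0)
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr


for epoch in (0, 1.5, 3, 300, 600, 1199, 1200):
    print(epoch, round(poly_decay_lr(epoch), 6))
```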
+optimizer:
+  opt: SGD
+  filter_bias_and_bn: false
+  momentum: 0.9
+  weight_decay: 1.0e-4
+
+# only used for mixed precision training
+loss_scaler:
+  type: dynamic
+  loss_scale: 512
+  scale_factor: 2
+  scale_window: 1000
+
+train:
+  ckpt_save_dir: './tmp_det'
+  dataset_sink_mode: True
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: TD500_TR400/data
+    label_file: TD500_TR400/data/train_gt_all_labels.txt
+    sample_ratio: 1.0
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - RandomColorAdjust:
+          brightness: 0.1255  # 32.0 / 255
+          saturation: 0.5
+      - IaaAugment:
+          Fliplr: { p: 0.5 }
+          Affine: { rotate: [ -10, 10 ], p: 1.0 }
+      - RandomScale:
+          scale_range: [ 0.5, 3.0 ]
+          p: 1.0
+      - RandomCropWithBBox:
+          max_tries: 10
+          min_crop_ratio: 0.1
+          crop_size: [ 640, 640 ]
+          p: 1.0
+      - ValidatePolygons:
+      - ShrinkBinaryMap:
+          min_text_size: 8
+          shrink_ratio: 0.4
+      - BorderMap:
+          shrink_ratio: 0.4
+          thresh_min: 0.3
+          thresh_max: 0.7
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
+    #  the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
+    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask' ] #'img_path']
+#    output_columns: ['image'] # for debug op performance
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2, 3, 4] # input indices marked as label
+
+  loader:
+    shuffle: True
+    batch_size: 20
+    drop_remainder: True
+    num_workers: 16
+
+eval:
+  ckpt_load_path: tmp_det/best.ckpt
+  dataset_sink_mode: False
+  dataset:
+    type: DetDataset
+    dataset_root: /data/ocr_datasets
+    data_dir: TD500_TR400/data
+    label_file: TD500_TR400/data/test_gt_labels.txt
+    sample_ratio: 1.0
+    transform_pipeline:
+      - DecodeImage:
+          img_mode: RGB
+          to_float32: False
+      - DetLabelEncode:
+      - GridResize:
+          factor: 32
+      # GridResize makes the image height and width divisible by 32 (factor);
+      # ScalePadImage then rescales and pads the image to target_size.
+      - ScalePadImage:
+          target_size: [ 736, 736 ] # h, w
+      - NormalizeImage:
+          bgr_to_rgb: False
+          is_hwc: True
+          mean: imagenet
+          std: imagenet
+      - ToCHWImage:
+    #  the order of the dataloader list, matching the network input and the labels for evaluation
+    output_columns: [ 'image', 'polys', 'ignore_tags' ]
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2] # input indices marked as label
+
+  loader:
+    shuffle: False
+    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
+    drop_remainder: False
+    num_workers: 2
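For evaluation, the TD500 pipeline first applies `GridResize` with `factor: 32` and then `ScalePadImage` with `target_size: [736, 736]`. A sketch of the size arithmetic, assuming `ScalePadImage` scales the image to fit inside the target while keeping its aspect ratio and then pads the remainder (check the transform's source for the exact rounding and padding side); both helpers are hypothetical:

```python
from typing import List, Tuple


def scale_pad_plan(h: int, w: int, target: Tuple[int, int] = (736, 736)):
    """Assumed ScalePadImage arithmetic: fit (h, w) inside target keeping the aspect ratio,
    then pad the remainder. Returns (scale, resized (h, w), padding (h, w))."""
    target_h, target_w = target
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)
    return scale, (new_h, new_w), (target_h - new_h, target_w - new_w)


def scale_polygon(poly: List[List[float]], scale: float) -> List[List[float]]:
    """Ground-truth polygons must be rescaled by the same factor to stay aligned."""
    return [[x * scale, y * scale] for x, y in poly]


# A 768 x 1024 image (already a multiple of 32) is scaled by 0.71875 to 552 x 736,
# leaving 184 rows of padding.
scale, resized, padding = scale_pad_plan(768, 1024)
print(scale, resized, padding)
```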