add dbnet yaml for synthtext dataset and td500 dataset #257

Merged
merged 2 commits into from
May 10, 2023
71 changes: 70 additions & 1 deletion configs/det/dbnet/README.md
Collaborator:

The model README needs data preparation instructions for SynthText and TD500.

Collaborator Author:

fixed

@@ -58,6 +58,15 @@ DBNet may generate inaccurate or discrete bounding boxes.

## 2. Results

### SynthText

<div align="center">

| **Model** | **Context** | **Backbone** | **Pretrained** | **Train Loss**| **Train T.** | **Throughput** | **Recipe** | **Download** |
|-------------------|----------------|--------------|----------------|-------------|------------|---------------|-------------|--------------|
| DBNet (ours) | D910x1-MS2.0-G | ResNet-50 | ImageNet | 2.25 |10470 s/epoch | 82.02 img/s | [yaml](db_r50_synthtext.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt) |
</div>

### ICDAR2015

<div align="center">
@@ -72,6 +81,17 @@ DBNet may generate inaccurate or discrete bounding boxes.
| DBNet++ (PaddleOCR) | - | ResNet-50_DCN | SynthText | 82.66% | 90.89% | 86.58% | - | - | - | - |
</div>

### MSRA-TD500

<div align="center">

| **Model** | **Context** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe** | **Download** |
|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|----------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DBNet (ours) | D910x1-MS2.0-G | ResNet-50 | SynthText | 82.47% | 87.75% | 85.03% | 13.3 s/epoch | 51.1 img/s | [yaml](db_r50_td500.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_td500-0d12b5e8.ckpt) |
</div>

> The MSRA-TD500 dataset has 300 training images and 200 testing images. Following the paper [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947), we trained with an extra 400 training images from HUST-TR400. You can download the combined [dataset](https://paddleocr.bj.bcebos.com/dataset/TD_TR.tar) for training.


#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS version}{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
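
To make the context notation concrete, here is a minimal sketch that splits a context string such as `D910x1-MS2.0-G` into its parts. The function name `parse_context` and the regular expression are hypothetical and not part of MindOCR; the MindSpore version is treated as optional only because both `D910x1-MS2.0-G` and `D910x8-G` appear in this README.

```python
import re

# Hypothetical pattern for context strings such as "D910x1-MS2.0-G" or "D910x8-G".
# Assumption: the "-MS<version>" part is optional, since both spellings occur.
CONTEXT_RE = re.compile(
    r"^(?P<device>[A-Za-z]+\d+)x(?P<pieces>\d+)"
    r"(?:-MS(?P<ms_version>\d+(?:\.\d+)*))?"
    r"-(?P<mode>[GF])$"
)


def parse_context(context):
    """Split a training-context string into device, device count, version and mode."""
    match = CONTEXT_RE.match(context)
    if match is None:
        raise ValueError(f"unrecognized context string: {context!r}")
    return {
        "device": match.group("device"),
        "pieces": int(match.group("pieces")),
        "ms_version": match.group("ms_version"),  # None when the version is omitted
        "mode": "graph" if match.group("mode") == "G" else "pynative",
    }


if __name__ == "__main__":
    print(parse_context("D910x1-MS2.0-G"))
    print(parse_context("D910x8-G"))
```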
@@ -87,8 +107,33 @@ Please refer to the [installation instruction](https://github.com/mindspore-lab/

### 3.2 Dataset preparation

Please download [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) dataset, and convert the labels to the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
#### 3.2.1 SynthText dataset

Please download the [SynthText](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c) dataset. The directory structure of the extracted data should be as follows:

``` text
.
├── SynthText
│   ├── 1
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   ├── 2
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   ├── ...
│   ├── 200
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── gt.mat

```
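
Before starting a long training run, it can help to check that the extracted data matches the layout above. The following is a minimal sketch using only the standard library; the helper name `summarize_synthtext` is hypothetical and not part of MindOCR, and it only checks for `gt.mat` and counts the numbered image folders.

```python
from pathlib import Path


def summarize_synthtext(root):
    """Check for gt.mat and count numbered folders and .jpg files under `root`."""
    root = Path(root)
    if not (root / "gt.mat").is_file():
        raise FileNotFoundError(f"gt.mat not found under {root}")
    # Image folders are named with plain integers (1, 2, ..., 200) in the tree above.
    folders = sorted(
        (p for p in root.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
    num_images = sum(1 for folder in folders for _ in folder.glob("*.jpg"))
    return {"folders": len(folders), "images": num_images}


if __name__ == "__main__":
    # "SynthText" is the extracted directory from the tree above; adjust the path as needed.
    print(summarize_synthtext("SynthText"))
```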

#### 3.2.2 ICDAR2015 dataset

Please download [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) dataset, and convert the labels to the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).

The prepared dataset file structure should be:

@@ -108,6 +153,30 @@ The prepared dataset file structure should be:
   └── train_det_gt.txt
```

#### 3.2.3 MSRA-TD500 dataset

Please download the [MSRA-TD500](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)) dataset, and convert the labels to the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).

The prepared dataset file structure should be:

```txt
MSRA-TD500
├── test
│   ├── IMG_0059.gt
│   ├── IMG_0059.JPG
│   ├── IMG_0080.gt
│   ├── IMG_0080.JPG
│   ├── ...
│   └── test_det_gt.txt
└── train
    ├── IMG_0030.gt
    ├── IMG_0030.JPG
    ├── IMG_0063.gt
    ├── IMG_0063.JPG
    ├── ...
    └── train_det_gt.txt
```
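
The label conversion itself is handled by the dataset_converters tool linked above. As an illustration of what such a conversion involves, here is a minimal sketch that turns one line of an MSRA-TD500 `.gt` file into a four-point polygon. It assumes each line holds `index difficulty x y w h theta`, with `(x, y)` the top-left corner of the unrotated box and `theta` a rotation in radians about the box centre; the function name `td500_gt_to_quad` is hypothetical and this is not the converter's actual code.

```python
import math


def td500_gt_to_quad(line):
    """Convert one `.gt` line (assumed: index difficulty x y w h theta) to 4 corner points."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    difficult = fields[1] == "1"
    x, y, w, h, theta = (float(v) for v in fields[2:])
    # Assumption: the box is rotated by theta around its centre.
    cx, cy = x + w / 2.0, y + h / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(-w / 2.0, -h / 2.0), (w / 2.0, -h / 2.0), (w / 2.0, h / 2.0), (-w / 2.0, h / 2.0)]
    points = [
        [round(cx + dx * cos_t - dy * sin_t), round(cy + dx * sin_t + dy * cos_t)]
        for dx, dy in corners
    ]
    return {"points": points, "difficult": difficult}


if __name__ == "__main__":
    print(td500_gt_to_quad("0 1 10 20 100 40 0.0"))
```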

### 3.3 Update yaml config file

Update `configs/det/dbnet/db_r50_icdar15.yaml` configuration file with data paths,
72 changes: 72 additions & 0 deletions configs/det/dbnet/README_CN.md
@@ -43,6 +43,16 @@ DBNet++ performs better at detecting text of different sizes, especially for

## 2. Results

### SynthText

<div align="center">

| **Model** | **Context** | **Backbone** | **Pretrained** | **Train Loss**| **Train T.** | **Throughput** | **Recipe** | **Download** |
|-----------------|----------------|--------------|----------------|---------|---------|---------------|-------------|--------------|
| DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 2.25 |10470 s/epoch | 82.02 img/s | [yaml](db_r50_synthtext.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt) |
</div>


### ICDAR2015
<div align="center">

@@ -58,6 +68,18 @@ DBNet++ performs better at detecting text of different sizes, especially for
</div>


### MSRA-TD500

<div align="center">

| **Model** | **Context** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe** | **Download** |
|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|----------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DBNet (ours) | D910x1-MS2.0-G | ResNet-50 | SynthText | 82.47% | 87.75% | 85.03% | 13.3 s/epoch | 51.1 img/s | [yaml](db_r50_td500.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_td500-0d12b5e8.ckpt) |
</div>

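> The MSRA-TD500 dataset has 300 training images and 200 testing images. Following the paper [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947), we additionally used 400 training images from the HUST-TR400 dataset to train this checkpoint. The complete [dataset](https://paddleocr.bj.bcebos.com/dataset/TD_TR.tar) can be downloaded for training.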


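#### Notes:
- Context: the training context is denoted as {device}x{number of devices}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode).
- The training time of DBNet is strongly affected by the data processing pipeline and by the running environment.
@@ -70,6 +92,32 @@ DBNet++ performs better at detecting text of different sizes, especially for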

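### 3.2 Dataset preparation

#### 3.2.1 SynthText dataset

Please download the SynthText dataset from [this URL](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c). The directory structure of the extracted data should be as follows: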

``` text
.
├── SynthText
│   ├── 1
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   ├── 2
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   ├── ...
│   ├── 200
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── gt.mat

```

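#### 3.2.2 ICDAR2015 dataset

Please download the ICDAR2015 dataset from [this URL](https://rrc.cvc.uab.es/?ch=4&com=downloads), then convert the annotations to the required format by referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).

After data preparation, the directory structure should be as follows:
@@ -90,6 +138,30 @@ DBNet++ performs better at detecting text of different sizes, especially for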
   └── train_det_gt.txt
```

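#### 3.2.3 MSRA-TD500 dataset

Please download the MSRA-TD500 dataset from [this URL](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)), then convert the annotations to the required format by referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).

After data preparation, the directory structure should be as follows: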

```txt
MSRA-TD500
├── test
│   ├── IMG_0059.gt
│   ├── IMG_0059.JPG
│   ├── IMG_0080.gt
│   ├── IMG_0080.JPG
│   ├── ...
│   └── test_det_gt.txt
└── train
    ├── IMG_0030.gt
    ├── IMG_0030.JPG
    ├── IMG_0063.gt
    ├── IMG_0063.JPG
    ├── ...
    └── train_det_gt.txt
```

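### 3.3 Configuration

Update the following file paths in the configuration file `configs/det/dbnet/db_r50_icdar15.yaml`. `dataset_root` is joined with `data_dir` and with `label_file` to form the full dataset directory and the full label file path, respectively.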
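
The path composition can be illustrated with a short sketch. The helper name `resolve_dataset_paths` is hypothetical and this is not MindOCR's loader code; the example values are the ones used in the `train.dataset` section of `db_r50_td500.yaml` in this pull request.

```python
import os


def resolve_dataset_paths(dataset_cfg):
    """Join dataset_root with data_dir and label_file, as the config description states."""
    root = dataset_cfg["dataset_root"]
    return {
        "data_dir": os.path.join(root, dataset_cfg["data_dir"]),
        "label_file": os.path.join(root, dataset_cfg["label_file"]),
    }


if __name__ == "__main__":
    cfg = {
        "dataset_root": "/data/ocr_datasets",
        "data_dir": "TD500_TR400/data",
        "label_file": "TD500_TR400/data/train_gt_all_labels.txt",
    }
    for key, path in resolve_dataset_paths(cfg).items():
        print(f"{key}: {path}")
```

Keeping `dataset_root` separate means only one value has to change when the datasets are moved to another disk or machine.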
11 changes: 6 additions & 5 deletions configs/det/dbnet/db_r50_synthtext.yaml
@@ -1,6 +1,6 @@
 system:
   mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
-  distribute: True
+  distribute: False
   amp_level: 'O0'
   seed: 42
   log_interval: 100
@@ -31,9 +31,10 @@ loss:
   bce_replace: bceloss

 scheduler:
-  scheduler: constant
-  lr: 1.0e-4
+  scheduler: polynomial_decay
+  lr: 0.007
   num_epochs: 2
+  decay_rate: 0.9
   warmup_epochs: 0

 optimizer:
@@ -73,7 +74,7 @@ train:
       - RandomCropWithBBox:
           max_tries: 10
           min_crop_ratio: 0.1
-          crop_size: [ 512, 512 ] # following 'Synthetic Data for Text Localisation in Natural Images'
+          crop_size: [ 640, 640 ]
           p: 1.0
       - ValidatePolygons:
       - ShrinkBinaryMap:
@@ -97,6 +98,6 @@ train:

   loader:
     shuffle: True
-    batch_size: 20
+    batch_size: 16
     drop_remainder: True
     num_workers: 8
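
For reference, the DBNet paper describes a "poly" learning rate policy, lr = initial_lr × (1 − iter / max_iter)^power, with an initial rate of 0.007 and a power of 0.9, which matches the `lr` and `decay_rate` values set here. The sketch below implements that formula only. Whether MindOCR's `polynomial_decay` scheduler interprets `decay_rate` as exactly this exponent is an assumption, and warmup is ignored.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Poly policy from the DBNet paper: base_lr * (1 - step / total_steps) ** power."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return base_lr * (1.0 - step / total_steps) ** power


if __name__ == "__main__":
    # total_steps is an arbitrary illustrative value, not taken from the config.
    total_steps = 1000
    for step in (0, 250, 500, 750, 1000):
        print(step, poly_lr(0.007, step, total_steps))
```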
156 changes: 156 additions & 0 deletions configs/det/dbnet/db_r50_td500.yaml
@@ -0,0 +1,156 @@
system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: False
  amp_level: 'O0'
  seed: 42
  log_interval: 10
  val_while_train: True
  drop_overflow_update: False
  val_interval: 5

model:
  type: det
  transform: null
  backbone:
    name: det_resnet50
    pretrained: True
  neck:
    name: DBFPN
    out_channels: 256
    bias: False
  head:
    name: DBHead
    k: 50
    bias: False
    adaptive: True
  pretrained: https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt

postprocess:
  name: DBPostprocess
  output_polygon: False # whether to output a polygon or a box
  binary_thresh: 0.3 # binarization threshold
  box_thresh: 0.6 # box score threshold
  max_candidates: 1000
  expand_ratio: 1.5 # coefficient for expanding predictions

metric:
  name: DetMetric
  main_indicator: f-score

loss:
  name: L1BalancedCELoss
  eps: 1.0e-6
  l1_scale: 10
  bce_scale: 5
  bce_replace: bceloss

scheduler:
  scheduler: polynomial_decay
  lr: 0.007
  num_epochs: 1200
  decay_rate: 0.9
  warmup_epochs: 3

optimizer:
  opt: SGD
  filter_bias_and_bn: false
  momentum: 0.9
  weight_decay: 1.0e-4

# only used for mixed precision training
loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2
  scale_window: 1000

train:
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: True
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets
    data_dir: TD500_TR400/data
    label_file: TD500_TR400/data/train_gt_all_labels.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - RandomColorAdjust:
          brightness: 0.1255 # 32.0 / 255
          saturation: 0.5
      - IaaAugment:
          Fliplr: { p: 0.5 }
          Affine: { rotate: [ -10, 10 ], p: 1.0 }
      - RandomScale:
          scale_range: [ 0.5, 3.0 ]
          p: 1.0
      - RandomCropWithBBox:
          max_tries: 10
          min_crop_ratio: 0.1
          crop_size: [ 640, 640 ]
          p: 1.0
      - ValidatePolygons:
      - ShrinkBinaryMap:
          min_text_size: 8
          shrink_ratio: 0.4
      - BorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: [ 'image', 'binary_map', 'mask', 'thresh_map', 'thresh_mask' ] #'img_path']
    # output_columns: ['image'] # for debug op performance
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1, 2, 3, 4] # input indices marked as label

  loader:
    shuffle: True
    batch_size: 20
    drop_remainder: True
    num_workers: 16

eval:
  ckpt_load_path: tmp_det/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets
    data_dir: TD500_TR400/data
    label_file: TD500_TR400/data/test_gt_labels.txt
    sample_ratio: 1.0
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - DetLabelEncode:
      - GridResize:
          factor: 32
      # GridResize already sets the evaluation size to [ 736, 1280 ].
      # Uncomment ScalePadImage block for other resolutions.
      - ScalePadImage:
          target_size: [ 736, 736 ] # h, w
      - NormalizeImage:
          bgr_to_rgb: False
          is_hwc: True
          mean: imagenet
          std: imagenet
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the labels for evaluation
    output_columns: [ 'image', 'polys', 'ignore_tags' ]
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1, 2] # input indices marked as label

  loader:
    shuffle: False
    batch_size: 1 # TODO: due to dynamic shape of polygons (num of boxes varies), BS has to be 1
    drop_remainder: False
    num_workers: 2
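
The `shrink_ratio: 0.4` and `expand_ratio: 1.5` values in this config correspond to the ratios used in the DBNet paper, where a text polygon with area A and perimeter L is shrunk by an offset D = A(1 − r²)/L to build the training label, and a predicted region with area A' and perimeter L' is expanded by D' = A' × r'/L' at post-processing. The sketch below only computes these two offsets from the paper's formulas. The function names are hypothetical, and that MindOCR's `ShrinkBinaryMap` and `DBPostprocess` use exactly these expressions is an assumption.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of a closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def shrink_offset(points, shrink_ratio=0.4):
    """Label-generation offset from the DBNet paper: D = A * (1 - r^2) / L."""
    return polygon_area(points) * (1.0 - shrink_ratio ** 2) / polygon_perimeter(points)


def unclip_offset(points, expand_ratio=1.5):
    """Post-processing offset from the DBNet paper: D' = A' * r' / L'."""
    return polygon_area(points) * expand_ratio / polygon_perimeter(points)


if __name__ == "__main__":
    box = [(0, 0), (100, 0), (100, 40), (0, 40)]  # a 100 x 40 text box
    print("shrink offset:", shrink_offset(box))
    print("unclip offset:", unclip_offset(box))
```

The offsets are distances in pixels; applying them to a polygon requires a polygon-offsetting routine, which is outside the scope of this sketch.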