Commit bee28c7
Merge branch 'main' into main
2 parents: 53fb2b7 + d1c0db0

File tree

21 files changed: +102 -204 lines changed

CONTRIBUTING_CN.md

Lines changed: 4 additions & 4 deletions

@@ -6,7 +6,7 @@
 
 You must sign the CLA before your first code submission to the MindOCR community.
 
-For individual contributors, please refer to the [ICLA online document]https://www.mindspore.cn/icla for details.
+For individual contributors, please refer to the [ICLA online document](https://www.mindspore.cn/icla) for details.
 
 ## Types of Contributions
 
@@ -46,7 +46,7 @@ MindOCR can always use more documentation, whether as part of the official MindOCR docs
 
 Ready to contribute? Here is how to set up `mindocr` for local development.
 
-1. Fork the `mindocr` repository on [GitHub]https://github.com/mindspore-lab/mindocr.
+1. Fork the `mindocr` repository on [GitHub](https://github.com/mindspore-lab/mindocr).
 2. Clone your fork locally:
 
 ```shell
@@ -85,11 +85,11 @@ MindOCR can always use more documentation, whether as part of the official MindOCR docs
 
 If all the static checks pass, you will see output like this:
 
-![pre-commit passed]https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png
+![pre-commit passed](https://user-images.githubusercontent.com/74176172/221346245-ea868015-bb09-4e53-aa56-73b015e1e336.png)
 
 Otherwise, you need to fix the reported warnings:
 
-![pre-commit failed]https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png
+![pre-commit failed](https://user-images.githubusercontent.com/74176172/221346251-7d8f531f-9094-474b-97f0-fd5a55e6d3de.png)
 
 To get pre-commit and pytest, simply pip install them into your conda environment.
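For reference, a minimal setup sketch (standard pre-commit/pytest usage; the actual hook set comes from the repository's .pre-commit-config.yaml):

```shell
# Install the static-check and test tooling into the active conda environment
pip install pre-commit pytest

# Register the git hook so the checks run automatically on every `git commit`
pre-commit install

# Optionally, run all hooks once against the whole repository
pre-commit run --all-files
```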

configs/det/fcenet/README.md

Lines changed: 3 additions & 3 deletions

@@ -16,7 +16,7 @@ FCENet is a segmentation-based text detection algorithm. In the text detection s
 
 The idea of deformable convolution is simple: the fixed shape of the convolution kernel is made variable. On top of the original sampling positions, deformable convolution adds position offsets, as shown in the following figure:
 
-<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
+<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>
 <p align="center"><em>Figure 1. Deformable Convolution</em></p>
 
 Figure (a) shows the original convolutional kernel, figure (b) a deformable kernel whose sampling positions are shifted in arbitrary directions, and figures (c) and (d) two special cases of (b). The benefit is a stronger geometric-transformation capability: the kernel is no longer limited to a rigid rectangle and can cover much richer, irregular shapes. Deformable convolution extracts irregular-shape features better [[1](#references)] and is therefore well suited to text in natural scenes.
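As a sketch of the formulation behind Figure 1 (the standard deformable-convolution definition from [1]; the notation here is ours, not from this README), each output position $\mathbf{p}_0$ is computed from the regular sampling grid $\mathcal{R}$ plus per-position offsets:

$$
y(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} w(\mathbf{p}_n)\, x\!\left(\mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n\right)
$$

where $\mathcal{R}$ is, e.g., the 3×3 neighborhood of a standard kernel and the offsets $\Delta\mathbf{p}_n$ are produced by an auxiliary convolution branch; this is what lets the effective kernel take the irregular shapes of figures (b)-(d).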
@@ -25,7 +25,7 @@ Figure (a) is the original convolutional kernel, Figure (b) is a deformable conv
 
 A Fourier contour is a curve-fitting method based on the Fourier transform. As the Fourier degree k increases, more high-frequency signal is introduced and the contour is described more accurately. The following figure shows how well irregular curves are captured at different Fourier degrees:
 
-<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
+<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>
 <p align="center"><em>Figure 2. Fourier contour fitting with progressive approximation</em></p>
 
 As the Fourier degree k grows, the curves that can be depicted become very intricate.
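In Fourier-series terms (the standard formulation the paragraph describes; the symbols are ours), a closed contour is a complex-valued periodic function truncated at degree K:

$$
f(t) = \sum_{k=-K}^{K} c_k \, e^{2\pi i k t}, \qquad t \in [0, 1)
$$

where each complex coefficient $c_k$ carries one frequency component. Raising K admits higher frequencies, which is exactly why the fitted curve in Figure 2 tightens around the target shape as the degree grows.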
@@ -36,7 +36,7 @@ Fourier Contour Encoding is a method proposed in the paper "Fourier Contour Embe
 
 #### The FCENet Framework
 
-<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
+<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>
 <p align="center"><em>Figure 3. FCENet framework</em></p>
 
 Like most OCR algorithms, the structure of FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone uses a deformable-convolution version of ResNet-50 for feature extraction. The neck adopts a feature pyramid [[2](#references)], a set of convolution kernels of different sizes suited to extracting features at different scales from the original image, which improves detection accuracy and handles images containing text boxes of several sizes. The head has two branches. The classification branch predicts heat maps of both text regions and text center regions, which are multiplied pixel-wise to produce the classification score map; its loss is the cross entropy between the predicted heat maps and the ground truth. The regression branch predicts the Fourier signature vectors used to reconstruct text contours via the inverse Fourier transform (IFT); its loss is the smooth-L1 distance, in image space, between the reconstructed contour and the ground-truth contour.
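Reading the two branches off the paragraph above, the overall training objective can be sketched as (the weighting term λ and the symbol names are our assumptions, not from this README):

$$
\mathcal{L} = \mathcal{L}_{\text{cls}} + \lambda\, \mathcal{L}_{\text{reg}}
$$

with $\mathcal{L}_{\text{cls}}$ the cross entropy on the text-region and text-center heat maps, and $\mathcal{L}_{\text{reg}}$ the smooth-L1 loss between the IFT-reconstructed contour points and the ground-truth contour points.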

configs/det/fcenet/README_CN.md

Lines changed: 6 additions & 3 deletions

@@ -18,7 +18,8 @@ A highlight of FCENet is its strong performance on text of arbitrary irregular shapes
 
 The idea of deformable convolution is simple: the fixed shape of the convolution kernel is made variable. On top of the original sampling positions, deformable convolution adds position offsets, as shown in the following figure:
 
-<p align="center"><img alt="Figure 1" src="https://github.com/colawyee/mindocr-1/assets/15730439/5dfdbabd-a025-4789-89fb-4f2263e9deff" width="600"/></p>
+<p align="center"><img alt="Figure 1" src="https://github.com/user-attachments/assets/858357ca-02ff-46b4-8d4d-4c2e53f00ac5" width="600"/></p>
+
 <p align="center"><em>Figure 1. Deformable convolution</em></p>
 
 Figure (a) shows the original convolutional kernel, figure (b) a deformable kernel whose sampling positions are shifted in arbitrary directions, and figures (c) and (d) two special cases of (b). The benefit is a stronger geometric-transformation capability: the kernel is no longer limited to a rigid rectangle and can cover much richer, irregular shapes. Deformable convolution extracts irregular-shape features better [[1](#参考文献)] and is better suited to text in natural scenes.
@@ -27,7 +28,8 @@ A highlight of FCENet is its strong performance on text of arbitrary irregular shapes
 
 A Fourier contour is a curve-fitting method based on the Fourier transform. As the Fourier degree k increases, more high-frequency signal is introduced and the contour is described more accurately. The following figure shows how well irregular curves are captured at different Fourier degrees:
 
-<p align="center"><img width="445" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/23583dd8-0c67-4774-a4f3-9f5971e9ed93"></p>
+<p align="center"><img width="445" alt="Image" src="https://github.com/user-attachments/assets/33f2f7f3-91d6-4e6a-99ee-5930f36d013c"></p>
+
 <p align="center"><em>Figure 2. Fourier contour fitting with progressive approximation</em></p>
 
 As the Fourier degree k grows, the curves that can be depicted become very fine-grained.
@@ -38,7 +40,8 @@ A highlight of FCENet is its strong performance on text of arbitrary irregular shapes
 
 #### The FCENet Framework
 
-<p align="center"><img width="800" alt="Image" src="https://github.com/colawyee/mindocr-1/assets/15730439/cfe9f5b1-d22f-4d01-8f27-0856a930f78b"></p>
+<p align="center"><img width="800" alt="Image" src="https://github.com/user-attachments/assets/4cfbda01-84c6-43b1-8a60-4a2b20870c2a"></p>
+
 <p align="center"><em>Figure 3. FCENet framework</em></p>
 
 Like most OCR algorithms, the structure of FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone uses a deformable-convolution version of ResNet-50 for feature extraction. The neck adopts a feature pyramid [[2](#参考文献)], a set of convolution kernels of different sizes suited to extracting features at different scales from the original image, which improves detection accuracy and works well when one image contains text boxes of several sizes. The head has two branches: a classification branch that predicts heat maps of text regions and text center regions, whose loss is the cross entropy between those heat maps and the supervision signal, and a regression branch that predicts the Fourier signature vectors used to reconstruct text contours via the inverse Fourier transform, whose loss is the smooth-L1 distance, in image space, between the reconstructed contour and the supervised contour.

configs/kie/layoutlmv3/README_CN.md

Lines changed: 2 additions & 2 deletions

@@ -224,15 +224,15 @@ python tools/infer/text/predict_ser.py --rec_algorithm CRNN_CH --image_dir {dir
 Taking entity recognition on a Chinese form as an example, use the script to recognize the entities in the form `configs/kie/vi_layoutxlm/example.jpg`. The results are stored in the `./inference_results` folder by default; a custom output path can be set via the `--draw_img_save_dir` command-line argument.
 
 <p align="center">
-  <img src="example.jpg" width=1000 />
+  <img src="../vi_layoutxlm/example.jpg" width=1000 />
 </p>
 <p align="center">
   <em> example.jpg </em>
 </p>
 The recognition result is shown below; the image is saved as `inference_results/example_ser.jpg`.
 
 <p align="center">
-  <img src="example_ser.jpg" width=1000 />
+  <img src="../vi_layoutxlm/example_ser.jpg" width=1000 />
 </p>
 <p align="center">
   <em> example_ser.jpg </em>
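A concrete invocation sketch for this example (the flags are the ones quoted in the section above; the `image_dir` and `draw_img_save_dir` values are filled in for illustration):

```shell
# Run semantic entity recognition (SER) on the sample form and
# write the visualized result under ./inference_results
python tools/infer/text/predict_ser.py \
    --rec_algorithm CRNN_CH \
    --image_dir configs/kie/vi_layoutxlm/example.jpg \
    --draw_img_save_dir ./inference_results
```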

configs/layout/layoutlmv3/README.md

Lines changed: 1 addition & 1 deletion

@@ -72,7 +72,7 @@ python tools/param_converter_from_torch.py \
 ### 2.3 Model Evaluation
 
 ```bash
-python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaybet.yaml
+python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaynet.yaml
 ```
 The evaluation results on the public benchmark dataset (PublayNet) are as follows:

configs/layout/layoutlmv3/README_CN.md

Lines changed: 1 addition & 1 deletion

@@ -76,7 +76,7 @@ python tools/param_converter_from_torch.py \
 ### 2.3 Model Evaluation
 
 ```bash
-python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaybet.yaml
+python tools/eval.py --config configs/layout/layoutlmv3/layoutlmv3_publaynet.yaml
 ```
 The evaluation results on the public benchmark dataset (PublayNet) are as follows:

configs/rec/abinet/README.md

Lines changed: 1 addition & 23 deletions

@@ -208,7 +208,7 @@ eval:
 ```
 
 **Notes:**
-- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to a new global batch size.
+- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of NPUs, or adjust the learning rate linearly to a new global batch size.
 - Dataset: The MJSynth and SynthText datasets come from [ABINet_repo](https://github.com/FangShancheng/ABINet).
 
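The linear learning-rate adjustment mentioned in the note is usually the linear scaling rule (a common heuristic, stated here as an assumption rather than something this diff prescribes):

$$
\eta_{\text{new}} = \eta_{\text{base}} \times \frac{B_{\text{new}}}{B_{\text{base}}}
$$

where $B = \text{batch\_size} \times \text{num\_devices}$ is the global batch size, $\eta_{\text{base}}$ is the learning rate tuned at $B_{\text{base}}$, and $\eta_{\text{new}}$ is the rate to use at $B_{\text{new}}$.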

@@ -245,29 +245,7 @@ To evaluate the accuracy of the trained model, you can use `eval.py`. Please set
 python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```
 
-**Notes:**
-- Context for val_while_train: Since mindspore.nn.transformer requires a fixed batch size when defined, when choosing val_while_train=True it is necessary to ensure that the batch size of the validation set is the same as that of the model.
-- So, lines 179-185 in mindocr.data.builder.py
-```
-if not is_train:
-    if drop_remainder and is_main_device:
-        _logger.warning(
-            "`drop_remainder` is forced to be False for evaluation "
-            "to include the last batch for accurate evaluation."
-        )
-        drop_remainder = False
 
-```
-should be changed to
-```
-if not is_train:
-    # if drop_remainder and is_main_device:
-    _logger.warning(
-        "`drop_remainder` is forced to be False for evaluation "
-        "to include the last batch for accurate evaluation."
-    )
-    drop_remainder = True
-```
 
 ## Results
 <!--- Guideline:
 Table Format:

configs/rec/abinet/README_CN.md

Lines changed: 6 additions & 26 deletions

@@ -148,7 +148,7 @@ eval:
 # label_file: # Path to the label file of the validation/evaluation dataset; concatenated with `dataset_root` to form the full path. Not required when the dataset is in LMDB format
 ...
 ```
-By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy performance on dataset CUTE80.
+By running `tools/eval.py` as described in the [Model Evaluation](#33-模型评估) section with the above config yaml, you can get the accuracy performance on dataset CUTE80.
 
 2. Evaluate on multiple datasets under the same folder
 
@@ -243,6 +243,7 @@ mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abine
 ```
 Training the ABINet model requires loading a pretrained model. The pretrained weights come from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt); add the path of the pretrained weights to `pretrained` under `model` in `configs/rec/abinet/abinet_resnet45_en.yaml`.
 
+
 * Training on a single device
 
 To train or finetune the model on a smaller dataset without distributed training, set the configuration parameter `distribute` to False and run:
@@ -264,29 +265,7 @@
 python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```
 
-**Notes:**
-- Since mindspore.nn.transformer requires a fixed batch size when it is defined, when choosing val_while_train=True it is necessary to ensure that the batch size of the validation set is the same as that of the model.
-- So, lines 179-185 in mindocr.data.builder.py
-```
-if not is_train:
-    if drop_remainder and is_main_device:
-        _logger.warning(
-            "`drop_remainder` is forced to be False for evaluation "
-            "to include the last batch for accurate evaluation."
-        )
-        drop_remainder = False
 
-```
-should be changed to
-```
-if not is_train:
-    # if drop_remainder and is_main_device:
-    _logger.warning(
-        "`drop_remainder` is forced to be False for evaluation "
-        "to include the last batch for accurate evaluation."
-    )
-    drop_remainder = True
-```
 
 ## Evaluation Results
 <!--- Guideline:
@@ -304,9 +283,9 @@ Table Format:
 
 <div align="center">
 
-| **f** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
-|:------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
-| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
+| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
 
 </div>
 
@@ -323,6 +302,7 @@ Table Format:
 </details>
 
 
+
 ## References
 <!--- Guideline: Citation format GB/T 7714 is suggested. -->
 

configs/rec/crnn/README.md

Lines changed: 1 addition & 1 deletion

@@ -159,7 +159,7 @@ eval:
 ...
 ```
 
-By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
+By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
 
 
 2. Evaluate on multiple datasets under the same folder

configs/rec/crnn/README_CN.md

Lines changed: 2 additions & 1 deletion

@@ -159,7 +159,7 @@ eval:
 ...
 ```
 
-By running `tools/eval.py` as described in the [Model Evaluation](#33-model-evaluation) section with the above config yaml, you can get the accuracy performance on dataset CUTE80.
+By running `tools/eval.py` as described in the [Model Evaluation](#模型评估) section with the above config yaml, you can get the accuracy performance on dataset CUTE80.
 
 
 2. Evaluate on multiple datasets under the same folder
@@ -319,6 +319,7 @@ MindOCR ships a number of built-in dictionaries, all located under `mindocr/utils/dict/`
 
 For detailed data preparation and config-file setup, please refer to [Chinese text recognition dataset preparation](../../../docs/zh/datasets/chinese_text_recognition.md)
 
+
 ## Performance
 
 ### General-purpose Chinese model
