Adding end-to-end prediction #772

can-gaa-hou · 2024-11-18T11:05:44Z

Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:

You have read the Contributing Guidelines on pull requests
Your code builds clean without any errors or warnings
You are using approved terminology
You have added unit tests

Motivation

This PR mainly adds an end-to-end prediction script that lets users convert a list of images into documents.

Test Plan

Run the following command and check the output docx files under ./inference_results:

python tools/infer/text/predict_e2e.py  --image_dir ./configs/layout/yolov8/images/example.jpg --det_algorithm DB++ --rec_algorithm SVTR_PPOCRv3_CH

alien-0119 · 2024-11-20T02:35:58Z

tools/infer/text/README_CN.md

+```
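+>Note: to visualize the results of layout analysis, table recognition, and text recognition, set `--visualize_output True`.
+
+After running, the inference results are saved in `{args.draw_img_save_dir}/system_results.txt`, where `--draw_img_save_dir` is the directory for saving results and defaults to `./inference_results`. Below are some example results.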


The inference results are not in system_results.txt; they are in xxx_e2e_result.txt.

alien-0119 · 2024-11-20T02:38:42Z

tools/infer/text/predict_e2e.py

+
+def main():
+    # from mindocr.utils.logger import set_logger
+    # set_logger(name="mindocr")


Do not comment out the logging module.

Uncommented.

alien-0119 · 2024-11-20T03:45:12Z

tools/infer/text/predict_e2e.py

+            if layout_analyzer is not None:
+                cropped_img = add_padding(cropped_img, padding_size=10, padding_color=(255, 255, 255))
+
+            rec_res_all_crops = text_system(cropped_img, do_visualize=do_visualize)


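Writing it this way causes two problems:
1. The result from layout_analyzer is the page layout split into small blocks, which are then iterated over to parse their text. If do_visualize=True, text_system saves only the text recognition result of that small block as an image, which does not make sense.
2. The cropped_img passed to text_system here is an image already loaded by cv2.imread (an in-memory array, not a file path), so according to predict_system.py line68 the saved image name becomes img_res.png. When running online inference on multiple images, only a single img_res.png is left at the end.

Suggested change:
At line173, change to
rec_res_all_crops = text_system(cropped_img, do_visualize=False)
Whatever do_visualize value the user passes, this call only processes the text of a single block for later assembly into the docx output and does no visualization.

At line147, add

if text_system is not None and do_visualize: text_system(img_path, do_visualize)

so that when the user sets do_visualize=True, the text recognition visualization for the whole image is output.

This is only a suggestion for reference, not the optimal solution. If there is a better approach, please use it instead.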
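
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code in predict_e2e.py: `recognize_regions` and `FakeTextSystem` are hypothetical names, and the stand-in callable only records its calls so the control flow can be checked without MindOCR installed. Note that the whole-image pass runs recognition a second time, which is the cost of this approach.

```python
from typing import Any, Callable, List, Optional, Sequence


def recognize_regions(
    img_path: str,
    crops: Sequence[Any],
    text_system: Optional[Callable[..., Any]],
    do_visualize: bool,
) -> List[Any]:
    """Run OCR on each cropped region; visualize only the whole image."""
    if text_system is None:
        return []
    # Whole-image pass: the only place where the user's flag is honored,
    # so the saved visualization is named after the input file.
    if do_visualize:
        text_system(img_path, do_visualize=True)
    # Per-region pass: always without visualization, results feed the docx.
    return [text_system(crop, do_visualize=False) for crop in crops]


class FakeTextSystem:
    """Hypothetical stand-in that records calls (not MindOCR's class)."""

    def __init__(self) -> None:
        self.calls: list = []

    def __call__(self, img_or_path: Any, do_visualize: bool = False) -> str:
        self.calls.append((img_or_path, do_visualize))
        return f"text:{img_or_path}"


fake = FakeTextSystem()
results = recognize_regions("page.jpg", ["crop0", "crop1"], fake, do_visualize=True)
print(fake.calls)
```

With `do_visualize=True`, the recorded calls show one visualized pass over `page.jpg` followed by non-visualized passes over each crop.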

CaitinZhao · 2024-11-21T02:05:27Z

tools/infer/text/README_CN.md

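+To run document analysis on an input image or on multiple images in a directory (that is, detect all text, table, and image regions, run text recognition on them, and finally convert the results into a Docx file that follows the image's original layout), run: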
+
+```shell
+python tools/infer/text/predict_e2e.py --image_dir {path_to_img or dir_to_imgs} \


Rename it to predict_table_e2e.

Changed (note: everything involving the table model now has "table" added to its name to distinguish it).

SamitHuang · 2024-11-22T03:32:03Z

tools/infer/text/predict_table_e2e.py

+
+    # crop text regions
+    h_ori, w_ori = image.shape[:2]
+    category_dict = {1: "text", 2: "title", 3: "list", 4: "table", 5: "figure"}


allow config?

Added a layout_category_dict_path option to the config. The default is mindocr/utils/dict/layout_category_dict.txt (newly added).
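
Loading the category mapping from a file instead of hard-coding it might look something like the sketch below. The file format is an assumption made for illustration (one category name per line, with IDs assigned from 1 in file order); the actual layout of layout_category_dict.txt is not shown in this thread, and `parse_category_lines` and `load_category_dict` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_category_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Map 1-based IDs to category names.

    ASSUMPTION: one category name per line, blank lines ignored,
    IDs assigned in file order starting from 1.
    """
    names = [line.strip() for line in lines if line.strip()]
    return {idx: name for idx, name in enumerate(names, start=1)}


def load_category_dict(path: str) -> Dict[int, str]:
    """Read the dictionary file at `path` and build the ID-to-name mapping."""
    text = Path(path).read_text(encoding="utf-8")
    return parse_category_lines(text.splitlines())


# Reproduces the mapping that was hard-coded in the reviewed diff.
category_dict = parse_category_lines(["text", "title", "list", "table", "figure"])
print(category_dict)
```

Keeping the parsing separate from the file read makes the mapping easy to check without touching the filesystem.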

This reverts commit 4605f72.

Add restoration method after layout, ocr inference...

851d1f7

alien-0119 reviewed Nov 20, 2024

CaitinZhao reviewed Nov 21, 2024

Add predict_table_e2e, including layout, table, ocr, and recovery

63d0669

SamitHuang reviewed Nov 22, 2024

add layout_category_dict_path to config

3262908

SamitHuang approved these changes Nov 22, 2024

CaitinZhao approved these changes Nov 22, 2024

CaitinZhao merged commit 4605f72 into mindspore-lab:main Nov 22, 2024
2 checks passed

CaitinZhao added a commit that referenced this pull request Nov 22, 2024

Revert "Adding end-to-end prediction (#772)"

b996207

This reverts commit 4605f72.

CaitinZhao mentioned this pull request Nov 22, 2024

Revert "Adding end-to-end prediction" #774

Merged

Mark-ZhouWX pushed a commit that referenced this pull request Nov 22, 2024

Revert "Adding end-to-end prediction (#772)" (#774)

46f656d

This reverts commit 4605f72.
