Adding end-to-end prediction #772
Conversation
tools/infer/text/README_CN.md
Outdated
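> Note: To visualize the results of layout analysis, table recognition, and text recognition, set `--visualize_output True`.

After running, the inference results are saved in `{args.draw_img_save_dir}/system_results.txt`, where `--draw_img_save_dir` is the directory for saving results and defaults to `./inference_results`. Below are some example results.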
The inference results are not written to system_results.txt; they are in xxx_e2e_result.txt.
Changed.
tools/infer/text/predict_e2e.py
Outdated
def main():
    # from mindocr.utils.logger import set_logger
    # set_logger(name="mindocr")
Do not comment out the logging module.
Uncommented.
tools/infer/text/predict_e2e.py
Outdated
if layout_analyzer is not None:
    cropped_img = add_padding(cropped_img, padding_size=10, padding_color=(255, 255, 255))

rec_res_all_crops = text_system(cropped_img, do_visualize=do_visualize)
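Writing it this way has two problems:
1. The result produced by layout_analyzer is a set of small blocks, each a different part of the page layout. The code then iterates over these blocks and parses the text in each one. If do_visualize=True, text_system saves an image of the text recognition result for just that small block, which is not reasonable.
2. The cropped_img passed to text_system here is image data already loaded through cv2.imread, not a file path. According to predict_system.py line 68, the saved image is then named img_res.png. When running online inference on multiple images, only a single img_res.png is left at the end.
Suggested change:
At line 173, change the call to
rec_res_all_crops = text_system(cropped_img, do_visualize=False)
Regardless of the do_visualize value the user passes in, this call only processes the text of a single block so it can later be assembled into the output docx. It does no visualization.
At line 147, add
if text_system is not None and do_visualize:
    text_system(img_path, do_visualize)
This outputs the text recognition visualization for the whole image when the user specifies do_visualize=True.
Of course, this is only a suggestion for reference, not the optimal solution. If there is a better approach, please use that instead.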
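For reference, the suggestion above can be sketched in isolation as follows. This is only an illustration under assumptions: `recognize_crops` and `FakeTextSystem` are hypothetical names, and the real `text_system` in predict_system.py may take different arguments or return a different structure.

```python
from typing import Any, List, Optional, Sequence


class FakeTextSystem:
    """Hypothetical stand-in for text_system; it only records its calls."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def __call__(self, img_or_path: Any, do_visualize: bool = False) -> str:
        self.calls.append((img_or_path, do_visualize))
        return f"text:{img_or_path}"


def recognize_crops(
    img_path: str,
    crops: Sequence[Any],
    text_system: Optional[Any],
    do_visualize: bool,
) -> List[Any]:
    """Visualize the whole page once, then recognize each crop silently."""
    if text_system is None:
        return []
    # Whole-image visualization, only when the user asked for it.
    # Passing the path (not a loaded array) lets the callee name the
    # output after the source image instead of a fixed default name.
    if do_visualize:
        text_system(img_path, do_visualize=True)
    # Per-crop recognition never visualizes; the results are only
    # collected for later assembly into the docx.
    return [text_system(crop, do_visualize=False) for crop in crops]
```

With this split, the per-crop calls can never overwrite a shared output image, and the visualization the user sees covers the full page rather than one layout block.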
Changed.
tools/infer/text/README_CN.md
Outdated
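To run document analysis on an input image or on multiple images in a directory (that is, detect all text, table, and image regions, run text recognition on those regions, and finally convert the results into a Docx file that follows the original layout of the image), run:

```shell
python tools/infer/text/predict_e2e.py --image_dir {path_to_img or dir_to_imgs} \
```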
Rename it to predict_table_e2e.
Changed. (Note: everything involving the table model now consistently includes "table" in its name to distinguish it.)
# crop text regions
h_ori, w_ori = image.shape[:2]
category_dict = {1: "text", 2: "title", 3: "list", 4: "table", 5: "figure"}
allow config?
Added a config option, layout_category_dict_path. The default is mindocr/utils/dict/layout_category_dict.txt (newly added).
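As a rough illustration of what loading such a configurable dictionary could look like, here is a minimal sketch. It assumes the file holds one category name per line and that ids are assigned from 1 in file order. The actual format of layout_category_dict.txt is not shown in this thread, and both function names below are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_layout_categories(text: str) -> Dict[int, str]:
    """Map 1-based ids to category names, one name per non-empty line.

    Assumed format only; the real dictionary file may be laid out differently.
    """
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return {idx: name for idx, name in enumerate(names, start=1)}


def load_layout_categories(path: str) -> Dict[int, str]:
    """Read the file at `path` (for example the layout_category_dict_path value)."""
    return parse_layout_categories(Path(path).read_text(encoding="utf-8"))
```

Under that assumed format, a file listing text, title, list, table, and figure on separate lines reproduces the mapping that was previously hardcoded.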
This reverts commit 4605f72.
Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:
Motivation
This PR mainly adds an end-to-end prediction script that lets users convert a list of images into documents.
Test Plan
Run the following command and check the output docx files under ./inference_results