1. Add latex_ocr formula recognition to the ppstructure pipeline; 2. Add PDF-to-Markdown conversion #13868

Merged
merged 5 commits into from
Sep 29, 2024

ztyf-lq
Contributor

@ztyf-lq ztyf-lq commented Sep 13, 2024

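Preface

Dear ppocr maintainers, I am a user of the ppocr project. I use ppocr in my daily work and study, and I am deeply impressed by how powerful it is. Contributing to ppocr is also something I very much want to do. I would be grateful if you could take a moment out of your busy schedule to check whether my code is something ppocr needs.

Changes: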

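  1. Add latex_ocr formula recognition to the ppstructure pipeline;
    a. Modify the StructureSystem class in ppstructure/predict_system.py to add the latex_ocr model and the handling of regions whose layout type is formula;
    b. Because docx does not support inserting LaTeX formulas, skip LaTeX formulas in the convert_info_docx function in ppstructure/recovery/recovery_to_doc.py;
    c. Skip LaTeX formulas when visualizing OCR results in the draw_structure_result function in ppstructure/utility.py;
  2. Add PDF-to-Markdown conversion;
    a. Add the file recovery_to_markdown.py under ppstructure/recovery; it converts ppstructure recognition results into a Markdown file. Two approaches are currently provided for text regions. The first treats two leading spaces as the marker that starts each paragraph. The second handles paragraphs that start without leading spaces, relying on the observation that the last line of a paragraph is usually not a "full" line and leaves some empty space;
    b. Call the function that converts ppstructure recognition results to a Markdown file from ppstructure/predict_system.py;
  3. Add the necessary command-line options;
    a. Add the parameters required by the latex_ocr formula recognition model;
    b. Add the recovery_to_markdown option to enable or disable converting ppstructure recognition results to a Markdown file;
    c. Add the formula option to enable or disable LaTeX formula recognition;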
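The two paragraph-splitting approaches described in item 2a can be sketched roughly as follows. This is a minimal illustration, not the code in recovery_to_markdown.py: the `Line` class, the function names, and the pixel tolerances are all hypothetical, and it assumes each OCR line comes with its left and right x-coordinates.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One OCR text line (hypothetical structure for this sketch)."""
    text: str
    x0: float  # left edge of the line box
    x1: float  # right edge of the line box


def split_by_indent(lines, left, indent_tol=5.0):
    """Approach 1: an indented line starts a new paragraph."""
    paragraphs, current = [], []
    for line in lines:
        # Close the running paragraph when the next line is indented.
        if current and line.x0 - left > indent_tol:
            paragraphs.append(current)
            current = []
        current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


def split_by_short_last_line(lines, right, gap_tol=20.0):
    """Approach 2: a line ending well before the right margin closes a paragraph."""
    paragraphs, current = [], []
    for line in lines:
        current.append(line)
        # A "non-full" line is taken as the last line of its paragraph.
        if right - line.x1 > gap_tol:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```

Both heuristics depend on the text region's left and right margins, so in a two-column page they would need to be applied per column rather than per page.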

If my code happens to be what ppocr needs, I will follow up on the maintainers' suggestions and add a PDF-to-Markdown tutorial to the layout recovery documentation.

@CLAassistant

CLAassistant commented Sep 13, 2024

CLA assistant check
All committers have signed the CLA.

@GreatV
Collaborator

GreatV commented Sep 13, 2024

Thank you for the contribution.

@GreatV
Collaborator

GreatV commented Sep 13, 2024

@liuhongen1234567 could you please review this PR?

@ztyf-lq
Contributor Author

ztyf-lq commented Sep 20, 2024

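Hello, I plan to update the documentation later. I may be using ppocr to reproduce other projects in the near future, so the documentation update will come in October at the latest.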

@@ -78,6 +80,13 @@ def __init__(self, args):
)
else:
self.table_system = TableSystem(args)
if args.formula:
args_fomula = deepcopy(args)
Collaborator

There is a typo in args_fomula which should be args_formula

Contributor Author

ok, I will modify it

Contributor Author

I have modified it.
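For context on the hunk under review, copying `args` with `deepcopy` lets the formula recognizer receive its own settings without changing the options the text recognizer already uses. The sketch below only illustrates that pattern; the helper name `make_formula_args` and the choice of which field to override are assumptions, not the PR's actual code. The model paths are taken from the example command later in this thread.

```python
from argparse import Namespace
from copy import deepcopy


def make_formula_args(args):
    """Return an independent copy of args configured for formula recognition.

    Hypothetical helper: which fields the real PR overrides is an assumption.
    """
    args_formula = deepcopy(args)
    # Point the copied recognizer settings at the formula model.
    args_formula.rec_model_dir = args.formula_model_dir
    return args_formula


args = Namespace(
    formula=True,
    rec_model_dir="models/ch_PP-OCRv4_rec_infer",
    formula_model_dir="models/rec_latex_ocr_infer",
)
args_formula = make_formula_args(args)
```

Because the copy is independent, `args.rec_model_dir` still points at the text recognition model after the call.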

@jzhang533
Collaborator

you may need to sign the updated CLA, and I think we can leave the documentation as future work.

@ztyf-lq
Contributor Author

ztyf-lq commented Sep 27, 2024

I have updated the CLA.

@GreatV
Collaborator

GreatV commented Sep 27, 2024

How is this feature used? Please give us an example so we can verify it.

@ztyf-lq
Contributor Author

ztyf-lq commented Sep 27, 2024

Example run:
cd ppstructure && python predict_system.py --image_dir=data/math.pdf --det_model_dir=models/ch_PP-OCRv4_det_infer --rec_model_dir=models/ch_PP-OCRv4_rec_infer --table_model_dir=models/ch_ppstructure_mobile_v2.0_SLANet_infer --formula_model_dir=models/rec_latex_ocr_infer --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt --layout_model_dir=models/picodet_lcnet_x1_0_fgd_layout_cdla_infer --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt --vis_font_path=../doc/fonts/simfang.ttf --formula=True --recovery=True --recovery_to_markdown=True --output=../output/
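The --formula option determines whether formula regions found by layout detection are recognized. The --recovery_to_markdown option determines whether the recognition results are converted to a Markdown file, and it only takes effect when --recovery=True. Formula recognition only works with the layout_cdla layout model.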
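The dependencies between these options can be summarized in a small sketch. The flag names come from the command above, but the function `effective_options` is hypothetical, and checking the dictionary path for "cdla" is only a crude stand-in for "the layout model is layout_cdla", not how predict_system.py decides it.

```python
from argparse import Namespace


def effective_options(args):
    """Report which optional steps would actually run (illustrative only)."""
    # Assumption: a CDLA layout dictionary path implies the layout_cdla model.
    uses_cdla = "cdla" in args.layout_dict_path
    return {
        # Formula recognition needs both the flag and the CDLA layout model.
        "formula": bool(args.formula) and uses_cdla,
        # Markdown export is ignored unless recovery is enabled as well.
        "markdown": bool(args.recovery) and bool(args.recovery_to_markdown),
    }


example = Namespace(
    formula=True,
    recovery=False,
    recovery_to_markdown=True,
    layout_dict_path="../ppocr/utils/dict/layout_dict/layout_cdla_dict.txt",
)
options = effective_options(example)
```

In this example `options["markdown"]` is false because --recovery is off, even though --recovery_to_markdown is set.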

One of my conversion examples:
[Image: QQ20240916-230658]
[Image: QQ screenshot 20240916230245]

@GreatV
Collaborator

GreatV commented Sep 27, 2024

Looks good, very nice work. The two-column handling still seems to have some problems, though.

show_1
show_2

where u E Rn is the input signal, E Rh is the internal state,and y E Rm is the output. Here, we are letting n > 1, m > 1,which yields a multiple-input, multiple-output (MIMO) state-space model. For the remainder of this paper, we will ignorethe Du term as we do not use it.

The state-space model in its original form describes acontinuous-time system, but in the field of digital signalprocessing, there are standard recipes for discretizing sucha system into a discrete-time state-space model. One suchmethod that we use in this work is the zero-order hold (ZOH),which gives us the discrete-time state-space matrices A andB as follows:

$$\overline{{{A}}}=\exp(\Delta A)$$

$$\overline{{B}}=(\Delta A)^{-1}\cdot(\exp(\Delta A)-1)\cdot\Delta B.;;(2)$$

The discrete state-space model is then given byc[t+1]=Ax[t]+Bu[t],

y[t]=Cx[t]

We use an hourglass network with long-range skip connec-tions, similar in form to the Sashimi network [24] for audiogeneration. However, unlike previous works using state-spacemodels for audio processing [24], [29], [30], our networkdirectly takes in raw audio waveforms in the -1 to +1 rangeand outputs raw waveforms as well, with no one-hot encodingor spectral processing (e.g. STFT or iSTFT). Furthermore,we retain causality as much as possible for sake of real-timeinference, meaning that we eschew any form of bidirectionalstate-space layers. See Fig. 1 for a schematic drawing.

As with typical auto-encoder networks, the audio featuresare down-sampled in the encoder and then up-sampled inthe decoder. For the re-sampling operation, we use a simpleIThis is a prior not required, as technically we can configure our networkas complex-valued to handle complex features.However, we do not explorethis configuration in this work.

2The size of the internal state h,can be interpreted as the degree ofparametrization of a basis temporal kernel, or some implicit (dilated)“kernelsize" in the frequency domain. We explore this in a future work.

(3)

In the context of recurrent neural networks (RNNs),this isessentially a linear RNN layer, which allows for efficientonline inference and generation (in our case real-time speechenhancement), but at the same time efficient parallelizationduring training.

It is straightforward to check that the discrete-time impulseresponse is given as

$$k[\tau]=C,\overline{{{A}}}^{\prime},\overline{{{B}}}.$$

(4)

where T denotes the kernel timestep. During training, k canbe considered the “full” long 1D convolutional kernel withshape (output channels, input channels, length), in the sensethat the output y can be computed via the long convolutionyj = > u *kij. By the convolution theorem, we can performthis operation in the frequency domain, which becomes apoint-wise product gjf = >, ukijf. The hat symbol denotesthe Fourier transform of the signal (with the index f denoting

Fig. 1. A schematic drawing of the network architecture, with only 2 encoderand decoder blocks shown for simplicity. The actual model has 6 encoder anddecoder blocks. Note that there is no (spectral) processing on the input andoutput waveforms.

@ztyf-lq
Contributor Author

ztyf-lq commented Sep 27, 2024

Yes, in some places my method does not handle the text well, for example:
(4)

where T denotes the kernel timestep. During training, k canbe considered the “full” long 1D convolutional kernel withshape (output channels, input channels, length), in the sensethat the output y can be computed via the long convolutionyj = > u *kij. By the convolution theorem, we can performthis operation in the frequency domain, which becomes apoint-wise product gjf = >, ukijf. The hat symbol denotesthe Fourier transform of the signal (with the index f denoting

This passage contains extra \n characters. I am working on a new approach to avoid these problems.
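One possible direction for cleaning up such passages, offered only as a sketch and not as the approach the author has in mind, is to rejoin the wrapped lines of a paragraph before writing Markdown: drop blank lines, put a space between consecutive lines, and merge a word that was hyphenated at a line end. The function name `join_wrapped_lines` is hypothetical.

```python
def join_wrapped_lines(lines):
    """Join the wrapped lines of one paragraph into a single string."""
    out = ""
    for raw in lines:
        piece = raw.strip()
        if not piece:
            # Skip stray empty lines inside the paragraph.
            continue
        if not out:
            out = piece
        elif out.endswith("-") and piece[0].islower():
            # Treat a trailing hyphen before a lowercase letter as a word break.
            out = out[:-1] + piece
        else:
            out = out + " " + piece
    return out


joined = join_wrapped_lines(["skip connec-", "tions, similar", "", "in form"])
```

The hyphen rule is a heuristic: it would also merge genuine compounds such as "state-space" when they happen to break at a line end, so a dictionary check may be needed in practice.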

Collaborator

@GreatV GreatV left a comment

LGTM
This can be merged first; the documentation can be added and the processing logic improved afterwards.

@jzhang533 jzhang533 merged commit 269e5b8 into PaddlePaddle:main Sep 29, 2024
3 checks passed
@luotao1
Collaborator

luotao1 commented Oct 15, 2024

@ztyf-lq Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th.

Looking forward to the future, we will walk further together in the world of open source!
Click here: https://paddle.wjx.cn/vm/h4On9gJ.aspx#

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2024

5 participants