1. Add latex_ocr formula recognition to the ppstructure pipeline; 2. Add PDF-to-markdown conversion #13868
Thank you for the contribution.
@liuhongen1234567 Could you please review this PR?
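I suggest updating the documentation to explain how to use this. Because our documentation site is still being migrated, two places need to be updated: ppstructure and docs.
Hello, I do plan to update the documentation later. I may be using ppocr to reproduce other projects in the near term, so the documentation update will arrive in October at the latest.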
ppstructure/predict_system.py
Outdated
@@ -78,6 +80,13 @@ def __init__(self, args):
                     )
                 else:
                     self.table_system = TableSystem(args)
             if args.formula:
                 args_fomula = deepcopy(args)
There is a typo in args_fomula which should be args_formula
OK, I will fix it.
I have fixed it.
You may need to sign the updated CLA, and I think we can leave the documentation as future work.
I have signed the updated CLA.
How should this feature be used? Please provide an example so that we can verify it.
Example run:
Looks good, very nice work. The two-column handling still seems to have some problems, though. where u E Rn is the input signal, E Rh is the internal state,and y E Rm is the output. Here, we are letting n > 1, m > 1,which yields a multiple-input, multiple-output (MIMO) state-space model. For the remainder of this paper, we will ignorethe Du term as we do not use it. The state-space model in its original form describes acontinuous-time system, but in the field of digital signalprocessing, there are standard recipes for discretizing sucha system into a discrete-time state-space model. One suchmethod that we use in this work is the zero-order hold (ZOH),which gives us the discrete-time state-space matrices A andB as follows: The discrete state-space model is then given byc[t+1]=Ax[t]+Bu[t], y[t]=Cx[t] We use an hourglass network with long-range skip connec-tions, similar in form to the Sashimi network [24] for audiogeneration. However, unlike previous works using state-spacemodels for audio processing [24], [29], [30], our networkdirectly takes in raw audio waveforms in the -1 to +1 rangeand outputs raw waveforms as well, with no one-hot encodingor spectral processing (e.g. STFT or iSTFT). Furthermore,we retain causality as much as possible for sake of real-timeinference, meaning that we eschew any form of bidirectionalstate-space layers. See Fig. 1 for a schematic drawing. As with typical auto-encoder networks, the audio featuresare down-sampled in the encoder and then up-sampled inthe decoder. For the re-sampling operation, we use a simpleIThis is a prior not required, as technically we can configure our networkas complex-valued to handle complex features.However, we do not explorethis configuration in this work. 2The size of the internal state h,can be interpreted as the degree ofparametrization of a basis temporal kernel, or some implicit (dilated)“kernelsize" in the frequency domain. We explore this in a future work. 
(3) In the context of recurrent neural networks (RNNs),this isessentially a linear RNN layer, which allows for efficientonline inference and generation (in our case real-time speechenhancement), but at the same time efficient parallelizationduring training. It is straightforward to check that the discrete-time impulseresponse is given as (4) where T denotes the kernel timestep. During training, k canbe considered the “full” long 1D convolutional kernel withshape (output channels, input channels, length), in the sensethat the output y can be computed via the long convolutionyj = > u *kij. By the convolution theorem, we can performthis operation in the frequency domain, which becomes apoint-wise product gjf = >, ukijf. The hat symbol denotesthe Fourier transform of the signal (with the index f denoting Fig. 1. A schematic drawing of the network architecture, with only 2 encoderand decoder blocks shown for simplicity. The actual model has 6 encoder anddecoder blocks. Note that there is no (spectral) processing on the input andoutput waveforms. |
Yes, my method does not handle the text well in some places, for example: where T denotes the kernel timestep. During training, k canbe considered the “full” long 1D convolutional kernel withshape (output channels, input channels, length), in the sensethat the output y can be computed via the long convolutionyj = > u *kij. By the convolution theorem, we can performthis operation in the frequency domain, which becomes apoint-wise product gjf = >, ukijf. The hat symbol denotesthe Fourier transform of the signal (with the index f denoting This passage contains extra \n characters, and I am working on a new approach to avoid these problems.
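To illustrate the line-joining problem discussed in this thread, here is a minimal sketch of one way to merge the physical lines of a paragraph. It is not the PR's implementation. The function name `join_wrapped_lines` and the rule "a trailing hyphen followed by a lowercase letter means a split word" are assumptions made only for illustration.

```python
def join_wrapped_lines(lines):
    """Join the physical lines of one paragraph into a single string.

    Hypothetical helper: lines are joined with a single space, except when
    the text so far ends with a hyphen and the next line starts with a
    lowercase letter, which is treated as a word split across a line break.
    """
    paragraph = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore empty lines left over from stray "\n" characters
        if not paragraph:
            paragraph = line
        elif paragraph.endswith("-") and line[0].islower():
            # Likely a hyphenated line break: drop the hyphen and rejoin.
            paragraph = paragraph[:-1] + line
        else:
            paragraph += " " + line
    return paragraph


sample = [
    "We use an hourglass network with long-range skip connec-",
    "tions, similar in form to the Sashimi network [24] for audio",
    "generation.",
]
print(join_wrapped_lines(sample))
```

A heuristic like this would also merge genuine compounds that happen to break at the hyphen (for example "long-" followed by "range"), so a real fix would probably need a dictionary check or layout information.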
LGTM
This can be merged first. The documentation and improvements to the processing logic can follow later.
@ztyf-lq Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th. Looking forward to the future, we will walk further together in the world of open source! |
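Preface
Dear ppocr maintainers, I am a user of the ppocr project. I use ppocr in my daily work and study, and I am deeply impressed by how powerful it is. Contributing to ppocr is also something I very much want to do. I would be grateful if you could take the time to check whether my code is something ppocr needs.
Changes:
Add latex_ocr formula recognition to the ppstructure pipeline:
a. Modify the StructureSystem class in ppstructure/predict_system.py to add the latex_ocr model and the handling of regions whose layout type is formula;
b. Because docx does not support inserting LaTeX formulas, skip LaTeX formulas in the convert_info_docx function in ppstructure/recovery/recovery_to_doc.py;
c. Skip LaTeX formulas when visualizing OCR results in the draw_structure_result function in ppstructure/utility.py;

a. Add the file recovery_to_markdown.py under the directory ppstructure/recovery. It converts ppstructure recognition results into a markdown file. Two methods are currently provided for handling text regions. The first treats two leading spaces at the start of a line as the paragraph separator. The second covers paragraphs that start without leading spaces; in that case it relies on the observation that the last line of a paragraph is usually not a "full line" but leaves some empty space at the end;
b. Call the function that converts ppstructure recognition results to a markdown file in ppstructure/predict_system.py;

a. Add the parameters required by the latex_ocr formula recognition model;
b. Add the recovery_to_markdown option to enable or disable converting ppstructure recognition results to a markdown file;
c. Add the formula option to enable or disable LaTeX formula recognition;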
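As a rough illustration of the two on/off options named above, the following sketch registers them with `argparse`. Only the option names `formula` and `recovery_to_markdown` come from this PR description. The `str2bool` helper, the `False` defaults, and the `build_parser` function are assumptions made for this example and may not match the project's actual code.

```python
import argparse


def str2bool(value):
    """Parse textual booleans such as 'true' or 'false' (assumed helper)."""
    return str(value).lower() in ("true", "yes", "t", "1")


def build_parser():
    """Build a parser with the two switches described in the PR text."""
    parser = argparse.ArgumentParser()
    # Defaults are assumptions; the PR description does not state them.
    parser.add_argument("--formula", type=str2bool, default=False,
                        help="enable or disable LaTeX formula recognition")
    parser.add_argument("--recovery_to_markdown", type=str2bool, default=False,
                        help="enable or disable markdown output")
    return parser


args = build_parser().parse_args(["--formula", "true"])
print(args.formula, args.recovery_to_markdown)
```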
If my code happens to be what ppocr needs, I will follow up on the maintainers' suggestions and add a PDF-to-markdown tutorial to the layout recovery documentation.
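The two paragraph-splitting ideas described for recovery_to_markdown.py (indented first lines, and last lines that are not "full") can be sketched as follows. This is an illustrative sketch, not the code in the PR. The `Line` tuple layout, the function name `split_paragraphs`, the tolerance values, and the choice to combine both rules in one pass are all assumptions.

```python
from typing import List, Tuple

# Hypothetical line representation: (text, left x, right x) in pixels.
Line = Tuple[str, float, float]


def split_paragraphs(lines: List[Line], left: float, right: float,
                     indent_tol: float = 10.0,
                     fill_ratio: float = 0.9) -> List[str]:
    """Group the OCR lines of one text region into paragraphs.

    A new paragraph starts when a line is indented relative to the region's
    left edge (rule 1), or when the previous line stopped well short of the
    right edge, i.e. it was not a "full line" (rule 2).
    """
    paragraphs: List[List[str]] = []
    prev_full = False  # so that the first line always opens a paragraph
    width = right - left
    for text, x0, x1 in lines:
        indented = (x0 - left) > indent_tol
        if indented or not prev_full or not paragraphs:
            paragraphs.append([text])
        else:
            paragraphs[-1].append(text)
        # A line counts as full when it reaches close to the right edge.
        prev_full = (x1 - left) >= fill_ratio * width
    return [" ".join(p) for p in paragraphs]


region = [
    ("First line of paragraph one", 0, 100),
    ("ends here.", 0, 40),
    ("Second paragraph starts", 0, 100),
    ("and ends.", 0, 30),
]
for paragraph in split_paragraphs(region, left=0, right=100):
    print(paragraph)
```

Rule 2 fails when the last line of a paragraph happens to fill the column, so both thresholds would need tuning on real layouts.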