@an1018 an1018 commented Aug 19, 2022

Update the layout recovery code.

paddle-bot bot commented Aug 19, 2022

Thanks for your contribution!

import fitz
from PIL import Image
imgs = []
pdf = fitz.open(img_path)

You could try writing it as shown here, which avoids having to call pdf.close() manually:

with fitz.open(img_path) as pdf:
    for pg in range(0, .....
    ....
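
As a rough illustration of why the `with` form is safer, here is a minimal sketch. `FakeDoc` and `read_pages` are hypothetical names invented for this example; `FakeDoc` only stands in for the document object so the sketch runs without PyMuPDF installed. It assumes the document supports `len()` and integer indexing; in the PR the opener would be `fitz.open`.

```python
class FakeDoc:
    """Hypothetical stand-in for a PDF document object (not part of PyMuPDF)."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # cleanup runs even if the body raised
        return False  # do not suppress the exception

    def __len__(self):
        return len(self.pages)

    def __getitem__(self, index):
        return self.pages[index]


def read_pages(open_doc, path):
    """Collect every page from a document opened inside a `with` block."""
    pages = []
    with open_doc(path) as pdf:
        for pg in range(len(pdf)):
            pages.append(pdf[pg])
    return pages
```

The document is closed when the block exits, whether the loop finishes normally or raises, so no explicit close call is needed.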

mat = fitz.Matrix(2, 2)
pm = page.getPixmap(matrix=mat, alpha=False)

if pm.width>2000 or pm.height>2000:

Please run pre-commit before submitting. Also, add a comment explaining the 2000 here; otherwise it is unclear why this value was chosen.
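
One way to address this is to give the threshold a name and a comment. The sketch below is only illustrative: `MAX_RENDER_SIDE` and `choose_zoom` are hypothetical names, and the stated rationale (falling back to 1x rendering when the 2x render is too large) is an assumption, since the snippet does not show what the branch does.

```python
# Assumption: if the page rendered at 2x exceeds this many pixels on either
# side, fall back to 1x so later stages are not fed an oversized image.
MAX_RENDER_SIDE = 2000


def choose_zoom(width_at_2x, height_at_2x, max_side=MAX_RENDER_SIDE):
    """Return the zoom factor to use, given the page size when rendered at 2x."""
    if width_at_2x > max_side or height_at_2x > max_side:
        return 1
    return 2
```

With the constant named and documented, a reader can see both what the value is and why it exists.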

    from ppstructure.recovery.recovery_to_doc import convert_info_docx
    convert_info_docx(img, res, save_folder, img_name, args.save_pdf)
except:
    logger.error("error in layout recovery image:{}".format(image_file))

Please also log the specific error message here to make the problem easier to locate:

except Exception as ex:
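
A minimal sketch of what this suggestion could look like, using only the standard `logging` module. `safe_convert` and the logger name are hypothetical; in the PR, the wrapped call would be `convert_info_docx(...)`.

```python
import logging

logger = logging.getLogger("layout_recovery_example")


def safe_convert(convert, image_file):
    """Run `convert()` and log the exception text if it fails."""
    try:
        convert()
        return True
    except Exception as ex:
        # Include the exception message so the failure can be traced.
        logger.error("error in layout recovery image:{}, err msg: {}".format(
            image_file, ex))
        return False
```

Catching `Exception` rather than using a bare `except:` also lets `KeyboardInterrupt` and `SystemExit` propagate normally.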

if args.recovery and all_res != []:
    try:
        convert_info_docx(img, all_res, save_folder, img_name, args.save_pdf)
    except:

Same as above.

@@ -46,7 +47,7 @@ def convert_info_docx(img, res, save_folder, img_name):
     section._sectPr.xpath('./w:cols')[0].set(qn('w:num'), '2')
     flag = 2

-if region['type'] == 'Figure':
+if region['type'] == 'figure':

Add a .lower() when handling this.
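
A small sketch of the case-insensitive comparison, assuming `region` is a dict with a string `type` field as in the snippet. The helper name `is_figure` is hypothetical.

```python
def is_figure(region):
    """Case-insensitive check of a region's type label."""
    return str(region.get("type", "")).lower() == "figure"
```

This accepts both 'Figure' and 'figure', so the check keeps working if the label casing produced by the layout model changes.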

Tell HTMLParser to ignore any tags until the corresponding closing table tag
"""
doc = Document()
table_soup = BeautifulSoup(html, 'html.parser')

Does this file name have to be written this way, or could it be a configuration value that is passed in?

user can pass existing document object as arg
(if they want to manage rest of document themselves)
How to deal with block level style applied over table elements? e.g. text align
"""

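Please add a license header. Also, if this is based on another implementation, add a reference at the top of the file; see how the files under ppocr/modeling/backbone/* cite their sources.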

for index, img in enumerate(imgs):
    res, time_dict = structure_sys(img, str(index))
    if structure_sys.mode == 'structure' and res != []:
        save_structure_res(res, save_folder, img_name, str(index))

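Is there any need to convert this index to a string separately? Using an int directly should be fine; please also update the default value of the corresponding function.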
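
To illustrate the idea, the sketch below keeps the index as an int with an int default and converts it only at the point where text is needed. `result_dir` and the `page_{}` naming are hypothetical and are not taken from the PR.

```python
import os


def result_dir(save_folder, img_name, img_idx=0):
    """Build an output path; `img_idx` stays an int until it is formatted."""
    return os.path.join(save_folder, img_name, "page_{}".format(img_idx))
```

Callers can then pass the loop index straight from `enumerate` without wrapping it in `str()`.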

@@ -215,27 +218,74 @@ def main(args):
for i, image_file in enumerate(image_file_list):
logger.info("[{}/{}] {}".format(i, img_num, image_file))
img, flag = check_and_read_gif(image_file)
imgs, flag_pdf = check_and_read_pdf(image_file)

Create a new function, check_and_read, that checks for both GIF and PDF internally, so the caller does not have to make two separate calls.
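
A possible shape for such a single entry point is sketched below. It is not the PR's implementation: `detect_input_kind` and the `readers` mapping are hypothetical, and the return value is a `(data, kind)` pair rather than the boolean flags used in the PR.

```python
import os


def detect_input_kind(img_path):
    """Classify a path as 'gif', 'pdf', or 'image' by its file extension."""
    ext = os.path.splitext(img_path)[1].lower()
    if ext == ".gif":
        return "gif"
    if ext == ".pdf":
        return "pdf"
    return "image"


def check_and_read(img_path, readers):
    """Dispatch once on the file kind.

    `readers` maps a kind to a loader callable (hypothetical); kinds
    without a loader return None so the caller can use its default path.
    """
    kind = detect_input_kind(img_path)
    reader = readers.get(kind)
    if reader is None:
        return None, kind
    return reader(img_path), kind
```

The caller makes one call per file and branches on the returned kind, instead of calling a GIF reader and a PDF reader in turn.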

img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
imgs.append(img)
return imgs, False, True

Would it be enough to return imgs, True here? Two flags are returned now, so the condition below will always evaluate to False.

@littletomatodonkey littletomatodonkey merged commit b7d99ac into PaddlePaddle:dygraph Aug 19, 2022