What are the memory requirements of the SER+RE task in PPStructure? #8602

Closed
tanjh opened this issue Dec 12, 2022 · 9 comments

@tanjh
Contributor

tanjh commented Dec 12, 2022

Please provide the following information to quickly locate the problem

  • System Environment: Linux debian 3.10.0-1127.el7.x86_64 #1
  • Version: Paddle: 2.4.0, PaddleOCR: Release 2.6
  • Related components: ppstructure
  • Command Code:
python3 predict_system.py \
  --kie_algorithm=LayoutXLM \
  --re_model_dir=./inference/re_vi_layoutxlm_xfund_infer \
  --ser_model_dir=./inference/ser_vi_layoutxlm_xfund_infer \
  --image_dir=../pg/HJ0332.jpg \
  --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx" \
  --mode=kie
  • Complete Error Message:
root@a1793fdcfb9f:/test_ppocr/PaddleOCR/ppstructure# python3 predict_system.py \
  --kie_algorithm=LayoutXLM \
  --re_model_dir=./inference/re_vi_layoutxlm_xfund_infer \
  --ser_model_dir=./inference/ser_vi_layoutxlm_xfund_infer \
  --image_dir=../pg/HJ0332.jpg \
  --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx" \
  --mode=kie

[2022-12-12 01:45:13,606] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-12-12 01:45:14,234] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-12-12 01:45:14,234] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
E1212 01:45:14.372604 768 analysis_config.cc:96] Please compile with gpu to EnableGpu()
E1212 01:45:21.777576 768 analysis_config.cc:96] Please compile with gpu to EnableGpu()
[2022/12/12 01:45:36] ppocr INFO: [0/1] ../pg/HJ0332.jpg

Socket error Event: 32 Error: 10053.
Connection closing...Socket close.

Connection closed by foreign host.

Disconnected from remote host(192.168.1.220) at 10:10:21.

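Background: I am trying to use the SER+RE pipeline of PPStructure in PaddleOCR to recognize the text in an image and extract key information.
Starting from the python 3.7-slim image, I set up the paddlehub and PaddleOCR environment, downloaded the ser_vi_layoutxlm_xfund_infer.tar and re_vi_layoutxlm_xfund_infer.tar models, and built a Docker image. I ran that image on a 4-core, 8 GB server (4.71 GB of memory available). After I executed the command above, memory usage spiked and the server froze; the only way to recover was to reboot the server.
HJ0332.jpg is 397571 bytes.
Symptom: memory usage spikes and the server freezes; even ssh becomes unusable and input gets no response.

Question: What are the memory requirements of ppstructure's SER+RE when it performs key information extraction?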
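One way to keep a run like this from freezing the whole server is to cap the memory the process may request, so that it fails with an allocation error instead. The following is a minimal sketch, assuming a Linux host and Python's standard `resource` module; the helper name `run_with_memory_cap` and the 4 GiB figure are hypothetical and not part of PaddleOCR. Note that `RLIMIT_AS` limits virtual address space, not resident memory, so the cap may need some headroom.

```python
import resource
import subprocess

# Command taken from the report above; paths are relative to ppstructure/.
KIE_CMD = [
    "python3", "predict_system.py",
    "--kie_algorithm=LayoutXLM",
    "--re_model_dir=./inference/re_vi_layoutxlm_xfund_infer",
    "--ser_model_dir=./inference/ser_vi_layoutxlm_xfund_infer",
    "--image_dir=../pg/HJ0332.jpg",
    "--ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt",
    "--vis_font_path=../doc/fonts/simfang.ttf",
    "--ocr_order_method=tb-yx",
    "--mode=kie",
]

# Hypothetical cap, chosen only for illustration.
EXAMPLE_CAP_BYTES = 4 * 1024 ** 3


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd in a child process whose address space is capped at max_bytes.

    Returns the child's exit code. The limit applies to the child only,
    not to the calling process.
    """

    def apply_limit():
        # Executed in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode
```

Calling `run_with_memory_cap(KIE_CMD, EXAMPLE_CAP_BYTES)` from the `ppstructure` directory would run the same command under the cap. Since the command here runs inside a Docker container, the `--memory` option of `docker run` is another way to get a similar effect at the container level.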

@tanjh
Contributor Author

tanjh commented Dec 12, 2022

[Attached image: 01783F0B-1E2A-4721-816C-9A9D781BD9F3]
[Attached image: 81509E79-95A3-4161-A8CD-2CDEA095D867]

@tanjh
Contributor Author

tanjh commented Dec 12, 2022

I have now run the SER+RE command from https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/inference.md directly:
python3 predict_system.py \
  --kie_algorithm=LayoutXLM \
  --re_model_dir=./inference/re_vi_layoutxlm_xfund_infer \
  --ser_model_dir=./inference/ser_vi_layoutxlm_xfund_infer \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../ppocr/utils/dict/kie_dict/xfund_class_list.txt \
  --vis_font_path=../doc/fonts/simfang.ttf \
  --ocr_order_method="tb-yx" \
  --mode=kie
It can be seen that both the memory and the CPU usage of this task on the server are very high.
[Attached image: B737B9D9-476B-493b-9CD2-74AE677601B2]

@LDOUBLEV
Collaborator

I haven't paid much attention to the upper bound on memory. Memory usage depends on the input image: smaller images use less memory.
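When judging whether an image is "small", the pixel dimensions matter more than the JPEG file size, because a decoded image occupies roughly width × height × channels bytes regardless of how well it compressed on disk. A minimal sketch of that arithmetic, plus a helper that computes a downscaled size; both function names and the 4000×3000 example are hypothetical, and how PaddleOCR's memory use scales with these numbers is not stated in this thread.

```python
def decoded_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of one uncompressed image buffer (e.g. 8-bit RGB) in bytes."""
    return width * height * channels * bytes_per_channel


def fit_within(width, height, max_side):
    """Return (w, h) scaled down so the longer side is at most max_side.

    The aspect ratio is preserved; images that already fit are unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# A hypothetical 4000x3000 photo: about 36 MB once decoded as 8-bit RGB,
# even if the JPEG on disk is only a few hundred kilobytes.
example_bytes = decoded_size_bytes(4000, 3000)
example_resized = fit_within(4000, 3000, 1000)
```

The dimensions returned by `fit_within` could be passed to whatever image library is already in use to resize the input before running inference.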

@jingsongliujing
Collaborator

For GPU, a V100 is recommended; for CPU, 16 cores are enough.

@tanjh
Contributor Author

tanjh commented Dec 13, 2022

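> I haven't paid much attention to the upper bound on memory. Memory usage depends on the input image: smaller images use less memory.

The images in this experiment were 1.4 MB and 3.9 MB. Do these count as large or small images?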

@tanjh
Contributor Author

tanjh commented Dec 13, 2022

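> For GPU, a V100 is recommended; for CPU, 16 cores are enough.

Leaving GPU computation aside and considering only the CPU version, are there any mandatory hardware requirements? If so, what are they?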

@jingsongliujing
Collaborator

Just try it on a machine with an ordinary office PC configuration.

@tanjh
Contributor Author

tanjh commented Dec 13, 2022

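Today I followed section 4.2 of https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/kie/README_ch.md and tested images of 389 KB and 1.8 MB. CPU inference ended up needing about 5 GB of memory before it produced the SER+RE results.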
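A figure like this can be checked by recording the peak resident memory of the inference process. The following is a minimal sketch, assuming Linux and Python's standard `resource` module; the helper name `peak_child_rss_mib` is hypothetical. It reports the largest peak among all child processes the caller has waited for so far, so it is best used from a fresh script that launches a single command.

```python
import resource
import subprocess


def peak_child_rss_mib(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_mib).

    peak_rss_mib is the maximum resident set size over the children this
    process has waited for, taken from getrusage(RUSAGE_CHILDREN).
    """
    code = subprocess.run(cmd).returncode
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # On Linux, ru_maxrss is reported in kilobytes (macOS reports bytes,
    # which this sketch does not handle).
    return code, usage.ru_maxrss / 1024
```

Passing the `python3 predict_system.py ...` command as a list of arguments, from the `ppstructure` directory, would print nothing extra but return the exit code together with the peak memory in MiB, which can then be compared across images of different sizes.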

@github-actions
Contributor

github-actions bot commented Jul 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
