Add docprompt example in pipelines #3534

Merged: 8 commits, Oct 24, 2022

@lugimzzz lugimzzz commented Oct 21, 2022

PR types

Others

PR changes

Others

Description

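Add a docprompt example to pipelines.

- Split data processing (including OCR) and model computation into two nodes, DocPreProcessor and DocPrompter
- Chain the two nodes with the DocPipeline pipeline
- Provide a runnable example for the pipelines core library: examples/document-intelligence/docprompt_example.py
- Provide example code for a gradio-based front-end UI and a FastAPI-based back end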
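
The two-node split described above can be illustrated with a minimal, dependency-free sketch. This is not the code added by this PR and does not use the pipelines library: the class names `DocOCRProcessorSketch`, `DocPrompterSketch`, `DocPipelineSketch` and the `fake_ocr` helper are hypothetical stand-ins, and the "model" is a toy key lookup. It only shows the idea of keeping OCR/pre-processing and answer extraction in separate nodes that a pipeline runs in order.

```python
from typing import Any, Callable, Dict, List


class DocOCRProcessorSketch:
    """Hypothetical stand-in for the OCR / pre-processing node."""

    def __init__(self, ocr_fn: Callable[[str], List[str]]):
        # The OCR function is injected so the sketch runs without an OCR engine.
        self.ocr_fn = ocr_fn

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Attach the recognized text lines to the payload for the next node.
        return {**payload, "lines": self.ocr_fn(payload["image"])}


class DocPrompterSketch:
    """Hypothetical stand-in for the model node (toy lookup, not a real model)."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        question = payload["prompt"].lower()
        answer = ""
        for line in payload["lines"]:
            key, _, value = line.partition(":")
            key = key.strip().lower()
            # Pick the first "key: value" line whose key occurs in the question.
            if key and key in question:
                answer = value.strip()
                break
        return {**payload, "answer": answer}


class DocPipelineSketch:
    """Runs the nodes in order, passing each node's output to the next."""

    def __init__(self, nodes: List[Any]):
        self.nodes = list(nodes)

    def run(self, image: str, prompt: str) -> Dict[str, Any]:
        payload: Dict[str, Any] = {"image": image, "prompt": prompt}
        for node in self.nodes:
            payload = node.run(payload)
        return payload


def fake_ocr(image_path: str) -> List[str]:
    # Assumption: canned OCR output used only for illustration.
    return ["Invoice No: 12345", "Total: 98.00"]


pipeline = DocPipelineSketch([DocOCRProcessorSketch(fake_ocr), DocPrompterSketch()])
result = pipeline.run(image="invoice.png", prompt="What is the total?")
print(result["answer"])  # prints 98.00
```

Keeping OCR in its own node means the extraction step can be tested with canned text, as `fake_ocr` does here, and the OCR engine can be swapped without touching the model node.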

@@ -0,0 +1,3 @@
numpy
Collaborator

The numpy dependency is not needed here, because paddle already depends on numpy.

Please double-check whether requests is needed as well.

Contributor Author

paddle already depends on numpy and requests; both have been removed.

logger = logging.getLogger(__name__)


class DocPreProcessor(BaseComponent):
Collaborator

The naming here needs another look. This node mainly performs OCR, and it is a PaddleOCR node, so I suggest working a PaddleOCR-related name into it.

Contributor Author

Renamed to DocOCRProcessor.

import paddle
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.taskflow.utils import download_file, ImageReader, get_doc_pred, find_answer_pos, sort_res
from paddlenlp.taskflow.task import Task
Collaborator

This Task import appears to be unused.

Contributor Author

Removed.

@lugimzzz
Contributor Author

Added documentation for the docprompt example.

Collaborator

@wawltor wawltor left a comment

LGTM

@lugimzzz lugimzzz merged commit 97fe71d into PaddlePaddle:develop Oct 24, 2022
@lugimzzz lugimzzz deleted the doc branch October 24, 2022 07:24
2 participants