Add docprompt example in pipelines #3534
@@ -0,0 +1,3 @@
numpy
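The numpy dependency is not needed here, because paddle already declares its own numpy dependency.
Please also double-check whether requests is needed.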
paddle already depends on numpy and requests, so both have been removed.
logger = logging.getLogger(__name__)


class DocPreProcessor(BaseComponent):
The naming here needs another look. This node mainly performs OCR, and it is a PaddleOCR node, so I suggest working a PaddleOCR-related name into it.
Renamed to DocOCRProcessor.
import paddle
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.taskflow.utils import download_file, ImageReader, get_doc_pred, find_answer_pos, sort_res
from paddlenlp.taskflow.task import Task
This Task import does not appear to be used.
Removed.
Add documentation for the docprompt example
LGTM
PR types
Others
PR changes
Others
Description
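Add a docprompt example to pipelines:
- Split data processing (including OCR) and model computation into two nodes, DocPreProcessor and DocPrompter
- Chain the two nodes with the DocPipeline pipeline
- Provide a run example for the pipeline core library: examples/document-intelligence/docprompt_example.py
- Provide example code for a gradio-based front-end UI and a FastAPI-based backend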
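
For readers unfamiliar with the node/pipeline split described above, here is a rough, dependency-free sketch of the idea. It is not the code in this PR: the `BaseComponent` stand-in, the `run()` return shape of `(output dict, edge name)`, the `DocPipeline` constructor arguments, and the fake OCR result are all assumptions made for illustration, and the real nodes call PaddleOCR and a DocPrompt model instead of the placeholders used here.

```python
from typing import Any, Dict, List, Tuple


class BaseComponent:
    """Hypothetical stand-in for the pipelines base class; only run() is sketched."""

    def run(self, **kwargs: Any) -> Tuple[Dict[str, Any], str]:
        raise NotImplementedError


class DocPreProcessor(BaseComponent):
    """Data-processing node: turns an image path into (text, box) pairs."""

    def run(self, image: str, **kwargs: Any) -> Tuple[Dict[str, Any], str]:
        # Placeholder for the real OCR call (assumption): one fake text segment.
        ocr_result = [("Invoice No. 12345", [0, 0, 100, 20])]
        return {"image": image, "ocr_result": ocr_result, **kwargs}, "output_1"


class DocPrompter(BaseComponent):
    """Model node: answers each prompt from the OCR result (toy keyword match)."""

    def run(self, ocr_result: List[Tuple[str, List[int]]], prompts: List[str],
            **kwargs: Any) -> Tuple[Dict[str, Any], str]:
        results = []
        for prompt in prompts:
            # A real model would run inference here; this only matches substrings.
            hits = [text for text, _box in ocr_result if prompt.lower() in text.lower()]
            results.append({"prompt": prompt, "answers": hits})
        return {"results": results}, "output_1"


class DocPipeline:
    """Runs the nodes in order, feeding each node's output dict to the next."""

    def __init__(self, preprocessor: BaseComponent, modeler: BaseComponent) -> None:
        self.nodes = [preprocessor, modeler]

    def run(self, **inputs: Any) -> Dict[str, Any]:
        data: Dict[str, Any] = inputs
        for node in self.nodes:
            data, _edge = node.run(**data)
        return data


if __name__ == "__main__":
    pipe = DocPipeline(preprocessor=DocPreProcessor(), modeler=DocPrompter())
    print(pipe.run(image="invoice.png", prompts=["invoice no"]))
```

Keeping OCR and model inference in separate nodes means either one can be swapped or tested on its own, which is the motivation stated in the description.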