feat(serving): add returnMarkdownImages switch for VL & PP-StructureV3 #5135
Open
scyyh11 wants to merge 13 commits into
`FigureCanvasAgg.tostring_rgb()` was removed in matplotlib 3.10. Use `buffer_rgba()` instead, which has been available since matplotlib 3.1.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
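A minimal sketch of the replacement pattern, assuming the caller needs an (H, W, 3) RGB array like the one `tostring_rgb()` used to produce. This is not the PaddleX diff itself; the helper name `figure_to_rgb_array` is hypothetical. `buffer_rgba()` yields an RGBA buffer, so the alpha channel is sliced off afterwards:

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def figure_to_rgb_array(fig: Figure) -> np.ndarray:
    """Render a figure to an (H, W, 3) uint8 array without tostring_rgb()."""
    canvas = FigureCanvasAgg(fig)  # attach an Agg canvas to the figure
    canvas.draw()  # rasterize so the RGBA buffer is populated
    rgba = np.asarray(canvas.buffer_rgba())  # (H, W, 4), uint8
    # Drop alpha; copy so the result does not alias the canvas buffer.
    return rgba[..., :3].copy()


fig = Figure(figsize=(2, 1), dpi=50)
ax = fig.add_subplot()
ax.plot([0, 1], [0, 1])
img = figure_to_rgb_array(fig)
print(img.shape, img.dtype)
```

Copying after the slice is a deliberate choice. Without it, the returned array would be a view into memory owned by the canvas.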
Add a request-level boolean `returnMarkdownImages` (default `true`, backward compatible) to PaddleOCR-VL `InferRequest` / `RestructurePagesRequest` and PP-StructureV3 `InferRequest`. When it is set to `false`, `markdown.images` is omitted from the response and the server skips Base64 encoding / URL upload of the Markdown-referenced images. This addresses response-size inflation for text-only consumers and in multi-page PDF scenarios.

The HPS `_group_inputs._hash` on the VL and PP-StructureV3 layout-parsing handlers is intentionally left unchanged. `returnMarkdownImages` only affects `_postprocess`, which already runs per request with each input's own value, so adding it to the batching hash would split groups and trigger redundant model runs without any correctness benefit. This matches how other post-processing flags (`prettifyMarkdown`, `showFormulaNumber`, `visualize`) are handled.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
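The "batch together, post-process individually" reasoning above can be sketched as follows. This is a hypothetical illustration, not the PaddleX handler code. `batch_key`, `group_inputs`, `postprocess_markdown`, the `file` / `inferParam` request keys, and the `encode_images` callback are all assumed names. The batching key hashes only inference-relevant parameters. Each request's own `returnMarkdownImages` is applied afterwards, and the costly image encoding runs only when that request asks for it:

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical: flags that only influence post-processing and must not split a batch.
POSTPROCESS_ONLY_KEYS = {
    "returnMarkdownImages",
    "prettifyMarkdown",
    "showFormulaNumber",
    "visualize",
}


def batch_key(request: Dict[str, Any]) -> str:
    """Hash the parameters that change model inference (input payload excluded)."""
    infer_params = {
        k: v
        for k, v in request.items()
        if k != "file" and k not in POSTPROCESS_ONLY_KEYS
    }
    payload = json.dumps(infer_params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def group_inputs(requests: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Group request indices that can share one pipeline() run."""
    groups: Dict[str, List[int]] = {}
    for idx, request in enumerate(requests):
        groups.setdefault(batch_key(request), []).append(idx)
    return groups


def postprocess_markdown(
    request: Dict[str, Any],
    markdown_text: str,
    encode_images: Callable[[], Dict[str, str]],
) -> Dict[str, Any]:
    """Build the per-request markdown object, honoring that request's own flag."""
    return_images = request.get("returnMarkdownImages", True)  # default keeps old behavior
    markdown = {
        "text": markdown_text,
        # encode_images (Base64 encoding or URL upload) is only invoked when requested.
        "images": encode_images() if return_images else None,
    }
    # Mimics exclude-none serialization: a None field is dropped, not sent as null.
    return {k: v for k, v in markdown.items() if v is not None}
```

Two requests that differ only in `returnMarkdownImages` land in the same group. Their responses still differ, because the flag is read in the per-request post-processing step.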
Pipeline docs (VL and PP-StructureV3, zh/en):
- Add returnMarkdownImages to the request parameter tables (both VL
endpoints).
- Note that markdown.images is "null or omitted" when the flag is false
— PaddleX serving uses response_model_exclude_none, so the field is
actually absent rather than explicitly null.
- Above each LayoutParsingResult element schema, add a one-paragraph
note explaining that response image fields (outputImages, inputImage,
markdown.images) switch from base64 strings to pre-signed URLs when
the server is configured to return URLs.
General serving docs (docs/pipeline_deploy/serving.{md,en.md}):
- New section explaining how to configure Serving.extra (file_storage,
return_img_urls, url_expires_in) so the server returns pre-signed BOS
URLs instead of inline base64. The mode was previously undocumented
despite being the recommended setup for large or multi-page responses.
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
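To illustrate the URL-return mode described above (return a URL instead of inline Base64), here is a minimal sketch under stated assumptions. The `FileStorage` protocol, the `InMemoryStorage` stand-in for BOS, the `encode_binary_field` helper, and the 3600-second default for `url_expires_in` are all hypothetical. They are not PaddleX's real `file_storage` classes or defaults:

```python
import base64
from typing import Dict, Optional, Protocol


class FileStorage(Protocol):
    """Hypothetical interface for wherever the server uploads response files."""

    def put(self, key: str, data: bytes) -> None: ...

    def presigned_url(self, key: str, expires_in: int) -> str: ...


class InMemoryStorage:
    """Toy stand-in for an object store such as BOS (illustration only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def presigned_url(self, key: str, expires_in: int) -> str:
        # A real store would sign the URL; this just encodes the expiry for inspection.
        return f"https://storage.example.invalid/{key}?expires_in={expires_in}"


def encode_binary_field(
    key: str,
    data: bytes,
    *,
    return_urls: bool,
    storage: Optional[FileStorage] = None,
    url_expires_in: int = 3600,
) -> str:
    """Return either inline Base64 or a URL for one binary response field."""
    if not return_urls:
        return base64.b64encode(data).decode("ascii")
    if storage is None:
        raise ValueError("URL-return mode needs a configured file storage")
    storage.put(key, data)
    return storage.presigned_url(key, url_expires_in)
```

Either way, the field stays a string. That is why the docs can say the field type is unchanged when URL mode is on.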
Thanks for your contribution!
…descriptions

Refining the docs added in the previous commit based on review feedback on the companion PaddleOCR PR (PaddlePaddle/PaddleOCR#18060):

- Response field descriptions for outputImages, inputImage, exports, and markdown.images previously hard-coded "Base64-encoded". They now read "Base64 by default; pre-signed URL when URL-return mode is enabled" (or the zh equivalent), so the field-level descriptions match what the server actually returns once return_img_urls is enabled.
- The URL-return section in docs/pipeline_deploy/serving.{md,en.md} is renamed from "Returning Images as URLs" to "Returning Response Files as URLs" and rewritten to cover exports (docx, ...). build_pipeline_exports uses the same return_urls / file_storage chain as postprocess_images, so the previous image-only framing was misleading. A note also points out that the configuration key is historically named return_img_urls but currently controls every Base64-inlined file field in the response, not just images.
- BOS configuration wording is aligned with the existing PaddleOCR-VL.md guidance: ak/sk are no longer expanded to "Access Key" / "Secret Key" (those expansions are not the Baidu Cloud terminology), and the section now links to the official Baidu Intelligent Cloud BOS docs.
- The preamble above each LayoutParsingResult element schema in the VL and PP-StructureV3 pipeline docs (zh/en) is updated from "image fields" to "image and file fields", lists exports alongside the image fields, and points to the renamed serving section.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
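Since these fields now carry "Base64 by default; pre-signed URL when URL-return mode is enabled", a client may need to accept both forms. Below is a minimal client-side sketch. The helper `load_binary_field` and its injectable `fetch` parameter are hypothetical, not part of any PaddleX SDK. The prefix check is safe because `:` is not in the standard Base64 alphabet, so a valid Base64 value can never start with `http://` or `https://`:

```python
import base64
import urllib.request
from typing import Callable


def _http_get(url: str) -> bytes:
    """Download a (pre-signed) URL; a plain GET with a timeout."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def load_binary_field(value: str, fetch: Callable[[str], bytes] = _http_get) -> bytes:
    """Return the bytes behind a response field that may be Base64 or a URL."""
    if value.startswith(("http://", "https://")):
        return fetch(value)  # URL-return mode: download the file
    # Default mode: strict Base64 decoding rejects stray non-alphabet characters.
    return base64.b64decode(value, validate=True)
```

The `fetch` parameter keeps the network call swappable, for example for retries, authentication, or tests.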
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Bobholamovic requested changes on May 27, 2026
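Context: ## 3. 以 URL 形式返回响应文件 (Returning Response Files as URLs)
Before: By default, basic serving and high-stability serving return the image fields in the response (e.g. `outputImages`, `inputImage`, `markdown.images`) inline as base64. When the response contains large images or multi-page PDFs, base64 significantly inflates the response size. You can switch to URL mode: the server writes the images to object storage and returns only pre-signed URLs in the response.
After: By default, basic serving and high-stability serving return the image and file fields in the response inline as Base64, e.g. `outputImages`, `inputImage`, `markdown.images`, `exports` (docx, etc.). When the response contains large images or multi-page PDFs, Base64 significantly inflates the response size. You can switch to URL mode: the server writes these files to object storage and returns only pre-signed URLs in the response.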
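Member:
By default, basic serving and high-stability serving return binary content in the response, such as images, inline as Base64. When the response contains large images or multi-page PDFs, Base64 significantly inflates the response size; the service can be configured to return URLs instead.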
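Member:
The current mechanism is designed to be extensible, so I suggest mentioning only URLs here. It just happens that only Baidu Intelligent Cloud Object Storage is supported today; in the future we may also support options such as returning URLs through the service's own local file serving.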
Context: Here, `outputs[0].data[0]` is a JSON string whose fields are the same as those of the response body in the basic serving deployment; see each pipeline's tutorial for the specific parsing rules.
Before: ## 3. 配置图像 URL 返回 (Configuring Image URL Return)
After: ## 3. 以 URL 形式返回响应文件 (Returning Response Files as URLs)
Added: > The configuration key is historically named `return_img_urls`, but it currently controls every Base64-inlined file field in the response, not just images.
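Member:
The name here is indeed unreasonable. Since this is not yet a public interface, I suggest fixing it now:
- Rename `return_img_urls` to something like `return_urls` or `return_file_urls`.
- In the config file's `Serving` field, `extra` holds fields specific to each pipeline's service; `return_img_urls` looks better suited to being promoted to a field shared by all pipelines (similar to `visualize`).
- The HPS part needs to be updated accordingly.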
Collaborator (Author):
When making this change, should we keep backward compatibility with the old key `extra.return_img_urls`, or just remove it?
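Context: `</tbody>` `</table>`
Before: <p>Image-related fields in the table below (such as <code>outputImages</code>, <code>inputImage</code>, <code>markdown.images</code>) are returned inline as JPEG Base64 strings by default; when the server enables URL-return mode, the values of these fields become pre-signed URLs and the field types stay unchanged. For configuration, see the "Configuring Image URL Return" (配置图像 URL 返回) section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>
After: <p>Fields in the table below that involve images or files (such as <code>outputImages</code>, <code>inputImage</code>, <code>markdown.images</code>, <code>exports</code>) are returned inline as Base64 strings by default; when the server enables URL-return mode, the values of these fields become pre-signed URLs and the field types stay unchanged. For configuration, see the "Returning Response Files as URLs" (以 URL 形式返回响应文件) section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>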
scyyh11 added a commit to scyyh11/PaddleX that referenced this pull request on May 28, 2026
…e-pages

Two follow-ups on the rename PR (PaddlePaddle#5135 review):

* Make `Serving.return_urls` Optional[bool] (default None) so an explicit `false` overrides a stale `Serving.extra.return_img_urls: true` during migration. Previously the legacy key was applied whenever the new field was falsy, preventing users from disabling URL mode without also deleting the deprecated key. Coerce None to False at the end of init. Same fix applied to the 12 HPS pipeline init blocks.
* Wire `Serving.return_urls` through HPS PaddleOCR-VL `/restructure-pages` (deploy/hps/sdk/pipelines/PaddleOCR-VL/server/model_repo/restructure-pages). Previously the model.py hardcoded `file_storage=None, return_urls=False` for both build_pipeline_exports calls (concat + non-concat), so DOCX exports were forced to Base64 even when the rest of the service returned URLs. Adds an `initialize()` override mirroring the layout-parsing pipeline.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
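The tri-state resolution described in the first bullet can be sketched like this. The function name `resolve_return_urls`, its signature, and the exact warning text are hypothetical, not the PaddleX init code. The point it shows is that the legacy key is consulted only when the new field is `None`, and not whenever it is falsy. That is why an explicit `false` now wins over a stale `extra.return_img_urls: true`:

```python
import warnings
from typing import Any, Mapping, Optional


def resolve_return_urls(return_urls: Optional[bool], extra: Mapping[str, Any]) -> bool:
    """Resolve the effective URL-return flag during the migration period.

    An explicit top-level value (True or False) always wins; the legacy
    extra["return_img_urls"] key is used only when the new field is unset.
    """
    if return_urls is None and "return_img_urls" in extra:
        warnings.warn(
            "Serving.extra.return_img_urls is deprecated; use Serving.return_urls",
            DeprecationWarning,
            stacklevel=2,
        )
        return_urls = bool(extra["return_img_urls"])
    # Coerce a still-unset (None) value to False, as the commit message describes.
    return bool(return_urls)
```

The buggy variant would test `if not return_urls:` instead of `is None`. With that test, `False` and `None` are indistinguishable, so the legacy `true` always takes over.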
scyyh11 added a commit to scyyh11/PaddleX that referenced this pull request on May 28, 2026
…n_urls

Reviewer feedback from PR PaddlePaddle#5135. The existing `Serving.extra.return_img_urls` switch (added in PaddlePaddle#2848, 2025-01) is not pipeline-specific and belongs on the shared `AppConfig` schema alongside `visualize`. Renamed accordingly (the name `return_urls` reflects that it controls all binary fields, not just images). The old `extra.return_img_urls` key is still honored at startup with a DeprecationWarning so existing configs keep working.

Schema:

* `AppConfig.return_urls: Optional[bool] = None` (tri-state so an explicit `false` overrides a stale `extra.return_img_urls: true` during migration; coerced to `False` after legacy resolution).

Applies to basic serving (1 schema + 11 pipeline apps + 1 shared plumbing file) and HPS (12 Triton model.py + restructure-pages init for VL).

Also fixes a pre-existing inconsistency: HPS PaddleOCR-VL `/restructure-pages` previously hardcoded `file_storage=None, return_urls=False` for both `build_pipeline_exports` calls, so DOCX exports were forced to Base64 even when the rest of the service returned URLs. Adds an `initialize()` override mirroring layout-parsing.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Bobholamovic pushed a commit that referenced this pull request on May 28, 2026
…n_urls (#5143)

* Fix matplotlib 3.10 compatibility: replace removed tostring_rgb()
* refactor(serving): promote return_img_urls to top-level Serving.return_urls

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Add the reference-style URL-return note (pointing at the serving deployment doc's "以 URL 形式返回响应文件" section) to the serving response schemas of OCR, formula_recognition, seal_recognition, table_recognition, table_recognition_v2, layout_parsing, PP-DocTranslation, and PP-ChatOCRv3/v4 doc, matching PP-StructureV3 and PaddleOCR-VL. Binary fields that fall back to Base64 are annotated as becoming pre-signed URLs when URL-return mode is on. Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
…urn-markdown-images

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

Conflicts:
- deploy/hps/sdk/pipelines/PP-StructureV3/server/model_repo/layout-parsing/1/model.py
- deploy/hps/sdk/pipelines/PaddleOCR-VL/server/model_repo/layout-parsing/1/model.py
- paddlex/inference/serving/basic_serving/_pipeline_apps/paddleocr_vl.py
- paddlex/inference/serving/basic_serving/_pipeline_apps/pp_structurev3.py
PR PaddlePaddle#5143 promoted the URL-return switch from Serving.extra.return_img_urls to a top-level Serving.return_urls field (the legacy key is still honored at startup with a DeprecationWarning). Update the 'Returning Response Files as URLs' section in both language docs to teach the new top-level field, keep file_storage/url_expires_in under Serving.extra, and note the legacy key as deprecated. Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Per review r3308340966: the URL-return mechanism is designed to be extensible (only object storage / BOS today, but service-local file serving may return URLs later), so the intro should just say the fields can be returned as URLs, without baking in 'object storage / pre-signed URL'. Concrete BOS config stays in the config block and notes below. Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Per review r3308333339: the section covers returning binary response content (images and files such as exports) as URLs, so rename the title from '响应文件 / Response Files' to '二进制内容 / Binary Content' and update all cross-references (and the VL config subsection) to match. Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
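Context: By default, basic serving and high-stability serving return the image and file fields in the response inline as Base64, e.g. `outputImages`, `inputImage`, `markdown.images`, `exports` (docx, etc.). When the response contains large images or multi-page PDFs, Base64 significantly inflates the response size. The service can instead be configured to return URLs: the values of the corresponding response fields become downloadable URLs rather than inline Base64.
Added: > This switch is the top-level field `Serving.return_urls`. It controls every Base64-inlined file field in the response (images as well as exported files such as `exports`), not just images. The legacy key `Serving.extra.return_img_urls` is still honored (with a deprecation warning at startup); new configurations should use `Serving.return_urls`.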
Member:
Since we never actually documented this part before, I suggest leaving this paragraph out.
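Context: `</tr>` `</tbody>` `</table>`
Added: <p>Fields in the table below that involve images or files (such as <code>outputImages</code>, <code>inputImage</code>, <code>markdown.images</code>, <code>exports</code>) are returned inline as Base64 strings by default; when the server enables URL-return mode, the values of these fields become pre-signed URLs and the field types stay unchanged. For configuration, see the "Returning Binary Content as URLs" (以 URL 形式返回二进制内容) section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>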
Member:
Images are also files, so this wording may not be quite appropriate. Consider a more general term, such as "binary files such as images" (图像等二进制文件).
Summary

- Add a request-level boolean `returnMarkdownImages` (default `true`, backward compatible) to PaddleOCR-VL `InferRequest` / `RestructurePagesRequest` and PP-StructureV3 `InferRequest`. When it is set to `false`, `markdown.images` is omitted from the response and the server skips base64 encoding / URL upload of Markdown-referenced images. This addresses response-size inflation for text-only consumers and multi-page PDF scenarios.
- HPS `_group_inputs._hash` is intentionally not modified. `returnMarkdownImages` only affects `_postprocess`, which already runs per request with each input's own value, so including it in the batching hash would split groups and trigger redundant `pipeline()` runs without any correctness benefit. The commit message documents this constraint.
- Pipeline docs (VL and PP-StructureV3, zh/en): add `returnMarkdownImages` to the request parameter tables and note that `markdown.images` is "null or omitted" when the flag is false (PaddleX serving uses `response_model_exclude_none`). Also add a one-paragraph note above each `LayoutParsingResult` element schema explaining that image fields switch to pre-signed URLs when URL-return mode is enabled.
- `docs/pipeline_deploy/serving.{md,en.md}` gets a new section documenting how to configure `Serving.extra` (`file_storage`, `return_img_urls`, `url_expires_in`) to return images as pre-signed BOS URLs instead of inline base64. This mode was previously undocumented despite being the recommended setup for large or multi-page responses.

Companion PR
This change has a companion docs PR in PaddleOCR (mirror docs + same URL-mode section in the version3.x deployment guide): PaddlePaddle/PaddleOCR#18060.
Test plan

- … `false`, backward compat with omitted field)
- … `_group_inputs._hash`: two requests differing only in `returnMarkdownImages` produce the same hash (will batch); requests differing in an inference param still split.
- … `true`/`false` against `/layout-parsing`; cross-case `markdown.text` identical.
- … `/layout-parsing` and both branches of `/restructure-pages` (`concatenatePages` true and false), with injected fake `markdownImages` to give the gating logic strong evidence.