feat(serving): add returnMarkdownImages switch for VL & PP-StructureV3 #5135

Open
scyyh11 wants to merge 13 commits into PaddlePaddle:develop from scyyh11:feat/serving-return-markdown-images

Conversation

@scyyh11
Collaborator

@scyyh11 scyyh11 commented May 25, 2026

Summary

  • Add a request-level returnMarkdownImages (default true, backward compatible) to PaddleOCR-VL InferRequest / RestructurePagesRequest and PP-StructureV3 InferRequest. When set to false, markdown.images is omitted from the response and the server skips base64 encoding / URL upload of Markdown-referenced images, addressing response-size inflation for text-only consumers and multi-page PDF scenarios.
  • Both basic serving handlers and the corresponding HPS handlers are updated. HPS _group_inputs._hash is intentionally not modified: returnMarkdownImages only affects _postprocess (which already runs per request with each input's own value), so including it in the batching hash would split groups and trigger redundant pipeline() runs without any correctness benefit. The commit message documents this constraint.
  • Pipeline docs (VL and PP-StructureV3, zh/en) are updated to describe the new request field, note that markdown.images is "null or omitted" when the flag is false (PaddleX serving uses response_model_exclude_none), and add a one-paragraph note above each LayoutParsingResult element schema explaining that image fields switch to pre-signed URLs when URL-return mode is enabled.
  • docs/pipeline_deploy/serving.{md,en.md} gets a new section documenting how to configure Serving.extra (file_storage, return_img_urls, url_expires_in) to return images as pre-signed BOS URLs instead of inline base64 — previously undocumented despite being the recommended setup for large or multi-page responses.
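
The gating described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `build_markdown_payload` and the `encode_image` callback are hypothetical names, and only `returnMarkdownImages`, `markdown.text`, and `markdown.images` come from the description above.

```python
import base64
from typing import Callable, Dict


def _to_base64(data: bytes) -> str:
    """Default encoder: inline the image as a Base64 string."""
    return base64.b64encode(data).decode("ascii")


def build_markdown_payload(
    text: str,
    images: Dict[str, bytes],
    return_markdown_images: bool = True,
    encode_image: Callable[[bytes], str] = _to_base64,
) -> Dict[str, object]:
    """Hypothetical helper that builds the `markdown` part of a response."""
    payload: Dict[str, object] = {"text": text}
    if return_markdown_images:
        # Encoding (or uploading) runs only when the caller wants the images.
        payload["images"] = {
            name: encode_image(data) for name, data in images.items()
        }
    # When the flag is false the key is left out entirely, so no encoding
    # work is done and the response stays small.
    return payload


images = {"imgs/fig1.jpg": b"\xff\xd8fake-jpeg-bytes"}
with_images = build_markdown_payload("![](imgs/fig1.jpg)", images)
text_only = build_markdown_payload(
    "![](imgs/fig1.jpg)", images, return_markdown_images=False
)
```

Here `with_images` carries both `text` and `images`, while `text_only` carries only `text`; the Markdown text itself is the same in both cases.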

Companion PR

This change has a companion docs PR in PaddleOCR (mirror docs + same URL-mode section in the version3.x deployment guide): PaddlePaddle/PaddleOCR#18060.

Test plan

  • Schema sanity (default value, accepts false, backward compat with omitted field)
  • HPS _group_inputs._hash — two requests differing only in returnMarkdownImages produce the same hash (will batch); requests differing in an inference param still split.
  • PP-StructureV3 basic serving end-to-end: default / true / false against /layout-parsing; cross-case markdown.text identical.
  • PaddleOCR-VL basic serving end-to-end: /layout-parsing and both branches of /restructure-pages (concatenatePages true and false), with injected fake markdownImages to give the gating logic strong evidence.
  • HPS end-to-end (Triton + Docker) — not runnable on macOS local; defer to CI / staging.
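
The hash check in the second test item could look something like the sketch below. It assumes a simplified stand-in for `_group_inputs._hash` (the real implementation may select fields differently); `batching_hash` and `POSTPROCESS_ONLY_KEYS` are hypothetical names, and `useDocOrientationClassify` is used only as an example of an inference parameter.

```python
import hashlib
import json
from typing import Any, Dict

# Assumption: these flags are consumed only in post-processing, so they
# are deliberately left out of the batching key.
POSTPROCESS_ONLY_KEYS = {
    "returnMarkdownImages",
    "prettifyMarkdown",
    "showFormulaNumber",
    "visualize",
}


def batching_hash(request: Dict[str, Any]) -> str:
    """Hypothetical batching key built from inference parameters only."""
    inference_params = {
        key: value
        for key, value in request.items()
        if key not in POSTPROCESS_ONLY_KEYS
    }
    canonical = json.dumps(inference_params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


req_a = {"useDocOrientationClassify": True, "returnMarkdownImages": True}
req_b = {"useDocOrientationClassify": True, "returnMarkdownImages": False}
req_c = {"useDocOrientationClassify": False, "returnMarkdownImages": True}

same_group = batching_hash(req_a) == batching_hash(req_b)
split_group = batching_hash(req_a) != batching_hash(req_c)
```

Under these assumptions `req_a` and `req_b` land in the same batch, while `req_c` is split off because an inference parameter differs.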

scyyh11 and others added 5 commits April 15, 2026 01:34
`FigureCanvasAgg.tostring_rgb()` was removed in matplotlib 3.10.
Use `buffer_rgba()` instead, which is available since matplotlib 3.1+.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
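
For reference, a minimal sketch of the replacement described in this commit, assuming matplotlib and NumPy are installed. `figure_to_rgb` is a hypothetical helper name and is not taken from the PaddleX code base.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def figure_to_rgb(fig: Figure) -> np.ndarray:
    """Render a figure and return an (H, W, 3) uint8 RGB array."""
    canvas = FigureCanvasAgg(fig)
    canvas.draw()
    # buffer_rgba() exposes the rendered RGBA pixels; drop the alpha
    # channel to get the RGB layout that tostring_rgb() used to provide.
    rgba = np.asarray(canvas.buffer_rgba())
    return rgba[..., :3].copy()


fig = Figure(figsize=(2, 1), dpi=50)
ax = fig.add_subplot()
ax.plot([0, 1], [0, 1])
rgb = figure_to_rgb(fig)
```

The `.copy()` detaches the result from the canvas buffer, so later redraws do not change the returned array.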
Add a request-level boolean (default true, backward compatible) to
PaddleOCR-VL InferRequest / RestructurePagesRequest and PP-StructureV3
InferRequest. When set to false, markdown.images is omitted from the
response and the server skips base64 encoding / URL upload of the
Markdown-referenced images, addressing response-size inflation in
text-only consumers and multi-page PDF scenarios.

The HPS _group_inputs._hash on the VL and PP-StructureV3 layout-parsing
handlers is intentionally left unchanged: returnMarkdownImages only
affects _postprocess (which already runs per request with each input's
own value), so adding it to the batching hash would split groups and
trigger redundant model runs without any correctness benefit. This
matches how other post-processing flags (prettifyMarkdown,
showFormulaNumber, visualize) are handled.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Pipeline docs (VL and PP-StructureV3, zh/en):
- Add returnMarkdownImages to the request parameter tables (both VL
  endpoints).
- Note that markdown.images is "null or omitted" when the flag is false
  — PaddleX serving uses response_model_exclude_none, so the field is
  actually absent rather than explicitly null.
- Above each LayoutParsingResult element schema, add a one-paragraph
  note explaining that response image fields (outputImages, inputImage,
  markdown.images) switch from base64 strings to pre-signed URLs when
  the server is configured to return URLs.

General serving docs (docs/pipeline_deploy/serving.{md,en.md}):
- New section explaining how to configure Serving.extra (file_storage,
  return_img_urls, url_expires_in) so the server returns pre-signed BOS
  URLs instead of inline base64. The mode was previously undocumented
  despite being the recommended setup for large or multi-page responses.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
@paddle-bot

paddle-bot Bot commented May 25, 2026

Thanks for your contribution!

@scyyh11 scyyh11 requested a review from Bobholamovic May 26, 2026 06:25
scyyh11 added 2 commits May 26, 2026 11:54
…descriptions

Refining the docs added in the previous commit based on review feedback
on the companion PaddleOCR PR (PaddlePaddle/PaddleOCR#18060):

- Response field descriptions for outputImages, inputImage, exports, and
  markdown.images previously hard-coded "Base64-encoded". They now read
  "Base64 by default; pre-signed URL when URL-return mode is enabled"
  (or the zh equivalent), so the field-level descriptions match what the
  server actually returns once return_img_urls is enabled.

- The URL-return section in docs/pipeline_deploy/serving.{md,en.md} is
  renamed from "Returning Images as URLs" to "Returning Response Files
  as URLs" and rewritten to cover exports (docx, ...) — build_pipeline_
  exports uses the same return_urls / file_storage chain as
  postprocess_images, so the previous image-only framing was misleading.
  A note also points out that the configuration key is historically
  named return_img_urls but currently controls every Base64-inlined
  file field in the response, not just images.

- BOS configuration wording is aligned with the existing PaddleOCR-VL.md
  guidance: ak/sk are no longer expanded to "Access Key" / "Secret Key"
  (those expansions are not the Baidu Cloud terminology), and the
  section now links to the official Baidu Intelligent Cloud BOS docs.

- The preamble above each LayoutParsingResult element schema in the VL
  and PP-StructureV3 pipeline docs (zh/en) is updated from "image
  fields" to "image and file fields", lists exports alongside the image
  fields, and points to the renamed serving section.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
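
Since the same response field can now carry either a Base64 string or a URL, a client may want to handle both. The sketch below is one possible way to do that on the client side; `read_binary_field` is a hypothetical helper and is not part of PaddleX.

```python
import base64
from urllib.request import urlopen


def read_binary_field(value: str, timeout: float = 10.0) -> bytes:
    """Return the raw bytes behind a response field.

    Assumption: URL-mode values start with http:// or https://. The Base64
    alphabet has no ':' character, so an inline value cannot be mistaken
    for a URL by this check.
    """
    if value.startswith(("http://", "https://")):
        # URL mode: download the file from the returned URL.
        with urlopen(value, timeout=timeout) as response:
            return response.read()
    # Default mode: the value is inline Base64.
    return base64.b64decode(value)


inline_value = base64.b64encode(b"fake image bytes").decode("ascii")
decoded = read_binary_field(inline_value)
```

Pre-signed URLs expire, so a client that stores responses would likely need to download the files promptly rather than keep the URLs.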
Comment thread docs/pipeline_deploy/serving.md Outdated
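## 3. Returning Response Files as URLs

By default, basic serving and high-stability serving return the image fields in the response (such as `outputImages`, `inputImage`, and `markdown.images`) inline as base64. When the response contains large images or a multi-page PDF, base64 significantly increases the response size. You can switch to URL mode: the server writes the images to object storage and the response returns only pre-signed URLs.
By default, basic serving and high-stability serving return the image and file fields in the response, for example `outputImages`, `inputImage`, `markdown.images`, and `exports` (docx, etc.), inline as Base64. When the response contains large images or a multi-page PDF, Base64 significantly increases the response size. You can switch to URL mode: the server writes these files to object storage and the response returns only pre-signed URLs.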
Member

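By default, basic serving and high-stability serving return binary content in the response, such as images, inline as Base64. When the response contains large images or a multi-page PDF, Base64 significantly increases the response size; the service can be configured to return URLs instead.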

Member

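The current mechanism is designed to be extensible, so I suggest mentioning only URLs here. Only Baidu Intelligent Cloud object storage is supported right now, but we may later support other options, such as the service returning URLs through local file serving.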

Comment thread docs/pipeline_deploy/serving.md Outdated
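Here `outputs[0].data[0]` is a JSON string whose fields are consistent with the response body of the basic serving deployment; for the parsing rules, see the usage tutorial of each pipeline.

## 3. Configuring Image URL Return
## 3. Returning Response Files as URLs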
Member

Returning Binary Content as URLs

Comment thread docs/pipeline_deploy/serving.md Outdated
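By default, basic serving and high-stability serving return the image fields in the response (such as `outputImages`, `inputImage`, and `markdown.images`) inline as base64. When the response contains large images or a multi-page PDF, base64 significantly increases the response size. You can switch to URL mode: the server writes the images to object storage and the response returns only pre-signed URLs.
By default, basic serving and high-stability serving return the image and file fields in the response, for example `outputImages`, `inputImage`, `markdown.images`, and `exports` (docx, etc.), inline as Base64. When the response contains large images or a multi-page PDF, Base64 significantly increases the response size. You can switch to URL mode: the server writes these files to object storage and the response returns only pre-signed URLs.

> The configuration key is historically named `return_img_urls`, but it currently controls every Base64-inlined file field in the response, not just images.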
Member

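The name here is indeed a poor fit. I suggest fixing it now, while this is not yet a public interface:

  1. Rename `return_img_urls` to `return_urls`, `return_file_urls`, or similar.
  2. In the `Serving` section of the configuration file, `extra` holds fields specific to each pipeline's service. As things stand, `return_img_urls` is better promoted to a field shared by all pipelines (similar to `visualize`).
  3. The HPS part needs to be updated accordingly.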

Collaborator Author

When making this change, should we keep compatibility with the old key `extra.return_img_urls`, or remove it outright?

</tbody>
</table>
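<p>The image fields in the table below (such as <code>outputImages</code>, <code>inputImage</code>, and <code>markdown.images</code>) are returned inline as JPEG Base64 strings by default. When URL-return mode is enabled on the server, the values of these fields become pre-signed URLs and the field types stay the same. For configuration, see the "Configuring Image URL Return" section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>
<p>The image or file fields in the table below (such as <code>outputImages</code>, <code>inputImage</code>, <code>markdown.images</code>, and <code>exports</code>) are returned inline as Base64 strings by default. When URL-return mode is enabled on the server, the values of these fields become pre-signed URLs and the field types stay the same. For configuration, see the "Returning Response Files as URLs" section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>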
Member

Shouldn't the docs for other pipelines be updated too, not just these?

scyyh11 added a commit to scyyh11/PaddleX that referenced this pull request May 28, 2026
…e-pages

Two follow-ups on the rename PR (PaddlePaddle#5135 review):

* Make `Serving.return_urls` Optional[bool] (default None) so an explicit
  `false` overrides a stale `Serving.extra.return_img_urls: true` during
  migration. Previously the legacy key was applied whenever the new field
  was falsy, preventing users from disabling URL mode without also
  deleting the deprecated key. Coerce None to False at the end of init.
  Same fix applied to the 12 HPS pipeline init blocks.

* Wire `Serving.return_urls` through HPS PaddleOCR-VL `/restructure-pages`
  (deploy/hps/sdk/pipelines/PaddleOCR-VL/server/model_repo/restructure-pages).
  Previously the model.py hardcoded `file_storage=None, return_urls=False`
  for both build_pipeline_exports calls (concat + non-concat), so DOCX
  exports were forced to Base64 even when the rest of the service returned
  URLs. Adds an `initialize()` override mirroring the layout-parsing
  pipeline.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
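
The tri-state resolution described in the first bullet can be sketched as below. This is an illustration under assumptions, not the code in the commit: `resolve_return_urls` is a hypothetical function name, and whether the real code warns on every use of the legacy key or only when it takes effect is not stated here.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_return_urls(return_urls: Optional[bool], extra: Dict[str, Any]) -> bool:
    """Hypothetical resolver for `Serving.return_urls` with legacy fallback.

    `return_urls` is tri-state: None means "not set", so an explicit False
    can be told apart from a missing value.
    """
    legacy = extra.get("return_img_urls")
    if legacy is not None:
        warnings.warn(
            "`Serving.extra.return_img_urls` is deprecated; "
            "use `Serving.return_urls` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    if return_urls is None and legacy is not None:
        # The new field is unset, so fall back to the legacy key.
        return bool(legacy)
    # The new field wins when set; an unset value is coerced to False.
    return bool(return_urls)


unset = resolve_return_urls(None, {})
legacy_only = resolve_return_urls(None, {"return_img_urls": True})
explicit_off = resolve_return_urls(False, {"return_img_urls": True})
explicit_on = resolve_return_urls(True, {})
```

The `explicit_off` case is the one the commit fixes: with a plain `bool` field defaulting to `False`, an explicit `false` would be indistinguishable from "not set" and the stale legacy `true` would still apply.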
scyyh11 added a commit to scyyh11/PaddleX that referenced this pull request May 28, 2026
…n_urls

Reviewer feedback from PR PaddlePaddle#5135. The existing `Serving.extra.return_img_urls`
switch (added in PaddlePaddle#2848, 2025-01) is not pipeline-specific and belongs on
the shared `AppConfig` schema alongside `visualize`. Renamed accordingly
(name `return_urls` reflects that it controls all binary fields, not just
images). Old `extra.return_img_urls` key is still honored at startup with
a DeprecationWarning so existing configs keep working.

Schema:
* `AppConfig.return_urls: Optional[bool] = None` (tri-state so an explicit
  `false` overrides a stale `extra.return_img_urls: true` during migration;
  coerced to `False` after legacy resolution).

Applies to basic serving (1 schema + 11 pipeline apps + 1 shared plumbing
file) and HPS (12 Triton model.py + restructure-pages init for VL).

Also fixes a pre-existing inconsistency: HPS PaddleOCR-VL
`/restructure-pages` previously hardcoded `file_storage=None,
return_urls=False` for both `build_pipeline_exports` calls, so DOCX
exports were forced to Base64 even when the rest of the service returned
URLs. Adds an `initialize()` override mirroring layout-parsing.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Bobholamovic pushed a commit that referenced this pull request May 28, 2026
…n_urls (#5143)

* Fix matplotlib 3.10 compatibility: replace removed tostring_rgb()

`FigureCanvasAgg.tostring_rgb()` was removed in matplotlib 3.10.
Use `buffer_rgba()` instead, which is available since matplotlib 3.1+.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

* refactor(serving): promote return_img_urls to top-level Serving.return_urls

Reviewer feedback from PR #5135. The existing `Serving.extra.return_img_urls`
switch (added in #2848, 2025-01) is not pipeline-specific and belongs on
the shared `AppConfig` schema alongside `visualize`. Renamed accordingly
(name `return_urls` reflects that it controls all binary fields, not just
images). Old `extra.return_img_urls` key is still honored at startup with
a DeprecationWarning so existing configs keep working.

Schema:
* `AppConfig.return_urls: Optional[bool] = None` (tri-state so an explicit
  `false` overrides a stale `extra.return_img_urls: true` during migration;
  coerced to `False` after legacy resolution).

Applies to basic serving (1 schema + 11 pipeline apps + 1 shared plumbing
file) and HPS (12 Triton model.py + restructure-pages init for VL).

Also fixes a pre-existing inconsistency: HPS PaddleOCR-VL
`/restructure-pages` previously hardcoded `file_storage=None,
return_urls=False` for both `build_pipeline_exports` calls, so DOCX
exports were forced to Base64 even when the rest of the service returned
URLs. Adds an `initialize()` override mirroring layout-parsing.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

---------

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
scyyh11 added 3 commits May 28, 2026 19:40
Add the reference-style URL-return note (pointing at the serving deployment doc's "以 URL 形式返回响应文件" ("Returning Response Files as URLs") section) to the serving response schemas of the OCR, formula_recognition, seal_recognition, table_recognition, table_recognition_v2, layout_parsing, PP-DocTranslation, and PP-ChatOCRv3/v4 docs, matching PP-StructureV3 and PaddleOCR-VL. Binary fields that fall back to Base64 are annotated as becoming pre-signed URLs when URL-return mode is on.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
…urn-markdown-images

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

# Conflicts:
#	deploy/hps/sdk/pipelines/PP-StructureV3/server/model_repo/layout-parsing/1/model.py
#	deploy/hps/sdk/pipelines/PaddleOCR-VL/server/model_repo/layout-parsing/1/model.py
#	paddlex/inference/serving/basic_serving/_pipeline_apps/paddleocr_vl.py
#	paddlex/inference/serving/basic_serving/_pipeline_apps/pp_structurev3.py
PR PaddlePaddle#5143 promoted the URL-return switch from Serving.extra.return_img_urls
to a top-level Serving.return_urls field (the legacy key is still honored
at startup with a DeprecationWarning). Update the 'Returning Response Files
as URLs' section in both language docs to teach the new top-level field,
keep file_storage/url_expires_in under Serving.extra, and note the legacy
key as deprecated.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
@scyyh11 scyyh11 requested a review from Bobholamovic May 29, 2026 07:09
scyyh11 added 2 commits May 29, 2026 01:20
Per review r3308340966: the URL-return mechanism is designed to be
extensible (only object storage / BOS today, but service-local file
serving may return URLs later), so the intro should just say the fields
can be returned as URLs, without baking in 'object storage / pre-signed
URL'. Concrete BOS config stays in the config block and notes below.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Per review r3308333339: the section covers returning binary response
content (images and files such as exports) as URLs, so rename the title
from '响应文件 / Response Files' to '二进制内容 / Binary Content' and update
all cross-references (and the VL config subsection) to match.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

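By default, basic serving and high-stability serving return the image and file fields in the response, for example `outputImages`, `inputImage`, `markdown.images`, and `exports` (docx, etc.), inline as Base64. When the response contains large images or a multi-page PDF, Base64 significantly increases the response size. The service can be configured to return them as URLs instead: the values of the corresponding fields become downloadable URLs rather than inline Base64.

> This switch is the top-level field `Serving.return_urls`. It controls every Base64-inlined file field in the response (images as well as exported files such as `exports`), not just images. The legacy key `Serving.extra.return_img_urls` is still honored (with a deprecation warning at startup); new configurations should use `Serving.return_urls`.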
Member

Since we never actually documented this part before, I suggest leaving this paragraph out.

</tr>
</tbody>
</table>
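<p>The image or file fields in the table below (such as <code>outputImages</code>, <code>inputImage</code>, <code>markdown.images</code>, and <code>exports</code>) are returned inline as Base64 strings by default. When URL-return mode is enabled on the server, the values of these fields become pre-signed URLs and the field types stay the same. For configuration, see the "Returning Binary Content as URLs" section of <a href="../../../pipeline_deploy/serving.md">Serving Deployment</a>.</p>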
Member

Images are files too, so the wording here may not be quite right. Consider a more general term, such as "binary files such as images".


2 participants