@cau-git
Member

@cau-git cau-git commented Feb 4, 2026

Summary

This PR adds per-call pipeline option overrides in DocumentConverter and enforces compatibility-based reuse of initialized pipelines.

What’s Included

  • Adds format_options override support to:
    • DocumentConverter.convert(...)
    • DocumentConverter.convert_all(...)
  • Adds compatibility checks for override options:
    • same options class
    • identical non-do_* fields
    • do_* flags can only be relaxed (True -> False)
  • Applies compatible overrides at execution time without creating a new pipeline instance.
  • Extends pipeline option models with compatibility/runtime-toggle payload helpers.
  • Updates enrichment/stage execution paths (including threaded PDF pipeline) so runtime do_* overrides are respected safely.
  • Adds focused tests for override compatibility behavior in tests/test_options.py.
  • Updates converter docstrings to match the enforced compatibility semantics.
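The three compatibility rules listed above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `SketchPipelineOptions` and `compatibility_errors` are hypothetical names, and a plain dataclass stands in for the real options models.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SketchPipelineOptions:
    """Hypothetical stand-in for a pipeline options model (not docling's class)."""
    do_ocr: bool = True
    do_table_structure: bool = True
    images_scale: float = 1.0


def compatibility_errors(initialized, override):
    """Return the reasons why `override` cannot reuse the initialized pipeline."""
    # Rule 1: the override must use the same options class.
    if type(initialized) is not type(override):
        return [
            f"options class differs: {type(initialized).__name__} "
            f"vs {type(override).__name__}"
        ]

    errors = []
    for f in fields(initialized):
        old, new = getattr(initialized, f.name), getattr(override, f.name)
        if f.name.startswith("do_"):
            # Rule 3: do_* flags may only be relaxed (True -> False).
            if new and not old:
                errors.append(f"{f.name} cannot be enabled at call time")
        elif old != new:
            # Rule 2: every non-do_* field must be identical.
            errors.append(f"{f.name} must stay {old!r}, got {new!r}")
    return errors


base = SketchPipelineOptions(do_ocr=True, do_table_structure=False)
relaxed = SketchPipelineOptions(do_ocr=False, do_table_structure=False)
enabled = SketchPipelineOptions(do_ocr=True, do_table_structure=True)
rescaled = SketchPipelineOptions(do_ocr=True, do_table_structure=False, images_scale=2.0)

for candidate in (relaxed, enabled, rescaled):
    print(compatibility_errors(base, candidate))
```

An empty list means the override is compatible; returning reasons rather than a bare boolean makes it easier to surface a useful error message to the caller.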

Behavior

  • Compatible overrides are accepted and applied for the current execution.
  • Incompatible overrides fail the conversion: depending on raises_on_error, the call either raises or returns a failure result.
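The two outcomes can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `run_with_override`, `SketchResult` and `SketchStatus` are hypothetical names, only `do_*` flags are modelled (as a plain dict), and `ValueError` is a placeholder for whatever exception type the converter actually raises.

```python
from dataclasses import dataclass, field
from enum import Enum


class SketchStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class SketchResult:
    status: SketchStatus
    effective_flags: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def run_with_override(initialized_flags, override_flags, raises_on_error=True):
    """Apply do_* overrides for one execution, or fail if they are incompatible."""
    # Incompatible: the override tries to enable a flag that the pipeline
    # was not initialized with (False -> True).
    enabled = [
        name
        for name, value in override_flags.items()
        if value and not initialized_flags.get(name, False)
    ]
    if enabled:
        message = "incompatible override, cannot enable: " + ", ".join(sorted(enabled))
        if raises_on_error:
            raise ValueError(message)  # placeholder exception type
        return SketchResult(SketchStatus.FAILURE, errors=[message])

    # Compatible: merge into a fresh dict so the initialized flags stay
    # untouched and the override only affects this execution.
    effective = {**initialized_flags, **override_flags}
    return SketchResult(SketchStatus.SUCCESS, effective_flags=effective)


initialized = {"do_ocr": True, "do_table_structure": False}
print(run_with_override(initialized, {"do_ocr": False}).effective_flags)
print(run_with_override(initialized, {"do_table_structure": True}, raises_on_error=False).status)
```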

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
dolfim-ibm and others added 12 commits February 1, 2026 22:05
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
@cau-git changed the title from "Enable pipeline override and reuse with compatible options (WIP)" to "feat: Enable pipeline override and reuse with compatible options (WIP)" Feb 4, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

DCO Check Passed

Thanks @cau-git, all your commits are properly signed off. 🎉

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 90.08264% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling/document_converter.py 74.19% 8 Missing ⚠️
docling/pipeline/base_pipeline.py 93.10% 2 Missing ⚠️
docling/datamodel/pipeline_options.py 95.65% 1 Missing ⚠️
docling/pipeline/standard_pdf_pipeline.py 97.36% 1 Missing ⚠️


Base automatically changed from feat-model-runtimes to main February 4, 2026 16:29
@mergify

mergify bot commented Feb 4, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
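The rule can be checked locally before pushing a title change. The sketch below copies the pattern from the rule above; the helper name `is_conventional` is made up for illustration, and `re.search` is assumed to approximate the `~=` operator (the pattern is anchored with `^` in any case).

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """True if the title starts with a conventional-commit type prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


old_title = "Enable pipeline override and reuse with compatible options (WIP)"
new_title = "feat: Enable pipeline override and reuse with compatible options (WIP)"

print(is_conventional(old_title))
print(is_conventional(new_title))
```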

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
- remove `force_all_model_init`
- reject incompatible override options (no auto pipeline reinit)
- allow runtime `do_*` overrides only for `True -> False` toggles
- apply compatible `do_*` overrides per execution in base/threaded PDF pipelines
- add compatibility tests and update converter docstrings

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
@cau-git cau-git marked this pull request as ready for review February 10, 2026 12:35
@dosubot

dosubot bot commented Feb 10, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
View Suggested Changes
@@ -8,9 +8,8 @@
     - `generate_page_images`, `generate_picture_images`: Extract page/picture images
     - `force_backend_text`: Force backend text extraction
     - Additional options for OCR engine, layout model, table extraction, etc.
-- **Notes**: Only PDF supports image resolution adjustment. For more details, see [pipeline options code](https://github.com/docling-project/docling/blob/ae4fdbbb09fd377bb271e9b2efe541873eeb2990/docling/datamodel/pipeline_options.py#L891-L1336) and [example](https://app.dosu.dev/documents/9640186d-61e1-4ca1-9d8a-b82b3ee6bff8).
-
----
+- **Pipeline Option Overrides**: The Python API allows you to override pipeline options at conversion time for a given format using the `format_options` argument. Only `do_*` flags (such as `do_ocr`, `do_table_structure`, `do_code_enrichment`, `do_formula_enrichment`, etc.) can be changed, and only from `True` to `False`. All other options must remain identical to those used at pipeline initialization. Attempting to enable a `do_*` flag or change other fields will result in an error. This enables per-call disabling of enrichment features without reinitializing the pipeline.
+- **Notes**: Only PDF supports image resolution adjustment. For more details, see [pipeline options code](https://github.com/docling-project/docling/blob/ae4fdbbb09fd377bb271e9b2efe541873eeb2990/docling/datamodel/pipeline_options.py#L891-L1336) and [example](https://app.dosu.dev/documents/9640186d-61e1-4ca1-9d8a-b82b3ee6bff8). Refer to the Python SDK documentation for usage of `format_options`.
 
 ### DOCX
 - **Pipeline/Backend**: `SimplePipeline` + `MsWordDocumentBackend`
@@ -52,5 +51,4 @@
 - Only PDF supports image resolution adjustment (`images_scale`).
 - DOCX header/footer export is only available via Python API.
 - PPTX/XLSX support enrichment options and pagination (slide/sheet level).
-
-For further details, refer to the provided code links and examples.
+- **Pipeline Option Overrides**: For all formats, the Python API supports disabling enrichment-related `do_*` flags at conversion time using the `format_options` argument. Only disabling (True → False) is allowed; all other options must remain unchanged. See the PDF section above for details.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
@cau-git changed the title from "feat: Enable pipeline override and reuse with compatible options (WIP)" to "feat: Enable pipeline override and reuse with compatible options" Feb 10, 2026
@cau-git cau-git requested a review from dolfim-ibm February 10, 2026 12:59
Comment on lines +228 to +252
def _get_enrichment_pipe_for_execution(
    self,
) -> Iterable[GenericEnrichmentModel[Any]]:
    effective_options = self.get_effective_options()
    assert isinstance(effective_options, ConvertPipelineOptions)

    do_picture_classification = (
        effective_options.do_picture_classification
        or effective_options.do_chart_extraction
    )
    do_picture_description = effective_options.do_picture_description
    do_chart_extraction = effective_options.do_chart_extraction

    for model in self.enrichment_pipe:
        if isinstance(model, DocumentPictureClassifier):
            if do_picture_classification:
                yield model
        elif isinstance(model, PictureDescriptionBaseModel):
            if do_picture_description:
                yield model
        elif isinstance(model, ChartExtractionModelGraniteVision):
            if do_chart_extraction:
                yield model
        else:
            yield model
Member Author

Not the coolest thing to put here. Ideas for improvements are welcome.
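
One possible direction, offered only as a sketch: replace the `isinstance` chain with a table that maps each model type to the predicate gating it. The classes below are empty stand-ins for the real model classes, and `SketchOptions`, `MODEL_GATES` and `models_for_execution` are hypothetical names that do not exist in the codebase.

```python
from dataclasses import dataclass


# Empty stand-ins for the real enrichment model classes.
class PictureClassifier: ...
class PictureDescriber: ...
class ChartExtractor: ...
class CodeFormulaModel: ...  # a model with no runtime gate


@dataclass
class SketchOptions:
    do_picture_classification: bool = True
    do_picture_description: bool = True
    do_chart_extraction: bool = True


# Model type -> predicate on the effective options. Dict order matters:
# the first matching type wins, mirroring the if/elif chain.
MODEL_GATES = {
    PictureClassifier: lambda o: o.do_picture_classification or o.do_chart_extraction,
    PictureDescriber: lambda o: o.do_picture_description,
    ChartExtractor: lambda o: o.do_chart_extraction,
}


def models_for_execution(enrichment_pipe, options):
    """Yield only the models whose gate passes; ungated models always run."""
    for model in enrichment_pipe:
        gate = next(
            (g for cls, g in MODEL_GATES.items() if isinstance(model, cls)),
            None,
        )
        if gate is None or gate(options):
            yield model


pipe = [PictureClassifier(), PictureDescriber(), ChartExtractor(), CodeFormulaModel()]
opts = SketchOptions(
    do_picture_classification=False,
    do_picture_description=False,
    do_chart_extraction=True,
)
print([type(m).__name__ for m in models_for_execution(pipe, opts)])
```

The gating logic stays in one declarative place, so adding a new toggleable model means adding one table entry rather than another branch. Whether the table belongs on the pipeline or on the models themselves is left open.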

2 participants