Open
Labels
enhancement (New feature or request)
Description
Bug
When processing a PDF that contains a table spanning multiple pages, Docling does not merge the continued table into a single logical table.
If a table starts on one page and continues on the next page, it is split into multiple separate tables instead of being recognized as one continuous table.
This issue occurs in both:
- VLM pipeline
- Standard pipeline
In some cases, one logical table is extracted as two separate tables.
Steps to reproduce
- Use a PDF that contains a table spanning two pages.
  - Example: page 1 contains the table header and the first rows.
  - Page 2 contains the continuation of the same table (the remaining rows).
- Run Docling using either:
  - VLM pipeline, or
  - Standard pipeline.
- Inspect the extracted tables in the output.
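The steps above can be scripted roughly as follows. This is a minimal sketch for the Standard pipeline with default options, assuming that `DocumentConverter`, `result.document.tables`, each table's `prov[0].page_no`, and `data.num_rows` / `data.num_cols` behave as in recent Docling 2.x releases. The helper `tables_by_page` is a hypothetical name introduced here for illustration; it is not part of Docling.

```python
import sys
from collections import defaultdict


def tables_by_page(tables):
    """Map page number -> indices of the tables whose first provenance entry is on that page."""
    pages = defaultdict(list)
    for index, table in enumerate(tables):
        if table.prov:  # skip items that carry no provenance information
            pages[table.prov[0].page_no].append(index)
    return dict(pages)


def main(pdf_path):
    # Imported lazily so the helper above can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    tables = result.document.tables
    print(f"tables found: {len(tables)}")
    for page_no, indices in sorted(tables_by_page(tables).items()):
        for i in indices:
            data = tables[i].data
            print(f"page {page_no}: table {i} ({data.num_rows} rows x {data.num_cols} cols)")


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].endswith(".pdf"):
    main(sys.argv[1])
```

Running it against a PDF whose single table spans pages 1 and 2 should make the split visible as one table entry per page.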
Observed result:
The table is split into multiple independent tables.
Expected result:
The full table (including rows from both pages) should be merged into a single structured table.
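To make the expected behavior concrete, here is a rough sketch of one possible merge heuristic, written against plain Python data rather than Docling's own types. The `Table` class and `merge_continued_tables` function are hypothetical and only illustrate the idea: a table is treated as a continuation when it starts on the page right after the previous table ends and has the same number of columns. A real implementation would need stronger signals (for example position on the page or column geometry), since this rule would also merge two unrelated tables that happen to meet the same conditions.

```python
from dataclasses import dataclass


@dataclass
class Table:
    # Hypothetical stand-in for an extracted table; not a Docling type.
    first_page: int
    last_page: int
    rows: list  # list of rows, each a list of cell strings; rows[0] is the header


def merge_continued_tables(tables):
    """Merge tables that look like page-break continuations of the previous table."""
    merged = []
    for table in tables:
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and prev.rows
            and table.rows
            and table.first_page == prev.last_page + 1
            and len(table.rows[0]) == len(prev.rows[0])
        )
        if is_continuation:
            rows = table.rows
            if rows[0] == prev.rows[0]:
                rows = rows[1:]  # drop a header that is repeated on the new page
            prev.rows.extend(list(row) for row in rows)
            prev.last_page = table.last_page
        else:
            # Copy so the input tables are not mutated by later merges.
            merged.append(Table(table.first_page, table.last_page, [list(r) for r in table.rows]))
    return merged
```

With a header and one row on page 1 and one further row on page 2, the two input tables collapse into a single table covering pages 1 to 2.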
Docling version
2.68.0
Python version
3.12.11