Continued table across pages is split into multiple tables #2976

@bimalsagar-syook

Bug

When processing a PDF that contains a table spanning multiple pages, Docling does not merge the continued table into a single logical table.

If a table starts on one page and continues on the next page, it is split into multiple separate tables instead of being recognized as one continuous table.

This issue occurs in both:

  • VLM pipeline
  • Standard pipeline

In some cases, a single logical table is split into two separate tables.


Steps to reproduce

  1. Use a PDF that contains a table spanning two pages. For example:

    • Page 1 contains the table header and the first rows.
    • Page 2 contains the continuation of the same table (the remaining rows).
  2. Run Docling using either:

    • the VLM pipeline, or
    • the Standard pipeline.
  3. Inspect the extracted tables in the output.
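
A minimal sketch of the steps above for the Standard pipeline, assuming Docling's default `DocumentConverter` and a PDF path passed on the command line. `summarize_tables` is a hypothetical helper written for this report, not part of Docling; it relies on each extracted table exposing `prov[...].page_no` and `data.num_rows` / `data.num_cols`, which may differ between versions.

```python
import sys


def summarize_tables(tables):
    """Return (index, page numbers, rows, cols) for each extracted table.

    Hypothetical helper: assumes every table has a `prov` list whose
    entries carry `page_no`, and a `data` object with `num_rows` and
    `num_cols`.
    """
    summary = []
    for index, table in enumerate(tables):
        pages = sorted({prov.page_no for prov in table.prov})
        summary.append((index, pages, table.data.num_rows, table.data.num_cols))
    return summary


def main(pdf_path):
    # Imported lazily so the helper above can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    for index, pages, rows, cols in summarize_tables(result.document.tables):
        print(f"table {index}: pages={pages} size={rows}x{cols}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

With the behavior described in this report, a table that spans pages 1 and 2 would be listed as two entries, one per page, rather than as a single entry covering both pages.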

Observed result:
The table is split into multiple independent tables.

Expected result:
The full table (including rows from both pages) should be merged into a single structured table.
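
To make the expected result concrete, here is a rough sketch of the kind of merge being asked for, written as a user-side post-processing step on plain Python data. It is not Docling's behavior or API: `PageTable` and `merge_continued_tables` are hypothetical names, and the heuristic (consecutive pages, equal column count, repeated header dropped) is only an assumption about what "continued" could mean.

```python
from dataclasses import dataclass


@dataclass
class PageTable:
    """Hypothetical, simplified stand-in for one extracted table."""
    first_page: int
    last_page: int
    rows: list  # list of rows; each row is a list of cell strings


def merge_continued_tables(tables):
    """Merge tables that look like page-break continuations.

    Assumed heuristic: a table continues the previous one when it starts
    on the page right after the previous table ends and both have the
    same number of columns. A repeated header row is dropped.
    """
    merged = []
    for table in tables:
        prev = merged[-1] if merged else None
        continues = (
            prev is not None
            and bool(prev.rows)
            and bool(table.rows)
            and table.first_page == prev.last_page + 1
            and len(table.rows[0]) == len(prev.rows[0])
        )
        if not continues:
            # Copy the row list so the input tables are left untouched.
            merged.append(PageTable(table.first_page, table.last_page, list(table.rows)))
            continue
        rows = table.rows
        if rows[0] == prev.rows[0]:
            rows = rows[1:]  # header repeated at the top of the next page
        prev.rows.extend(rows)
        prev.last_page = table.last_page
    return merged
```

This heuristic would also merge two unrelated tables that happen to share a column count and sit on consecutive pages, so a real implementation would likely need stronger signals, such as the table's position on the page or matching column boundaries.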


Docling version

2.68.0


Python version

3.12.11

Labels: enhancement (New feature or request)
