
Same table extracted twice from PDF in stream mode #538

Open
@stertingen

Description

Describe the bug

Camelot sometimes extracts the same table twice.
I observed this in stream mode, where the first extracted result contains only part of the table.

Steps to reproduce the bug

  1. Install camelot-py[base] with pip
  2. Download the PDF file linked below
  3. Run the script below

Expected behavior

I expected camelot to extract either exactly one table or multiple tables that do not overlap.

Code

#!/usr/bin/env python3

import camelot

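# Extract tables from page 8 of the PDF using the stream flavor
tables = camelot.read_pdf("./Lijnfolder-dr-2024-regio-Arnhem.pdf", pages="8", flavor="stream")

# Plot the detected table boundary for each extracted table
for table in tables:
    camelot.plot.contour(table)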
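
The overlap can also be checked without looking at the plots. The following is a minimal sketch, not part of the original script: `boxes_overlap` and `overlapping_pairs` are hypothetical helper names, and the boxes are assumed to be `(x1, y1, x2, y2)` tuples with `x1 < x2` and `y1 < y2`. Reading the boxes from the private `table._bbox` attribute is an assumption about camelot internals and may differ between versions.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """Return True if two (x1, y1, x2, y2) boxes share a region of non-zero area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Boxes that only touch along an edge are not counted as overlapping.
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def overlapping_pairs(bboxes):
    """Return the index pairs (i, j), i < j, of boxes that overlap each other."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(bboxes), 2)
        if boxes_overlap(a, b)
    ]


# Hypothetical usage with the tables from the script above
# (assumes each table exposes its bounding box as the private `_bbox`):
#     print(overlapping_pairs([t._bbox for t in tables]))
```

With the expected behavior, the returned list would be empty for every page.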

PDF

https://www.connexxion.nl/getmedia/c2bce2c6-ebfe-43a9-8154-0b6bec9244fd/Lijnfolder-dr-2024-regio-Arnhem.pdf

Screenshots

[Two screenshots attached to the issue; images not included here]

Environment

  • OS: Windows 11
  • Python version: 3.12.8
  • Numpy version: 2.0.2
  • OpenCV version: 4.11.0
  • Ghostscript version:
  • camelot version: 1.0.0

Additional context
