Describe the bug
On certain PDFs, lines are returned in an unexpected order when use_text_flow is set to True.
Have you tried repairing the PDF?
Yes
Code to reproduce the problem
import pdfplumber

pdf_path = '/path/to/file.pdf'
with pdfplumber.open(pdf_path, repair=True) as pdf:
    for page in pdf.pages:
        lines = page.extract_text_lines(use_text_flow=True)
        for line in lines:
            print(line['text'])
PDF file
how-great-the-wisdom-and-the-love_bi.pdf
Expected behavior
Lines should be returned in this order:
- Stap tingbaot broken bodi blong Kraes,
Taem yumi brekem bred.
Dring wora long kap blong yumi witnes,
Yumi putum Kraes long fored.
- Plan blong Papa God hem i komplit
Blong savem yumi long ol sin.
Hem i tekem Jastis, Lav mo Mersi
Blong mekem plan blong Salvesen.
Actual behavior
Lines are returned in this order:
- Stap tingbaot broken bodi blong Kraes,
Taem yumi brekem bred.
- Plan blong Papa God hem i komplit
Blong savem yumi long ol sin.
Hem i tekem Jastis, Lav mo Mersi
Blong mekem plan blong Salvesen.
Dring wora long kap blong yumi witnes,
Yumi putum Kraes long fored.
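A possible stopgap, in case it helps anyone hitting the same thing: the sketch below re-sorts the returned lines by column and then by vertical position. It assumes the two verses sit side by side in two columns, which is only a guess from the order shown above, and it relies on the x0 and top keys that extract_text_lines includes in each line dict. The helper name order_by_column, the gutter_x value, and all coordinates in the sample are made up for illustration; this does not address the underlying ordering in pdfplumber.

```python
from typing import Dict, List


def order_by_column(lines: List[Dict], gutter_x: float) -> List[Dict]:
    """Return lines in column-major order (hypothetical helper).

    Lines whose left edge (x0) is left of gutter_x are treated as the
    left column, all others as the right column. Each column is sorted
    top to bottom, and the left column is emitted first.
    """
    left = [ln for ln in lines if ln["x0"] < gutter_x]
    right = [ln for ln in lines if ln["x0"] >= gutter_x]
    return sorted(left, key=lambda ln: ln["top"]) + sorted(
        right, key=lambda ln: ln["top"]
    )


# Invented coordinates that mimic the order reported above:
# two lines of the first verse, the whole second verse, then the rest.
sample = [
    {"text": "- Stap tingbaot broken bodi blong Kraes,", "x0": 50, "top": 100},
    {"text": "Taem yumi brekem bred.", "x0": 50, "top": 112},
    {"text": "- Plan blong Papa God hem i komplit", "x0": 320, "top": 100},
    {"text": "Blong savem yumi long ol sin.", "x0": 320, "top": 112},
    {"text": "Hem i tekem Jastis, Lav mo Mersi", "x0": 320, "top": 124},
    {"text": "Blong mekem plan blong Salvesen.", "x0": 320, "top": 136},
    {"text": "Dring wora long kap blong yumi witnes,", "x0": 50, "top": 124},
    {"text": "Yumi putum Kraes long fored.", "x0": 50, "top": 136},
]

reordered = order_by_column(sample, gutter_x=300)
for line in reordered:
    print(line["text"])
```

With real data, the list passed in would be the result of page.extract_text_lines(use_text_flow=True), and gutter_x would have to be chosen per page (for example, somewhere near page.width / 2 for a symmetric two-column layout).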
Environment
pdfplumber version: 0.11.6
Python version: 3.12.8
OS: macOS 15.4 Sequoia