Ligature issue when converting PDF to text #1351

@gargarvin

I am having a ligature issue with this PDF: the 'fi', 'fl' and 'ff' ligatures come back as NULL in the extracted text.

#598 is similar to this issue.

MVCE: Code + PDF

from PyPDF2 import PdfReader

reader = PdfReader("Inspection_redacted.pdf")
for page in reader.pages:
    print(page.extract_text())
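
To help narrow this down, here is a minimal diagnostic sketch. It assumes that "NULL" means the NUL character (`\x00`) shows up in the extracted string where a ligature should be; that is an assumption and may not match what actually happens. The helper name `find_nul_contexts` and its `width` parameter are hypothetical and not part of PyPDF2.

```python
def find_nul_contexts(text, width=10):
    """Return (index, snippet) pairs for every NUL character in text.

    Assumption: a dropped ligature appears as "\x00" in the extracted text.
    Each snippet shows up to `width` characters on either side, with NUL
    characters replaced by "?" so the snippet is printable.
    """
    hits = []
    for i, ch in enumerate(text):
        if ch == "\x00":
            start = max(0, i - width)
            snippet = text[start:i + width + 1]
            hits.append((i, snippet.replace("\x00", "?")))
    return hits


# Example with a made-up string standing in for extracted page text.
sample = "The \x00rst \x00oor"
for index, snippet in find_nul_contexts(sample, width=3):
    print(index, snippet)
```

Passing the result of `page.extract_text()` for each page to this helper would show which words are affected, which may make it easier to match them against the 'fi', 'fl' and 'ff' cases.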

PDF

Labels: help wanted, is-bug, workflow-text-extraction
