pypdf2 reads the hyphenated words in a new line #246

Closed
@srivignes

Description

While parsing a PDF file with PyPDF2, hyphenated words such as mm-dd-yy are read with each part on a new line:

mm
-
dd
-
yy

This is my code:

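import PyPDF2

def get_text(path):
    # file() exists only in Python 2; open() works and the with-block closes the file
    with open(path, "rb") as fh:
        pdf = PyPDF2.PdfReader(fh)
        # Extract the text of the first page only
        return pdf.pages[0].extract_text() + "\n"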

How can I work around this and get them on the same line?
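One possible workaround is to post-process the extracted string rather than change the extraction itself. The sketch below assumes the text comes back with each hyphen alone on its own line, as in the output shown above. `rejoin_hyphenated` is a hypothetical helper name, not part of PyPDF2, and the regular expression is a heuristic that may need adjusting for other PDFs.

```python
import re

# Hypothetical helper, not part of PyPDF2.
# Matches a hyphen that sits alone on its own line, together with the
# line breaks (and any spaces or tabs) around it.
_LONE_HYPHEN = re.compile(r"[ \t]*\n[ \t]*-[ \t]*\n[ \t]*")

def rejoin_hyphenated(text: str) -> str:
    """Collapse 'mm\\n-\\ndd' style breaks back into 'mm-dd'."""
    return _LONE_HYPHEN.sub("-", text)

# Sample string that mimics the extracted text reported above.
print(rejoin_hyphenated("mm\n-\ndd\n-\nyy\n"))  # mm-dd-yy
```

The result of `get_text(path)` could be passed through this function before printing. Note that it would also join any line that legitimately consists of a single hyphen, so it is best applied only where that pattern is known to be an extraction artifact.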

Metadata

    Labels

    workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
