Closed
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
Description
I'm trying to read a PDF using pypdf, but it raises the error below even though my PDF file is not corrupted. When I downgrade pypdf from 5.3.0 to 5.1.0, the error goes away.
PdfReadError: Unexpected end of stream
Environment
Ubuntu 20.0
Code + PDF
This is a minimal, complete example that shows the issue:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

DATA_PATH = 'data/'

def load_pdf_files(data):
    # Load every *.pdf in the directory through LangChain's PyPDFLoader
    loader = DirectoryLoader(data, glob='*.pdf', loader_cls=PyPDFLoader)
    documents = loader.load()
    return documents

documents = load_pdf_files(data=DATA_PATH)
print("length of documents", len(documents))
This is the PDF file I'm using:
https://www.academia.edu/32752835/The_GALE_ENCYCLOPEDIA_of_MEDICINE_SECOND_EDITION
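Because the example above goes through LangChain's DirectoryLoader, it may help to narrow the report down by reading each file with pypdf directly. The following is only a sketch under stated assumptions: `load_each_pdf`, `extract_with_pypdf`, and `report` are hypothetical helper names introduced here, and it is assumed (not confirmed in this report) that the error is raised while opening the file or extracting page text.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def load_each_pdf(
    data_dir: str,
    load_one: Callable[[Path], int],
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Call load_one on every *.pdf in data_dir, one file at a time.

    Returns (page_counts, failures) so one failing file does not hide the rest.
    """
    page_counts: Dict[str, int] = {}
    failures: List[Tuple[str, str]] = []
    for pdf_path in sorted(Path(data_dir).glob("*.pdf")):
        try:
            page_counts[pdf_path.name] = load_one(pdf_path)
        except Exception as exc:  # broad on purpose: only record what failed
            failures.append((pdf_path.name, f"{type(exc).__name__}: {exc}"))
    return page_counts, failures


def extract_with_pypdf(pdf_path: Path) -> int:
    """Open a PDF with pypdf, extract text from every page, return the page count."""
    # Imported lazily so load_each_pdf stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    for page in reader.pages:
        page.extract_text()  # assumption: the failure surfaces here or on open
    return len(reader.pages)


def report(data_dir: str = "data/") -> None:
    """Print the installed pypdf version and the per-file outcome."""
    import pypdf

    print("pypdf version:", pypdf.__version__)
    counts, failed = load_each_pdf(data_dir, extract_with_pypdf)
    print("loaded:", counts)
    print("failed:", failed)
```

Calling `report()` once under pypdf 5.1.0 and once under 5.3.0 would show whether the same file fails without LangChain involved, and which exception type pypdf itself raises.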