On some PDF files, I get the following:
PdfReadWarning: Invalid stream (index 0) within object 14 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
  File "/usr/bin/pypdfocr", line 11, in <module>
    load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 171, in overlay_hocr_pages
    writer.write(f)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1593, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1576, in _getObjectFromStream
    raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly
I'm using CentOS 7.2.1511.
The same file is OCRed correctly on Windows.
I found a related bug report for pyPDF: py-pdf/pypdf#99
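
Since the traceback ends inside PyPDF2's `writer.write(...)`, it may help to check whether the failure reproduces without pypdfocr at all. The following is a minimal sketch, not pypdfocr's actual code: the helper name `rewrite_with_pypdf2` is hypothetical, and it assumes the PyPDF2 1.x API (`PdfFileReader`, `PdfFileWriter`, `getNumPages`, `getPage`, `addPage`) that matches the `pdf.py` paths in the traceback. The `reader_cls` and `writer_cls` parameters exist only so the helper can be exercised without PyPDF2 installed.

```python
import io


def rewrite_with_pypdf2(pdf_path, strict=True, reader_cls=None, writer_cls=None):
    """Copy every page of pdf_path into an in-memory PDF.

    Returns None on success, or "<ErrorType>: <message>" if reading or
    writing fails. reader_cls / writer_cls default to PyPDF2 1.x's
    PdfFileReader / PdfFileWriter (an assumption based on the traceback).
    """
    if reader_cls is None or writer_cls is None:
        import PyPDF2  # imported lazily; only needed for the default classes
        reader_cls = reader_cls or PyPDF2.PdfFileReader
        writer_cls = writer_cls or PyPDF2.PdfFileWriter

    with open(pdf_path, "rb") as f:
        try:
            reader = reader_cls(f, strict=strict)
            writer = writer_cls()
            for i in range(reader.getNumPages()):
                writer.addPage(reader.getPage(i))
            # Writing resolves indirect references, which is the step
            # that fails in the traceback above.
            writer.write(io.BytesIO())
        except Exception as e:  # broad on purpose: this is a diagnostic helper
            return "%s: %s" % (type(e).__name__, e)
    return None
```

Running `rewrite_with_pypdf2("problem.pdf")` (with a placeholder file name) on both the CentOS and the Windows machine should show whether the difference lies in PyPDF2 or elsewhere in the pipeline. As far as I can tell from the PyPDF2 1.x source, this particular `PdfReadError` is only raised when the reader is in strict mode, so comparing `strict=True` with `strict=False` may also be informative; treat that as something to verify rather than a confirmed fix.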