PyPDF fails #59

Open
@mauro1855

On some PDF files, pypdfocr fails with the following warning and traceback:

PdfReadWarning: Invalid stream (index 0) within object 14 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
  File "/usr/bin/pypdfocr", line 11, in <module>
    load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 171, in overlay_hocr_pages
    writer.write(f)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1593, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1576, in _getObjectFromStream
    raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly

I'm using CentOS 7.2.1511.

The same file is OCRed correctly on Windows.
I found an existing bug report for this issue in pyPDF: py-pdf/pypdf#99
