PyPDF fails #59

Open
mauro1855 opened this issue Jan 25, 2017 · 0 comments

On some PDF files, I get the following:

PdfReadWarning: Invalid stream (index 0) within object 14 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
  File "/usr/bin/pypdfocr", line 11, in <module>
    load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 171, in overlay_hocr_pages
    writer.write(f)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1593, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "/usr/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1576, in _getObjectFromStream
    raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly

I'm using CentOS 7.2.1511.

The same file is OCRed correctly on Windows.
I found a bug report about this issue in the PyPDF tracker: py-pdf/pypdf#99
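
To check whether PyPDF2 alone fails on such a file, independent of pypdfocr, a round trip along these lines might help. This is only a sketch: it assumes the legacy PyPDF2 1.x API that appears in the traceback (`PdfFileReader`, `PdfFileWriter`, `PyPDF2.utils.PdfReadError`), and `roundtrip` is a hypothetical helper name, not part of either library.

```python
import io


def roundtrip(pdf_path):
    """Copy every page of pdf_path into an in-memory PDF using PyPDF2 only.

    Returns None on success, or the error message if PyPDF2 raises
    PdfReadError at any point.
    """
    # Imported here so PyPDF2 is only needed when the helper is called.
    # Assumption: legacy PyPDF2 1.x names, as seen in the traceback.
    import PyPDF2
    from PyPDF2.utils import PdfReadError

    with open(pdf_path, "rb") as f:
        try:
            reader = PyPDF2.PdfFileReader(f)
            writer = PyPDF2.PdfFileWriter()
            for i in range(reader.getNumPages()):
                writer.addPage(reader.getPage(i))
            # In the traceback the error surfaces inside write(),
            # so the round trip has to include this call.
            writer.write(io.BytesIO())
        except PdfReadError as e:
            return str(e)
    return None
```

Calling `roundtrip("some_file.pdf")` on both the CentOS and the Windows machine, and comparing the installed PyPDF2 versions, would show whether the difference lies in PyPDF2 rather than in pypdfocr.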

MartinThoma added a commit to py-pdf/pypdf that referenced this issue Apr 23, 2022
This doesn't solve the issue, but it might make it
less severe.

See #520
See #268
See virantha/pypdfocr#59

Co-authored-by: danniesim <geemee@gmail.com>
MartinThoma added a commit to py-pdf/pypdf that referenced this issue Apr 23, 2022
This doesn't solve the issue, but it might make it less severe.

See #520
See #268
See virantha/pypdfocr#59

sfneal@3558a69

Co-authored-by: danniesim <geemee@gmail.com>