'not enough image data' exception from PIL #2343

@brianpow

I am trying to extract images from PDF files, but pypdf occasionally raises a 'not enough image data' exception from PIL when handling certain PDFs. The affected files render correctly in Atril Document Viewer, and extraction works with pdfimages from poppler-utils.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.5.0-kali3-amd64-x86_64-with-glibc2.37

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '38.0.4'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
import sys

for filename in sys.argv[1:]:
    reader = PdfReader(filename)
    for i, page in enumerate(reader.pages):
        for j, image in enumerate(page.images):
            print("Writing %d-%d: %s (%d)..." % (i, j, image.name, len(image.data)))
            with open(image.name, "wb") as fp:
                fp.write(image.data)
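
The loop above stops at the first image that fails to decode, because the exception is raised while iterating over page.images. A possible workaround, sketched below, is to index the collection instead of iterating over it, so that one bad image does not prevent the rest from being written. The helper name iter_decodable is hypothetical and not part of pypdf; the sketch assumes only that page.images supports len() and integer indexing.

```python
from typing import Any, Iterator, Sequence, Tuple


def iter_decodable(images: Sequence[Any]) -> Iterator[Tuple[int, Any]]:
    """Yield (index, image) pairs, skipping entries that fail to decode.

    Hypothetical helper, not part of pypdf. It indexes the sequence
    rather than iterating over it, so a single failing item does not
    end the whole loop.
    """
    for index in range(len(images)):
        try:
            image = images[index]
        except ValueError as exc:
            # PIL reports "not enough image data" as a ValueError.
            print(f"Skipping image {index}: {exc}")
            continue
        yield index, image
```

In the script above, this would replace enumerate(page.images) with iter_decodable(page.images). It only hides the failure for the affected image; it does not recover that image's data.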

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

test2_P038-038.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/user/pypdf/pypdf_test.py", line 7, in <module>
    for j, image in enumerate(page.images):
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2727, in __iter__
    yield self[i]
          ~~~~^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2723, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/filters.py", line 785, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
                                      ^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_xobj_image_helpers.py", line 172, in _handle_flate
    img = Image.frombytes(mode, size, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2952, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 805, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
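
The last pypdf frame is Image.frombytes(mode, size, data), and Pillow raises this ValueError when the buffer runs out before the image is complete. As a rough diagnostic sketch, the byte count that a raw buffer needs can be computed from the mode and size and compared with len(data). The function names and the bits-per-pixel table below are illustrative assumptions, cover only a few common modes, and are not pypdf or Pillow APIs.

```python
from typing import Tuple

# Bits per pixel for a few Pillow modes when read from raw bytes.
# Illustrative subset only; other modes are not handled here.
BITS_PER_PIXEL = {"1": 1, "L": 8, "P": 8, "RGB": 24, "RGBA": 32, "CMYK": 32}


def expected_raw_size(mode: str, size: Tuple[int, int]) -> int:
    """Return the number of raw bytes needed for an image of this mode and size."""
    width, height = size
    # Each row is padded up to a whole number of bytes (matters for mode "1").
    bytes_per_row = (width * BITS_PER_PIXEL[mode] + 7) // 8
    return bytes_per_row * height


def describe_shortfall(mode: str, size: Tuple[int, int], data: bytes) -> str:
    """Summarise how much data is available versus how much is needed."""
    expected = expected_raw_size(mode, size)
    return f"mode={mode} size={size}: have {len(data)} bytes, need {expected}"
```

Printing describe_shortfall(mode, size, data) just before the failing call would show whether the decoded stream is short for the mode pypdf selected, which may help narrow down whether the mode, the dimensions, or the decompressed data is off for this file.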


Labels

Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a user's perspective, image handling is the affected feature/workflow
