Image extraction bugs with newer versions #2369

anomam · 2023-04-25T02:14:06Z

Describe the bug (mandatory)

In the newer versions 1.22.0 and 1.22.1, certain image formats such as PAM no longer seem to be handled properly.

To Reproduce (mandatory)

Running the following code gives different results with versions 1.19.6 and 1.22.0:

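import fitz

# Build a small all-black grayscale pixmap (no alpha channel).
width, height = 13, 37
image = fitz.Pixmap(fitz.csGRAY, width, height, b"\x00" * (width * height), False)

# Open the PAM-encoded pixmap as a document and convert it to a one-page PDF.
with fitz.Document(stream=image.tobytes(output="pam"), filetype="pam") as doc:
    test_pdf_bytes = doc.convert_to_pdf()

# Re-open the PDF, extract the embedded image and try to load it as a Pixmap.
with fitz.Document(stream=test_pdf_bytes) as doc:
    page = doc[0]
    img_xref = page.get_images()[0][0]
    img_bytes = doc.extract_image(img_xref)["image"]
    print(img_bytes)
    fitz.Pixmap(img_bytes)  # raises RuntimeError with 1.22.0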

With 1.19.6, this runs without error and prints:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00\xbc\x7f<\xfb\x00\x00\x00\tpHYs\x00\x00\x0e\xc4\x00\x00\x0e\xc4\x01\x95+\x0e\x1b\x00\x00\x00\x11IDATx\x9ccd@\x06\x8c\xa3\xbc\x11\xc9\x03\x00(x\x00&i\xc7\xc3\xfb\x00\x00\x00\x00IEND\xaeB`\x82'
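As a quick sanity check, the header of these bytes can be decoded with the standard library alone: the IHDR chunk describes a 13 x 37, 8-bit grayscale PNG, matching the pixmap built in the reproducer. This is a minimal sketch; the helper name `read_png_header` is hypothetical and it only reads the first chunk.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> Tuple[int, int, int, int]:
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # After the 8-byte signature: 4-byte chunk length, then 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    # IHDR payload starts with width and height (big-endian uint32),
    # followed by one byte each for bit depth and color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, color_type
```

For the 1.19.6 output above, `read_png_header(...)` returns `(13, 37, 8, 0)`; color type 0 means grayscale.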

With 1.22.0, the last line raises an error:

in Pixmap.__init__(self, *args)
   7136 def __init__(self, *args):
   7137     """Pixmap(colorspace, irect, alpha) - empty pixmap.
   7138     Pixmap(colorspace, src) - copy changing colorspace.
   7139     Pixmap(src, width, height,[clip]) - scaled copy, float dimensions.
   (...)
   7145     Pixmap(PDFdoc, xref) - from an image xref in a PDF document.
   7146     """
-> 7148     _fitz.Pixmap_swiginit(self, _fitz.new_Pixmap(*args))

RuntimeError: unknown image file format

and the printed bytes are very different:

b'&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 '
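These bytes do not begin with any common image file signature, which is consistent with the "unknown image file format" error. The sketch below illustrates that kind of check with a hypothetical helper, `sniff_image_format`, that knows only a handful of formats; MuPDF's own detection is done in C and covers more. Given that the fix commit mentions `FZ_IMAGE_FAX`, the bytes are presumably a raw CCITT fax stream, which has no file header of its own.

```python
from typing import Dict, Optional

# Leading "magic" bytes of a few common raster formats (not exhaustive).
SIGNATURES: Dict[bytes, str] = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
    b"P7\n": "pam",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

Applied to the 1.19.6 output this returns `"png"`; applied to the 1.22.0 output it returns `None`.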

Your configuration (mandatory)

Operating system, potentially version and bitness
Python version, bitness
PyMuPDF version, installation method (wheel or generated from source).

For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).

> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.13 (main, Sep  8 2022, 09:21:48)
[GCC 9.4.0]
 linux

PyMuPDF 1.22.0: Python bindings for the MuPDF 1.22.0 library.
Version date: 2023-04-14 00:00:01.
Built for Python 3.9 on linux (64-bit).

Installed via pip install pymupdf==1.22.0


julian-smith-artifex-com · 2023-04-25T13:12:08Z

Re-opening because fix is not in a release yet.

julian-smith-artifex-com · 2023-04-26T11:57:13Z

Fixed in new release 1.22.2.
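To confirm that an environment already includes the fix, a minimal sketch can compare the installed version string against 1.22.2. It assumes plain numeric `major.minor.patch` strings (pre-release suffixes would need extra handling), and the helper names are hypothetical. It only checks for the fixed release: 1.19.6, which was not affected, still returns `False`.

```python
from typing import Tuple

# Release containing the fix for this issue, per the maintainer's comment.
FIXED_IN = (1, 22, 2)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "1.22.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def at_least_fixed_release(version: str) -> bool:
    """True if version is 1.22.2 or later (tuple comparison is element-wise)."""
    return parse_version(version) >= FIXED_IN
```

With PyMuPDF installed, the version string is available as `fitz.VersionBind`, so `at_least_fixed_release(fitz.VersionBind)` should be `True` on 1.22.2 or later.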

anomam · 2023-04-27T01:50:57Z

Thank you for the quick fix!

julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Apr 25, 2023

tests/test_pixmap.py: added test_2369() for pymupdf#2369.

031e424

julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Apr 25, 2023

fitz/fitz.i: fix pymupdf#2369 - need to also handle FZ_IMAGE_FAX.

6606916

This extends commit 3bae451, which fixed pymupdf#2348.
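The commit message suggests that the extraction code treats some compression types specially and had missed `FZ_IMAGE_FAX`. The actual change lives in `fitz/fitz.i` and is not reproduced here; the following is only a hypothetical Python illustration of the general idea: copy raw bytes only for compression types that already form a complete image file, and decode and re-encode everything else (here assumed to be as PNG, which is what 1.19.6 returned). `PASSTHROUGH` and `choose_output` are illustrative names, not PyMuPDF API.

```python
from typing import Dict, Tuple

# Compression types whose raw stream is already a self-contained image file.
PASSTHROUGH: Dict[str, str] = {"jpeg": "jpeg", "jpx": "jpx"}


def choose_output(compression: str) -> Tuple[str, str]:
    """Return (action, extension) for an embedded image's compression type."""
    if compression in PASSTHROUGH:
        return ("copy", PASSTHROUGH[compression])
    # Everything else (e.g. "fax", "flate") has no standalone file header,
    # so the image must be decoded and re-encoded; PNG is assumed here.
    return ("reencode", "png")
```

With an allow-list like this, a compression type nobody anticipated falls back to re-encoding instead of leaking raw compressed bytes to the caller.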

julian-smith-artifex-com added a commit that referenced this issue Apr 25, 2023

tests/test_pixmap.py: added test_2369() for #2369.

e6da78c

julian-smith-artifex-com closed this as completed in 3bd8714 Apr 25, 2023

julian-smith-artifex-com added the Fixed in next release label Apr 25, 2023

julian-smith-artifex-com reopened this Apr 25, 2023

julian-smith-artifex-com removed the Fixed in next release label Apr 26, 2023

julian-smith-artifex-com closed this as completed Apr 26, 2023
