Image extraction bugs with newer versions #2369

Closed
anomam opened this issue Apr 25, 2023 · 3 comments

anomam commented Apr 25, 2023

Describe the bug (mandatory)

In the newer versions 1.22.0 and 1.22.1, certain image formats such as PAM no longer appear to be handled properly.

To Reproduce (mandatory)

Running the following leads to different results with versions 1.19.6 and 1.22.0:

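import fitz

# Build a 13x37 grayscale pixmap (no alpha) filled with zero bytes.
width, height = 13, 37
image = fitz.Pixmap(fitz.csGRAY, width, height, b"\x00" * (width * height), False)

# Open the pixmap as a PAM image document and convert it to PDF.
with fitz.Document(stream=image.tobytes(output="pam"), filetype="pam") as doc:
    test_pdf_bytes = doc.convert_to_pdf()

# Reopen the PDF, extract the embedded image, and load it as a Pixmap.
with fitz.Document(stream=test_pdf_bytes) as doc:
    page = doc[0]
    img_xref = page.get_images()[0][0]
    img_bytes = doc.extract_image(img_xref)["image"]
    print(img_bytes)
    fitz.Pixmap(img_bytes)  # raises RuntimeError with 1.22.0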

With 1.19.6, this runs without error and prints:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00\xbc\x7f<\xfb\x00\x00\x00\tpHYs\x00\x00\x0e\xc4\x00\x00\x0e\xc4\x01\x95+\x0e\x1b\x00\x00\x00\x11IDATx\x9ccd@\x06\x8c\xa3\xbc\x11\xc9\x03\x00(x\x00&i\xc7\xc3\xfb\x00\x00\x00\x00IEND\xaeB`\x82'

With 1.22.0, the last line raises an error:

in Pixmap.__init__(self, *args)
   7136 def __init__(self, *args):
   7137     """Pixmap(colorspace, irect, alpha) - empty pixmap.
   7138     Pixmap(colorspace, src) - copy changing colorspace.
   7139     Pixmap(src, width, height,[clip]) - scaled copy, float dimensions.
   (...)
   7145     Pixmap(PDFdoc, xref) - from an image xref in a PDF document.
   7146     """
-> 7148     _fitz.Pixmap_swiginit(self, _fitz.new_Pixmap(*args))

RuntimeError: unknown image file format

and the bytes printed are very different:

b'&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 '
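
To make the difference between the two outputs concrete, here is a minimal sketch that inspects the leading bytes of each result using only the standard library. The helper name `png_header` is hypothetical and is not part of PyMuPDF. It only checks the PNG signature and reads the IHDR chunk, so it does not validate the rest of the file. The `good` value is just the leading 29 bytes of the 1.19.6 output above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header(data: bytes):
    """Return (width, height, bit_depth, color_type) if data starts like a PNG, else None.

    Hypothetical helper for illustration only; not a PyMuPDF function.
    """
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then the header fields.
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    # Width and height are big-endian 32-bit integers, followed by
    # one byte each for bit depth and color type.
    return struct.unpack(">IIBB", data[16:26])


# Leading bytes of the output printed with 1.19.6.
good = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00"
# Full output printed with 1.22.0.
bad = b"&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 "

print(png_header(good))  # (13, 37, 8, 0)
print(png_header(bad))   # None
```

The 1.19.6 bytes decode to a 13x37, 8-bit image with color type 0 (grayscale), which matches the pixmap built in the reproducer. The 1.22.0 bytes carry no PNG signature at all, which is consistent with the "unknown image file format" error.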

Your configuration (mandatory)

  • Operating system, potentially version and bitness
  • Python version, bitness
  • PyMuPDF version, installation method (wheel or generated from source).

For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).

> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.13 (main, Sep  8 2022, 09:21:48)
[GCC 9.4.0]
 linux

PyMuPDF 1.22.0: Python bindings for the MuPDF 1.22.0 library.
Version date: 2023-04-14 00:00:01.
Built for Python 3.9 on linux (64-bit).

Installed via pip install pymupdf==1.22.0

@julian-smith-artifex-com
Collaborator

Re-opening because the fix is not in a release yet.

@julian-smith-artifex-com
Collaborator

Fixed in new release 1.22.2.
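
For code that has to run against several PyMuPDF versions, a small guard based on the versions named in this thread might look like the sketch below. The function name `is_reported_affected` is hypothetical. It flags only 1.22.0 and 1.22.1, because the thread does not say how releases between 1.19.6 and 1.22.0 behave. The version string is passed in as an argument, and the comment about `fitz.VersionBind` is an assumption about where a caller might get it.

```python
# Versions reported as affected in this issue; other releases are not covered here.
REPORTED_AFFECTED = {(1, 22, 0), (1, 22, 1)}


def is_reported_affected(version_text: str) -> bool:
    """Return True if the version is one this issue reports as broken.

    Hypothetical helper for illustration only; not a PyMuPDF function.
    """
    # Keep only the first three numeric components, e.g. "1.22.0" -> (1, 22, 0).
    parts = tuple(int(part) for part in version_text.split(".")[:3])
    return parts in REPORTED_AFFECTED


# Assumption: a caller would pass the installed binding version,
# for example the string held in fitz.VersionBind.
for version in ("1.19.6", "1.22.0", "1.22.1", "1.22.2"):
    print(version, is_reported_affected(version))
```

A set of exact versions is used instead of a range comparison so that the check claims nothing beyond what was reported here.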

@anomam
Author

anomam commented Apr 27, 2023

Thank you for the quick fix!
