extract_image returns an extension "flate" instead of "png" #2348

Closed
cbm755 opened this issue Apr 15, 2023 · 2 comments

cbm755 commented Apr 15, 2023

In Plom, we have a unit test that checks that .jpeg and .png images can be placed on a page and subsequently extracted with extract_image. Going from PyMuPDF 1.21.1 to 1.22.0, the jpeg test still passes but the png one broke: the dict returned by extract_image has the value "flate" instead of "png" for the "ext" key.

Here is an MWE:

from pathlib import Path

import fitz
from PIL import Image, ImageDraw

def make_jpeg(dur):
    """page-size jpeg image."""
    f = Path(dur) / "jpg_file.jpg"
    img = Image.new("RGB", (900, 1500), color=(73, 109, 130))
    d = ImageDraw.Draw(img)
    d.text((10, 10), "some text", fill=(255, 255, 0))
    img.save(f)
    return (f, img)


def make_png(dur):
    """page-size png image."""
    f = Path(dur) / "png_file.png"
    img = Image.new("RGB", (900, 1500), color=(108, 72, 130))
    d = ImageDraw.Draw(img)
    d.text((10, 10), "some text", fill=(255, 255, 0))
    img.save(f)
    return (f, img)


tmp_path = Path(".")

jpg_file, jpg_img = make_jpeg(tmp_path)
png_file, png_img = make_png(tmp_path)

# Build a two-page PDF: the JPEG on page 0, the PNG on page 1.
f = tmp_path / "doc.pdf"
d = fitz.open()
p = d.new_page(width=500, height=842)
rect = fitz.Rect(20, 20, 480, 820)
p.insert_image(rect, filename=jpg_file)
p = d.new_page(width=500, height=842)
p.insert_image(rect, filename=png_file)
d.ez_save(f)
d.close()

# Reopen the PDF and print the extension reported for each page's first image.
doc = fitz.open(f)
page = doc[0]
imlist = page.get_images()
d = doc.extract_image(imlist[0][0])  # imlist[0][0] is the image's xref
print(d["ext"])

page = doc[1]
imlist = page.get_images()
d = doc.extract_image(imlist[0][0])
print(d["ext"])

On PyMuPDF 1.21.1 the output is:

jpeg
png

On PyMuPDF 1.22.0, the output is:

jpeg
flate

Is this an intended change or a regression?

@JorjMcKie JorjMcKie added the bug label Apr 16, 2023

JorjMcKie commented Apr 16, 2023

Confirmed. This is weird!
The PNG is stored with the compression filter FlateDecode, so it is no surprise that the extracted extension is indeed "flate".
I also tested inserting a pixmap instead of a file by name: this also produced FlateDecode, which is a surprise.
Investigating further.
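
For readers who want to check this on their own file, here is a minimal sketch of reading the stored filter directly. It assumes `Document.xref_get_key` is available in the installed PyMuPDF and that it reports a missing key as type "null"; the helper name `image_filter` is made up for illustration.

```python
def image_filter(doc, xref):
    """Return the /Filter entry of the PDF object at xref, or None if absent.

    `doc` is expected to be an open PyMuPDF document. The value comes back
    as a string such as "/FlateDecode" or "/DCTDecode"; an array of filters
    is also possible and is returned unparsed.
    """
    kind, value = doc.xref_get_key(xref, "Filter")
    # Assumption: a missing key is reported with the type "null".
    return None if kind == "null" else value
```

With the MWE above, something like `image_filter(doc, page.get_images()[0][0])` on each page should show which filter each inserted image ended up with.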

JorjMcKie added a commit that referenced this issue Apr 17, 2023
We were returning arbitrary image type codes when reading the image binary from the PDF, e.g. "flate", even though they would never correspond to a meaningful image file extension.
This fix catches type codes that do not correspond to known image file extensions and causes such images to be converted to PNG images (via an intermediate Pixmap).
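
For anyone stuck on 1.22.0, the same idea can be approximated in user code. The following is only a sketch of the approach the commit message describes, not the actual patch: the helper name `extract_as_image_file` and the contents of `KNOWN_IMAGE_EXTS` are assumptions, and `xref` is assumed to refer to an image.

```python
# Assumed set of extensions that name real image file formats; the list
# used by the actual PyMuPDF fix may differ.
KNOWN_IMAGE_EXTS = {"png", "jpeg", "jpx", "gif", "bmp", "tiff", "jxr"}


def extract_as_image_file(doc, xref):
    """Return (ext, data) for an image, falling back to PNG via a Pixmap."""
    info = doc.extract_image(xref)
    if info["ext"] in KNOWN_IMAGE_EXTS:
        # The stored stream is already a usable image file.
        return info["ext"], info["image"]

    import fitz  # imported lazily: only the fallback path needs it

    pix = fitz.Pixmap(doc, xref)
    if pix.colorspace is not None and pix.colorspace.n > 3:
        # PNG cannot store CMYK, so convert to RGB first.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    return "png", pix.tobytes("png")
```

In the MWE, calling `extract_as_image_file(doc, imlist[0][0])` in place of `doc.extract_image(...)` would be one way to get a PNG back for the second page regardless of the installed version.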
@julian-smith-artifex-com

Fixed in PyMuPDF-1.22.1.
