extract_image returns an extension "flate" instead of "png" #2348

Closed
@cbm755

In Plom, we have a unit test that checks that .jpeg and .png images can be placed on a page and subsequently extracted with extract_image. Going from PyMuPDF 1.21.1 to 1.22.0, the jpeg test still passes but the png one broke: the dict returned by extract_image has the value "flate" instead of "png" for the "ext" key.

Here is an MWE:

from pathlib import Path

import fitz
from PIL import Image, ImageDraw

def make_jpeg(dur):
    """page-size jpeg image."""
    f = Path(dur) / "jpg_file.jpg"
    img = Image.new("RGB", (900, 1500), color=(73, 109, 130))
    d = ImageDraw.Draw(img)
    d.text((10, 10), "some text", fill=(255, 255, 0))
    img.save(f)
    return (f, img)


def make_png(dur):
    """page-size png image."""
    f = Path(dur) / "png_file.png"
    img = Image.new("RGB", (900, 1500), color=(108, 72, 130))
    d = ImageDraw.Draw(img)
    d.text((10, 10), "some text", fill=(255, 255, 0))
    img.save(f)
    return (f, img)


tmp_path = Path(".")

jpg_file, jpg_img = make_jpeg(tmp_path)
png_file, png_img = make_png(tmp_path)

# Build a two-page PDF: page 0 holds the jpeg, page 1 holds the png.
f = tmp_path / "doc.pdf"
d = fitz.open()
p = d.new_page(width=500, height=842)
rect = fitz.Rect(20, 20, 480, 820)
p.insert_image(rect, filename=jpg_file)
p = d.new_page(width=500, height=842)
p.insert_image(rect, filename=png_file)
d.ez_save(f)
d.close()

# Reopen the PDF and extract the first image on each page by its xref.
doc = fitz.open(f)
page = doc[0]
imlist = page.get_images()
d = doc.extract_image(imlist[0][0])  # imlist[0][0] is the image xref
print(d["ext"])

page = doc[1]
imlist = page.get_images()
d = doc.extract_image(imlist[0][0])
print(d["ext"])

On PyMuPDF 1.21.1 the output is:

jpeg
png

On PyMuPDF 1.22.0, the output is:

jpeg
flate
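
To narrow down whether only the "ext" label changed or the returned bytes changed too, one could check the leading bytes of the extracted data rather than trusting "ext". Below is a minimal sketch using only the standard library; `sniff_image_format` is a hypothetical helper name of mine, not part of PyMuPDF, and I have not established what the bytes returned alongside "flate" actually contain.

```python
# Hypothetical helper (not part of PyMuPDF): guess an image format from its
# leading "magic" bytes instead of trusting the reported extension.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_SIGNATURE = b"\xff\xd8\xff"      # start-of-image marker plus next marker byte


def sniff_image_format(data: bytes) -> str:
    """Return "png", "jpeg", or "unknown" based on the first bytes of data."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"
```

In the MWE above this could be used as `print(d["ext"], sniff_image_format(d["image"]))`, since the dict returned by extract_image also carries the raw bytes under the "image" key.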

Is this an intended change or a regression?
