Still unable to retrieve PDF-extracted image xres/yres #4486

wohali · 2025-05-04T05:41:56Z

wohali
May 4, 2025

Description of the bug

Continuing the discussion from #479, I'd expected this to work, but it still doesn't:

$ poetry run python
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pymupdf
>>> pymupdf.__version__
'1.25.5'
>>> doc = pymupdf.open("i-9-paper-version.pdf")
>>> i = doc.extract_image(4)
>>> i['xres']
96
>>> i['yres']
96
>>> i['width']
201
>>> i['height']
199
>>> i['bpc']
8

vs.

$ pdfimages -list i-9-paper-version.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     200   199  index   1   8  image  no        47  0   300   301 16.5K  42%
   1     1 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   1     2 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   2     3 image     201   199  index   1   8  image  no         4  0   300   301 16.3K  42%

At the moment, I am forced to shell out to pdfimages to retrieve the per-image xres/yres. Is this something possible in the future with pymupdf at all?

How to reproduce the bug

Run the code above with the attached PDF file.

i-9-paper-version.pdf

PyMuPDF version

1.25.5

Operating system

Linux

Python version

3.11

wohali · 2025-05-04T06:42:43Z

wohali
May 4, 2025
Author

FYI I attempted your recent workaround and got the same result, sadly:

>>> for page in doc:
...     for img in [b for b in page.get_text("dict")["blocks"] if b["type"] == 1]:
...         print(f"{img['xres']=}, {img['yres']=}")
...         pix = pymupdf.Pixmap(img["image"])
...         print(f"{pix.xres=}, {pix.yres=}")  # <== this is correct!
...         print()
...

img['xres']=96, img['yres']=96
pix.xres=96, pix.yres=96

img['xres']=96, img['yres']=96
pix.xres=96, pix.yres=96

img['xres']=96, img['yres']=96
pix.xres=96, pix.yres=96

img['xres']=96, img['yres']=96
pix.xres=96, pix.yres=96

Reading the upstream bug implied the underlying library can't do much to help here, but, we should be able to do better in PyMuPDF and calculate the x/y ppi as used within the context of the PDF itself as poppler does.

poppler performs the calculation here. ctm is the image's transformation matrix, the same as PyMuPDF's transform Matrix. It renders it to a virtual 72dpi output display.

Using this information:

>>> import pymupdf
>>> from math import sqrt
>>> doc = pymupdf.open("i-9-paper-version.pdf")
>>> p1 = doc.load_page(0)
>>> for img in p1.get_image_info():
...   mat = img['transform']
...   width2 = sqrt(pow(mat[0], 2) + pow(mat[1], 2))
...   height2 = sqrt(pow(mat[2], 2) + pow(mat[3], 2))
...   xppi = round(abs(img['width'] * 72 / width2))
...   yppi = round(abs(img['height'] * 72 / height2))
...   print(f"image #{img['number']}, xppi: {xppi}, yppi: {yppi}")
...
image #1, xppi: 300, yppi: 301
image #35, xppi: 300, yppi: 304
image #36, xppi: 300, yppi: 304

This is the information I'm after.

I can carry this code in my application, but would you consider having PyMuPDF return this information when retrieving images from PDFs instead?
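
The calculation above can be wrapped in a small helper. This is only a minimal sketch, not a PyMuPDF API: the name effective_ppi is hypothetical, and it assumes that transform is the six-number sequence (a, b, c, d, e, f) reported by Page.get_image_info() and that page coordinates use 72 units per inch.

```python
from math import hypot

POINTS_PER_INCH = 72  # PDF page units per inch


def effective_ppi(width, height, transform):
    """Return (xppi, yppi) for an image as it is placed on a page.

    width, height: pixel dimensions of the stored image.
    transform: sequence (a, b, c, d, e, f) mapping the unit square to the page.
    Hypothetical helper for illustration only.
    """
    a, b, c, d = tuple(transform)[:4]
    shown_width = hypot(a, b)   # page length of the image's top edge, in points
    shown_height = hypot(c, d)  # page length of the image's left edge, in points
    return (
        width * POINTS_PER_INCH / shown_width,
        height * POINTS_PER_INCH / shown_height,
    )


# Hypothetical placement: a 200 x 100 pixel image shown at 48 x 24 points.
xppi, yppi = effective_ppi(200, 100, (48, 0, 0, 24, 10, 20))
print(round(xppi), round(yppi))  # 300 300
```

The translation components e and f do not influence the result, so offsets on the page do not change the computed values.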

0 replies

JorjMcKie · 2025-05-04T11:35:55Z

JorjMcKie
May 4, 2025
Maintainer

During the discussions around this question (not even an issue), it became clear that the currently returned values do not report worthwhile information at all.

Only two aspects can potentially be of interest:

The resolution stored as metadata in the image itself.
The "resolution" resulting from the image's original dimension in comparison to the dimension of the boundary box on the page.

The first definition is an image property which thus remains constant across all potential places where this same image may be displayed in the PDF.

The second definition obviously is bbox-specific and could therefore be different for two bboxes showing the same image on the same page. It does not reflect an image property.
Please provide some background about where this may be of value.

0 replies

JorjMcKie · 2025-05-04T11:40:23Z

JorjMcKie
May 4, 2025
Maintainer

When I take your image at xref 48 and insert it in a new page like this:

import pymupdf

doc = pymupdf.open()
page = doc.new_page()
r1 = pymupdf.Rect(100, 100, 200, 200)
r2 = pymupdf.Rect(200, 200, 400, 400)
xref = page.insert_image(r1, filename="48.png")
page.insert_image(r2, xref=xref)
doc.ez_save("x.pdf")

Your suggested computation will deliver these values for the exact same image:

xres=51, yres=51
xres=26, yres=26

I'm pretty sure that immediately after such a potential change, we would see issues asking why this can happen for the same image 😏.
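
The two values follow directly from the rectangle sizes. A minimal sketch of the arithmetic, assuming the image at xref 48 is 71 x 71 pixels (as listed by pdfimages earlier in the thread) and that both insertions are unrotated, so the displayed size equals the rectangle size. The helper name ppi is hypothetical.

```python
def ppi(pixels, points):
    """Pixels per inch when `pixels` are spread over `points` (1 inch = 72 points)."""
    return pixels * 72 / points


image_side = 71                # pixel width and height of the image (assumed)
for rect_side in (100, 200):   # side lengths of r1 and r2 in points
    print(f"xres={round(ppi(image_side, rect_side))}")
# prints xres=51, then xres=26
```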

0 replies

JorjMcKie · 2025-05-04T13:56:02Z

JorjMcKie
May 4, 2025
Maintainer

I cannot say yet whether we will implement your suggestion. If we did, however, it would entail documentation changes and potentially other changes as well:
We report xres / yres values in several other places, for instance in the Pixmap class and in the method Document.extract_image. There, we have no connection to page boundary boxes: the values represent image properties exclusively.

0 replies

wohali · 2025-05-04T16:41:00Z

wohali
May 4, 2025
Author

Thanks for the serious consideration, @JorjMcKie .

My specific application, and those who use it, is the processing of output from document scanners. My process needs to extract images, perform processing of them, and place them into another PDF.

In this specific case, all pages consist of a single document with a trivial bbox (though sometimes with x/y offsets in mat[4] and mat[5]).

As a result, the problem you mention would never arise in my application, though it is a special case of 1 page == 1 image.

However, not all scanned documents in my workflow are the same dimensions, so it is not possible to assume that everything is A4 / Letter / etc.

0 replies

JorjMcKie · 2025-05-05T08:34:23Z

JorjMcKie
May 5, 2025
Maintainer

Thanks for your most recent post. Not sure I'm 100% understanding though 😟.
Anyway, I think it's time to transfer this issue to the Discussions tab for further clarifications.

0 replies

JorjMcKie · 2025-05-05T09:30:22Z

JorjMcKie
May 5, 2025
Maintainer

The "transform" matrix in get_image_info() can deliver a lot more information than one might expect:

Its input is the standard image rectangle imrect = Rect(0, 0, 1, 1).
Its output can be multiple:
- imrect * matrix = img["bbox"] - the boundary box on the page. No surprise here.
- imrect.tl * matrix is the Point where the top-left point (0, 0) of the image rectangle lands inside img["bbox"].
  In the general case, this is not the top-left point of img["bbox"], because the image may be stored tilted and rotated.
- Multiplying each of the four imrect corners separately therefore yields a Quad. Here is such an example showing the bbox in blue and the quad corners in red:
- These markings have been produced by the snippet
```
imrect = pymupdf.Rect(0, 0, 1, 1)  # the standard image rectangle
img = page.get_image_info()[0]
mat = img["transform"]
bbox = pymupdf.Rect(img["bbox"])  # same as imrect * mat
page.draw_rect(bbox, color=(0, 0, 1))  # draw bbox in blue
for p in imrect.quad:  # iterate the corners of imrect as a quad
    # draw each corner of image quad in the bbox
    page.draw_circle(p * mat, 2, color=(1, 0, 0))
```
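
The mapping described above can also be reproduced without PyMuPDF. The following is a minimal sketch in plain Python, assuming the matrix is given as six numbers (a, b, c, d, e, f) and that a point (x, y) is mapped to (a*x + c*y + e, b*x + d*y + f). The helper names and the example matrix are hypothetical.

```python
def transform_point(x, y, m):
    """Map a point of the unit image rectangle onto the page."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def image_quad(m):
    """Return the mapped corners in the order top-left, top-right, bottom-left, bottom-right."""
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [transform_point(x, y, m) for x, y in corners]


def bbox_of(points):
    """Smallest axis-parallel rectangle (x0, y0, x1, y1) containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical matrix: image area rotated by 90 degrees and shifted on the page.
m = (0, 100, -50, 0, 200, 300)
quad = image_quad(m)
print(quad[0])        # (200, 300): where the image's top-left corner lands
print(bbox_of(quad))  # (150, 300, 200, 400): the bbox, whose top-left is (150, 300)
```

In this example the image's own top-left corner ends up at the top-right corner of the bbox, which illustrates why the two must not be confused.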

0 replies

JorjMcKie · 2025-05-05T10:12:07Z

JorjMcKie
May 5, 2025
Maintainer

Applying your suggested revised formula for resolution computation in this example delivers xres=164, yres=202.

The original image dimensions are width = 439, height = 501.

The distance between the top-left and the top-right points in the rotated / tilted image is 193. This is easy to compute: dist = abs(imrect.tl * mat - imrect.tr * mat) = 192.92....
This now reveals the idea behind your computation. It is nothing else but:

Original width divided by displayed width, multiplied by 72 (standard dpi): 439 * 72 / 193 = ~164.
Same is true for the height.

3 replies

JorjMcKie May 5, 2025
Maintainer

In the above terminology, the terms displayed "width" / "height" are used in a fairly liberal sense: what they really mean is the lengths of the sides of the rotated parallelogram that is produced in the general case. In general, these values are not the bbox width / height.
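
To make the difference concrete, here is a minimal sketch comparing the side lengths of the displayed parallelogram with the bbox dimensions. The helper names, the 30 degree rotation and the 400 x 300 pixel image are hypothetical; only the last line uses numbers quoted in this thread.

```python
from math import cos, sin, radians, hypot


def side_lengths(m):
    """Lengths of the top and left sides of the displayed parallelogram."""
    a, b, c, d = m[:4]
    return hypot(a, b), hypot(c, d)


def bbox_size(m):
    """Width and height of the axis-parallel box around the parallelogram."""
    a, b, c, d = m[:4]
    return abs(a) + abs(c), abs(b) + abs(d)


# Hypothetical: a 400 x 300 pixel image shown 96 x 72 points large, rotated by 30 degrees.
t = radians(30)
m = (96 * cos(t), 96 * sin(t), -72 * sin(t), 72 * cos(t), 0, 0)

top, left = side_lengths(m)
box_w, box_h = bbox_size(m)
print(round(400 * 72 / top), round(300 * 72 / left))  # 300 300, based on side lengths
print(400 * 72 / box_w, 300 * 72 / box_h)             # lower values, based on the bbox

# Arithmetic quoted above: original width 439, displayed side length 192.92
print(round(439 * 72 / 192.92))  # 164
```

For an unrotated image both approaches agree, because the parallelogram then coincides with the bbox.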

wohali May 6, 2025
Author

Sorry, business priorities have come up and I've had to task switch to something else for the moment. I hope to give you the information you need to better understand my concerns in the next few days.

JorjMcKie May 6, 2025
Maintainer

No worries, take your time.
