Description
Description of the bug
I have a PDF with images that use an SMask for transparency, but when I call doc.get_page_images() or similar functions, the smask has xref 0.
I believe this happens because of the way the SMask is used: it appears to reference an object containing drawing operations for the mask, and that mask has the size of the entire page, while the image itself is only 256x256.
The PDF was created by Inkscape.
How to reproduce the bug
Run the following script:
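import fitz
doc = fitz.open('drawing-uncompressed.pdf')
# List every image on page 0; item 1 of each tuple is the smask xref.
for image in doc.get_page_images(0, full=True):
    print(image)
    xref = image[0]
    pix1 = fitz.Pixmap(doc.extract_image(xref)["image"])
    pix1.save("image.png")
# Object 14 is the image I expect to be the mask; it has its own smask (16).
xref = 14
smask = doc.extract_image(xref)["smask"]
pix1 = fitz.Pixmap(doc.extract_image(xref)["image"])
mask = fitz.Pixmap(doc.extract_image(smask)["image"])
# Combine the image with its soft mask into a single pixmap with alpha.
pix = fitz.Pixmap(pix1, mask)
pix.save("mask.png")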
With the following file:
(Also attached: the compressed original, before decompressing it with qpdf: drawing.pdf; and the Inkscape file that generated it: drawing.svg.)
The script prints a single image, with smask xref 0. Looking at the PDF source, the image's soft mask should be object 10, which references object 12, which draws object 14. Object 14 is the image I would expect to be the mask, although it has an smask of its own (object 16).
PyMuPDF version
1.24.5
Operating system
Windows
Python version
3.11