Description
Description of the bug
When using apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE), I get several "MuPDF error: syntax error: cannot find XObject resource" errors, and some pages come out completely empty, although all pages originally contain images.
How to reproduce the bug
import pymupdf
from io import BytesIO
from pathlib import Path

# Raw strings, so "\t" in the Windows path is not read as a tab character
file_path = r"path\to\Example_PDF.pdf"
output_path = r"path\to\Example_PDF_redacted.pdf"

new_doc = pymupdf.open(file_path)
for num, page in enumerate(new_doc):
    print(f"Page {num + 1} - {page.rect}:")
    for image in page.get_images(full=True):
        print(f" - Image: {image}")
    # Cover the whole page; swap width and height for rotated pages
    redact_rect = page.rect
    if page.rotation in {90, 270}:
        redact_rect = pymupdf.Rect(0, 0, page.rect.height, page.rect.width)
    page.add_redact_annot(redact_rect)
    # Remove everything under the redaction except images
    page.apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE)

# Save to memory first, then write the bytes to disk
byte_stream = BytesIO()
new_doc.save(byte_stream)
byte_stream.seek(0)
Path(output_path).write_bytes(byte_stream.getvalue())
The code above prints the following information:
Page 1 - Rect(0.0, 0.0, 598.3200073242188, 813.5999755859375):
- Image: (22, 0, 554, 754, 8, 'ICCBased', '', 'Im0', 'DCTDecode', 0)
- Image: (23, 43, 554, 754, 8, 'ICCBased', '', 'Im1', 'DCTDecode', 0)
Page 2 - Rect(0.0, 0.0, 598.3200073242188, 816.47998046875):
- Image: (25, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (26, 44, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 3 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
- Image: (28, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (29, 45, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 4 - Rect(0.0, 0.0, 815.760009765625, 597.5999755859375):
- Image: (31, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (32, 46, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 5 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
- Image: (34, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (35, 47, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 6 - Rect(0.0, 0.0, 806.4000244140625, 598.3200073242188):
- Image: (37, 0, 554, 747, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (38, 48, 554, 747, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 7 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
- Image: (39, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (40, 49, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'
MuPDF error: syntax error: cannot find XObject resource 'Im2'
Page 8 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
- Image: (41, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
- Image: (42, 50, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'
MuPDF error: syntax error: cannot find XObject resource 'Im2'
As you can see, each page contains two images. The function should remove all content from the PDF file except the images.
But when saving the byte_stream, some pages are completely empty.
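One detail in the log may help narrow this down: on pages 7 and 8, get_images() lists the names 'Im001' and 'Im002', while the errors mention 'Im1' and 'Im2'. The following is a minimal diagnostic sketch, not part of the original script, that compares the XObject names invoked in a content stream with the names a page reports. The helper name `missing_xobjects` and the sample stream are hypothetical, and the regex is only a heuristic (it does not parse PDF syntax and could match inside strings or inline image data).

```python
import re

# Heuristic match for "/Name Do", the PDF operator that paints an XObject.
_DO_OPERATOR = re.compile(rb"/([^\s/\[\]<>(){}%]+)\s+Do\b")


def missing_xobjects(content: bytes, resource_names) -> set:
    """Return names invoked via `Do` that are absent from resource_names."""
    invoked = {m.group(1).decode("latin-1") for m in _DO_OPERATOR.finditer(content)}
    return invoked - set(resource_names)


# Hypothetical content stream shaped like the failing pages in the log.
sample = b"q 554 0 0 755 0 0 cm /Im1 Do Q q /Im2 Do Q"
print(sorted(missing_xobjects(sample, ["Im001", "Im002"])))  # ['Im1', 'Im2']
```

On a real page, the inputs could be `page.read_contents()` for the content stream and `[img[7] for img in page.get_images(full=True)]` for the names, since index 7 of each entry is the name shown in the output above. Names that belong to Form XObjects rather than images would also be reported, so any hit needs a manual check.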
PyMuPDF version
1.24.10
Operating system
Windows
Python version
3.12