
apply_redactions() does not work as expected #3863

Closed
@nsklei

Description of the bug

When calling apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE), I get several "MuPDF error: syntax error: cannot find XObject resource" errors, and some pages come out completely empty, although every page originally contains images.

How to reproduce the bug

import pymupdf
from io import BytesIO
from pathlib import Path

# Raw strings, so that "\t" in the Windows path is not read as a tab character
file_path = r"path\to\Example_PDF.pdf"
output_path = r"path\to\Example_PDF_redacted.pdf"

new_doc = pymupdf.open(file_path)

for num, page in enumerate(new_doc):
    print(f"Page {num + 1} - {page.rect}:")
    
    for image in page.get_images(full=True):
        print(f"  - Image: {image}")

    redact_rect = page.rect

    if page.rotation in {90, 270}:
        redact_rect = pymupdf.Rect(0, 0, page.rect.height, page.rect.width)

    page.add_redact_annot(redact_rect)
    page.apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE)

byte_stream = BytesIO()
new_doc.save(byte_stream)
byte_stream.seek(0)

Path(output_path).write_bytes(byte_stream.getvalue())

The code above prints the following information:

Page 1 - Rect(0.0, 0.0, 598.3200073242188, 813.5999755859375):
  - Image: (22, 0, 554, 754, 8, 'ICCBased', '', 'Im0', 'DCTDecode', 0)
  - Image: (23, 43, 554, 754, 8, 'ICCBased', '', 'Im1', 'DCTDecode', 0)
Page 2 - Rect(0.0, 0.0, 598.3200073242188, 816.47998046875):
  - Image: (25, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (26, 44, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 3 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
  - Image: (28, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (29, 45, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 4 - Rect(0.0, 0.0, 815.760009765625, 597.5999755859375):
  - Image: (31, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (32, 46, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 5 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
  - Image: (34, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (35, 47, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 6 - Rect(0.0, 0.0, 806.4000244140625, 598.3200073242188):
  - Image: (37, 0, 554, 747, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (38, 48, 554, 747, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 7 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
  - Image: (39, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (40, 49, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'

MuPDF error: syntax error: cannot find XObject resource 'Im2'

Page 8 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
  - Image: (41, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (42, 50, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'

MuPDF error: syntax error: cannot find XObject resource 'Im2'

As you can see, each page contains two images. The function should remove all content from the PDF file except the images.
However, in the saved output some pages are completely empty.
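
One detail in the output above may help narrow this down: the errors name 'Im1' and 'Im2', while get_images() lists 'Im001' and 'Im002' for those pages. The following is only a rough diagnostic sketch, not a fix and not a confirmed explanation of the cause. It scans a page's content stream for "/Name Do" operators and reports names that are not among the page's listed XObjects. The helper names (invoked_xobject_names, missing_xobjects, report) are hypothetical, and the regex is a heuristic that ignores nested Form XObject streams and inline strings.

```python
import re

# "/Name Do" is the content-stream operator that paints an XObject.
# Heuristic only: it does not parse the stream properly.
DO_OPERATOR = re.compile(rb"/([^\s/\[\]<>()%]+)\s+Do\b")


def invoked_xobject_names(contents: bytes) -> set[str]:
    """Return the XObject names invoked via 'Do' in a content stream."""
    return {name.decode("latin-1") for name in DO_OPERATOR.findall(contents)}


def missing_xobjects(contents: bytes, resource_names) -> set[str]:
    """Return invoked names that are absent from the given resource names."""
    return invoked_xobject_names(contents) - set(resource_names)


def report(path: str) -> None:
    """Print, per page, the invoked XObject names that the page does not list."""
    import pymupdf  # imported here so the helpers above work without PyMuPDF

    doc = pymupdf.open(path)
    for num, page in enumerate(doc):
        # Index 7 of each get_images(full=True) entry is the image name.
        names = {img[7] for img in page.get_images(full=True)}
        # Index 1 of each get_xobjects() entry is the Form XObject name.
        names |= {xobj[1] for xobj in page.get_xobjects()}
        missing = missing_xobjects(page.read_contents(), names)
        if missing:
            print(f"Page {num + 1}: invoked but not listed: {sorted(missing)}")
    doc.close()
```

Calling report() on the file before and after apply_redactions() would show whether the name mismatch is already present in the original document or only appears once the redactions have been applied.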

PyMuPDF version

1.24.10

Operating system

Windows

Python version

3.12
