Image in pdf changes color after applying redactions #2093

Closed
ot-ksrinivasan opened this issue Nov 30, 2022 · 8 comments
Labels
upstream bug (bug outside this package)

Comments

@ot-ksrinivasan

ot-ksrinivasan commented Nov 30, 2022

Description

Image in a PDF file changes color after applying redactions.

To Reproduce

Run the following Python script to reproduce the issue. The script uses the PDF file image_issue.pdf.

import os

import fitz  # PyMuPDF

# Resolve the input PDF relative to this script's location.
script_path = os.path.abspath(__file__)
script_folder = os.path.dirname(script_path)
doc = fitz.open(os.path.join(script_folder, "image_issue.pdf"))

page = doc.load_page(0)

# Redaction rectangle: top-left corner (rx, ry), width rw, height rh.
rx = 135.123
ry = 123.56878
rw = 69.8409
rh = 9.46397

x0 = rx
y0 = ry
x1 = rx + rw
y1 = ry + rh

rect = fitz.Rect(x0, y0, x1, y1)

font = fitz.Font("Helvetica")
fill_color = (0, 0, 0)  # black fill
page.add_redact_annot(
    quad=rect,
    # text="null",
    fontname=font.name,
    fontsize=12,
    align=fitz.TEXT_ALIGN_CENTER,
    fill=fill_color,
    text_color=(1, 1, 1),  # white text
)

# Default images=2: blank out the parts of images that overlap the redaction.
page.apply_redactions()

doc.save(os.path.join(script_folder, "image_issue_redacted.pdf"))

Note that I am using the default images=2 (blank out overlapping image parts) when calling apply_redactions(). Using images=0 (ignore) or images=1 (remove the complete overlapping image) is not desirable for my use case.
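
For reference, the three `images` values mentioned above correspond to named constants in PyMuPDF. The sketch below mirrors them in a small standard-library enum so the choice reads explicitly. The enum `RedactImages` and the helper `apply_with_mode` are illustrative names, not part of PyMuPDF, and the helper assumes PyMuPDF is installed and that the redaction is on page 0.

```python
from enum import IntEnum


class RedactImages(IntEnum):
    """Illustrative mirror of the `images` values accepted by apply_redactions()."""
    NONE = 0    # fitz.PDF_REDACT_IMAGE_NONE: leave overlapping images untouched
    REMOVE = 1  # fitz.PDF_REDACT_IMAGE_REMOVE: remove any overlapping image entirely
    PIXELS = 2  # fitz.PDF_REDACT_IMAGE_PIXELS: blank out only the overlapping pixels


def apply_with_mode(src_path, dst_path, rect, mode=RedactImages.PIXELS):
    """Hypothetical helper: redact `rect` on page 0 using the given image mode."""
    import fitz  # PyMuPDF; imported lazily so the enum is usable without it

    doc = fitz.open(src_path)
    page = doc.load_page(0)
    page.add_redact_annot(fitz.Rect(*rect), fill=(0, 0, 0))
    page.apply_redactions(images=int(mode))
    doc.save(dst_path)
    doc.close()
```

Calling `apply_with_mode` once per mode with different output paths is one way to compare the three results side by side.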

Expected behavior

The color of the image in the pdf file should not change after applying redactions.

Screenshots

Here's a screenshot of the problem.
[Image: screenshot of the problem]

Your configuration

  • Operating system: Ubuntu 22.04.1 LTS
  • Python version: 3.8.14
  • PyMuPDF version: 1.20.2

@JorjMcKie
Collaborator

The problem here is that your page is fully covered by two images.
So if your redaction rectangles intersect image bboxes, the respective images will always be changed if you let the .apply_redactions() parameter default to images=fitz.PDF_REDACT_IMAGE_PIXELS.
Specify page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE) and you are fine.
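
To see which images a given redaction rectangle would touch, a plain bounding-box intersection test is enough. The following is a minimal sketch, not PyMuPDF code: `rects_intersect` and `affected_images` are hypothetical helpers, and the image bboxes are assumed example values (in practice they could be read from `page.get_image_info()`).

```python
def rects_intersect(a, b):
    """Return True if rectangles a and b (x0, y0, x1, y1) share a non-empty area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return max(ax0, bx0) < min(ax1, bx1) and max(ay0, by0) < min(ay1, by1)


def affected_images(redact_rect, image_bboxes):
    """Return the image bboxes that the redaction rectangle overlaps."""
    return [bbox for bbox in image_bboxes if rects_intersect(redact_rect, bbox)]


# Redaction rectangle from the reproducer script: (rx, ry, rx + rw, ry + rh).
redact = (135.123, 123.56878, 135.123 + 69.8409, 123.56878 + 9.46397)

# Assumed example bboxes: one page-sized image and one small image elsewhere.
bboxes = [(0, 0, 595, 842), (400, 700, 500, 800)]

print(affected_images(redact, bboxes))
```

Any image whose bbox covers the whole page will overlap every redaction rectangle on that page, which is why the images here are always affected under the default setting.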

@JorjMcKie added the "not a bug" label (not a bug / user error / unable to reproduce) on Nov 30, 2022
@ot-ksrinivasan
Author

The problem here is that your page is fully covered by two images. So if your redaction rectangles intersect image bboxes, the respective images will always be changed if you let the .apply_redactions() parameter default to images=fitz.PDF_REDACT_IMAGE_PIXELS. Specify page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE) and you are fine.

@JorjMcKie, thank you for your response. As I mentioned in the description:
Note that I am using the default images=2 (blank out overlapping image parts) when calling apply_redactions(). Using images=0 (ignore) or images=1 (remove the complete overlapping image) is not desirable for my use case.

As I understand it, if we set images=0, i.e. page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE), then the image pixels that need to be redacted won't be removed, so that is not an effective redaction, right? If the redaction box is later removed, the text on the image would still be visible, right?

@JorjMcKie
Collaborator

Got you now.
The situation was obscured by the fact that your redaction rectangle was inside a black drawing and hence remained invisible.
This is an upstream error (MuPDF): the 2 large images have transparency masks that are handled incorrectly, leading to a white background where it was black before.
I will try to make a reproducer program and submit an issue in MuPDF's bug system, using your example file: is that OK with you?
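
One way to check whether a page's images carry transparency masks is to look at the items returned by `page.get_images(full=True)`, where the second entry of each tuple is the xref of the image's soft mask (0 if there is none). This is a rough sketch under the assumption that the masks in question are soft masks (SMask); `images_with_masks` and `list_masked_images` are hypothetical helper names.

```python
def images_with_masks(image_items):
    """Return xrefs of images whose soft-mask xref (second tuple item) is non-zero."""
    return [item[0] for item in image_items if item[1] != 0]


def list_masked_images(pdf_path, page_number=0):
    """Hypothetical helper: report images on a page that carry a soft mask."""
    import fitz  # PyMuPDF, imported lazily so the filter works without it

    doc = fitz.open(pdf_path)
    try:
        items = doc[page_number].get_images(full=True)
        return images_with_masks(items)
    finally:
        doc.close()
```

If `list_masked_images("image_issue.pdf")` returns a non-empty list, the page contains masked images of the kind described above.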

@JorjMcKie JorjMcKie reopened this Nov 30, 2022
@JorjMcKie added the "upstream bug" label (bug outside this package) and removed the "not a bug" label (not a bug / user error / unable to reproduce) on Nov 30, 2022
@ot-ksrinivasan
Author

Got you now. Situation was blurred by the fact that your redact rectangle was inside a black drawing and hence remained invisible whatsoever. This is an upstream error (MuPDF): the 2 large images have transparency masks that are incorrectly handled, leading to white background where it has been black before. I will try to make a reproducer program and submit an issue in MuPDF's bug system ... using example file: ok with you?

Sure. You can use my pdf file to file a bug with mupdf.

@JorjMcKie
Collaborator

@ot-ksrinivasan - May I use your example file and submit it to MuPDF's bug system?

@ot-ksrinivasan
Author

@JorjMcKie Sure

@JorjMcKie
Collaborator

Recorded under this issue number.

@julian-smith-artifex-com
Collaborator

The underlying MuPDF bug has been fixed: https://bugs.ghostscript.com/show_bug.cgi?id=706114
