Skip to content

TST: Issue with merging pdfkit #2191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 15, 2023
Merged

Conversation

MartinThoma
Copy link
Member

No description provided.

@MartinThoma
Copy link
Member Author

@Lucas-C This one is pretty weird. With #2086 the merged PDF is now rotated by 180° and mirrored. I have no idea how that can happen.

@MartinThoma
Copy link
Member Author

Also, I'm very open to better similarity metrics. It is over 99% although there are very important differences (just the amount of pixels being different is low).

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (fb35485) 94.34% compared to head (7d918f5) 94.34%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2191   +/-   ##
=======================================
  Coverage   94.34%   94.34%           
=======================================
  Files          43       43           
  Lines        7572     7572           
  Branches     1488     1488           
=======================================
  Hits         7144     7144           
  Misses        263      263           
  Partials      165      165           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stefan6419846
Copy link
Collaborator

stefan6419846 commented Sep 13, 2023

I can indeed confirm that this relates to the changes in #2086 which fixed the general watermarking behavior for me. As documented in #2113, page.transfer_rotation_to_content() before the watermarking fixes this, although I do not know why as page.rotation is 0.

@Lucas-C
Copy link
Member

Lucas-C commented Sep 13, 2023

@Lucas-C This one is pretty weird. With #2086 the merged PDF is now rotated by 180° and mirrored. I have no idea how that can happen.

Sorry, I miss some context there...
This PR pipeline seems to be passing.
What is the problem exactly?
How could I help?

@stefan6419846
Copy link
Collaborator

The PR pipeline is passing due to the xfail = expected failure decorator. The problem is that the watermark is inserted in rotated/mirrored/shifted way and not as you would expect. The relevant change apparently comes from #2086, which you are the author of.

@martin-thoma
Copy link

Thanks for explaining @stefan6419846

Just to clarify: I pinged you @Lucas-C because you might have an idea or be interested in the topic. No other reason :-)

I'm adding this xfail test (and later an issue) so that it is easier to work on this topic. I'll investigate the PDF generated by PDFkit. Maybe that one is weird. Or maybe the one created by reportlab.

@stefan6419846
Copy link
Collaborator

Just for the record as I have been testing this anyway: This is how it currently renders when running the test as-is: pdfkit_watermarked.png

@MartinThoma
Copy link
Member Author

When I run

from pypdf import PdfReader, PdfWriter
from pathlib import Path
import shutil
import subprocess

GHOSTSCRIPT_BINARY = shutil.which("gs")
SAMPLE_ROOT = Path(".")
RESOURCE_ROOT = Path("../resources")
base_path = SAMPLE_ROOT / "022-pdfkit/pdfkit.pdf"
watermark_path = SAMPLE_ROOT / "013-reportlab-overlay/reportlab-overlay.pdf"

reader = PdfReader(base_path)
base_page = reader.pages[0]
watermark = PdfReader(watermark_path).pages[0]

writer = PdfWriter()
base_page.merge_page(watermark)
writer.add_page(base_page)

for page in writer.pages:
    page.compress_content_streams()

tmp_path = Path(".")

target_png_path = RESOURCE_ROOT / "test_watermarking_reportlab_rendering.png"
pdf_path = tmp_path / "out.pdf"
png_path = tmp_path / "test_watermarking_reportlab_rendering.png"

writer.write(pdf_path)
# False positive: https://github.com/PyCQA/bandit/issues/333
subprocess.run(
    [  # noqa: S603
        GHOSTSCRIPT_BINARY,
        "-r120",
        "-sDEVICE=pngalpha",
        "-o",
        png_path,
        pdf_path,
    ]
)
assert png_path.is_file()

I get:

test_watermarking_reportlab_rendering

@MartinThoma
Copy link
Member Author

MartinThoma commented Sep 13, 2023

Please note especially that part:

image

@MartinThoma MartinThoma merged commit 02f2aa6 into main Sep 15, 2023
@MartinThoma MartinThoma deleted the pdfkit-reportlab-merge-issue branch September 15, 2023 18:57
MartinThoma added a commit that referenced this pull request Sep 17, 2023
## What's new

### Bug Fixes (BUG)
-  Missing new line in extract_text with cm operations (#2142) by @pubpub-zz
-  _get_fonts not processing properly CIDFonts and annotations (#2194) by @pubpub-zz

### Documentation (DOC)
-  Sort list of contributors by @MartinThoma

### Developer Experience (DEV)
-  Give attribution in release notes (#2196) by @MartinThoma

### Maintenance (MAINT)
-  Update packages (#2195) by @MartinThoma
-  Rename PdfWriter.create_viewer_preferences to PdfWriter.create_viewer_preferences (#2190) by @marcstober
-  Mark `cryptography` as default (#2186) by @exiledkingcc

### Testing (TST)
-  Issue with merging pdfkit (#2191) by @MartinThoma

### Code Style (STY)
-   clean-up overriden variable (#2189) by @pubpub-zz

[Full Changelog](3.16.0...3.16.1)
@Lucas-C
Copy link
Member

Lucas-C commented Sep 19, 2023

So what was the origin of this problem in the end?

@stefan6419846
Copy link
Collaborator

This still has not yet been fixed and is still marked as an expected failure, thus there is no known origin for now.

@martin-thoma
Copy link

I should probably open an issue (@MartinThoma pinging so that I don't forget :-) )

@Lucas-C
Copy link
Member

Lucas-C commented Sep 19, 2023

I opened #2203 to solve this

MartinThoma pushed a commit that referenced this pull request Sep 24, 2023
This fixes the issue spotted in #2191

The solution was to re-introduce calls to `PageObject._push_pop_gs()`,
in `PageObject._merge_page` & `PageObject._merge_page_writer()`,
but to optimize `PageObject._push_pop_gs()` by introducing a `ContentsStream.isolate_graphics_state()` method.
@MartinThoma MartinThoma mentioned this pull request Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants