@cardoso
Member

@cardoso cardoso commented Aug 5, 2025

Proposed changes (including videos or screenshots)

This can only be reproduced with the AWS S3 storage type, because of its recently introduced integrity checks:
https://aws.amazon.com/blogs/aws/introducing-default-data-integrity-protections-for-new-objects-in-amazon-s3/

The bug occurs because the Exif-stripping logic in exif-be-gone's ExifTransformer._scrubOther is applied to all "other" file types, including PDFs. PDF files do not carry EXIF, XMP, or FLIR markers the way images do, and removing bytes from them can corrupt their structure. The xref table is especially fragile: it records each object's byte offset, so any removed bytes shift everything that follows them.
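To illustrate why removing bytes breaks the xref table, here is a minimal standard-library sketch. It does not call exif-be-gone; the helper names (`build_minimal_pdf`, `xref_is_consistent`) are hypothetical, and the byte removal only stands in for a scrubber that drops what it takes to be a metadata segment.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF, recording each object's byte offset for the xref table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>\nendobj\n",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for obj in objects:
        offsets.append(len(out))  # byte offset where this object starts
        out += obj
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def xref_is_consistent(pdf: bytes) -> bool:
    """Return True if startxref and every in-use xref entry point where they claim."""
    start = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    if not pdf.startswith(b"xref", start):
        return False
    entries = re.findall(rb"(\d{10}) (\d{5}) n", pdf[start:])
    for num, (offset, _gen) in enumerate(entries, start=1):
        if not pdf.startswith(b"%d 0 obj" % num, int(offset)):
            return False
    return True


pdf = build_minimal_pdf()
# Simulated scrub: drop 9 bytes from the middle of the file.
damaged = pdf.replace(b" /Count 1", b"", 1)

print(xref_is_consistent(pdf))
print(xref_is_consistent(damaged))
```

Every offset recorded after the removed bytes is now stale, including the `startxref` pointer itself, which is consistent with the corruption described above.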

To strip Exif from images embedded in a PDF, we would need to:

  1. Parse the PDF structure.
  2. Extract each embedded image.
  3. Strip Exif from each image individually.
  4. Re-embed the cleaned images back into the PDF.

This is a much more complex task and requires a dedicated PDF library to manipulate PDF internals. The current stream-based approach is not sufficient for this.
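As a rough illustration of step 3 only, the sketch below removes APP1/Exif segments from a standalone JPEG byte string using just the standard library. The function name `strip_exif_from_jpeg` and the synthetic sample are assumptions made for this example, not code from exif-be-gone or Rocket.Chat. Steps 1, 2, and 4 would still need a PDF library, since the image stream length and the xref offsets have to be rewritten after the image shrinks.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"


def strip_exif_from_jpeg(data: bytes) -> bytes:
    """Return JPEG bytes with every APP1 segment that carries Exif removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed scan data follows, stop parsing
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:4 + len(EXIF_HEADER)] == EXIF_HEADER
        if not is_exif:
            out += segment
        pos += 2 + length
    out += data[pos:]  # copy the scan data and EOI untouched
    return bytes(out)


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment for the synthetic sample below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic, non-decodable sample: SOI, APP0, APP1/Exif, SOS stub, EOI.
sample = (
    b"\xff\xd8"
    + _segment(0xE0, b"JFIF\x00" + bytes(9))
    + _segment(0xE1, EXIF_HEADER + b"fake-tiff-data")
    + b"\xff\xda" + b"scan-bytes" + b"\xff\xd9"
)
cleaned = strip_exif_from_jpeg(sample)
print(len(sample), len(cleaned))
```

This sketch ignores XMP, fill bytes between markers, and other edge cases, so it should be read as an outline of the per-image work rather than a replacement for a tested library.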

For now, the fix is to prevent PDFs from going through this flow altogether.
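A guard of this kind could look roughly like the following. This is a Python illustration of the idea only, not the actual patch; the function name `should_strip_exif` and its parameters are hypothetical.

```python
PDF_MAGIC = b"%PDF-"


def should_strip_exif(first_bytes: bytes, mime_type: str = "") -> bool:
    """Decide whether an upload should pass through the Exif scrubber.

    Hypothetical helper: PDFs are skipped by declared MIME type or by
    their leading magic bytes; everything else is left to the scrubber.
    """
    if mime_type.lower() == "application/pdf":
        return False
    if first_bytes.startswith(PDF_MAGIC):
        return False
    return True


print(should_strip_exif(b"%PDF-1.7\n", "application/pdf"))
print(should_strip_exif(b"\xff\xd8\xff\xe1", "image/jpeg"))
```

Checking the magic bytes as well as the MIME type is a design choice for this sketch: it covers uploads whose declared type is missing or wrong.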

Issue(s)

https://rocketchat.atlassian.net/browse/SUP-833

Steps to test or reproduce

I used the following Python script in Google Colab to generate a minimal PDF file that triggers the issue:

!pip install pillow piexif pymupdf
import fitz  # PyMuPDF
import piexif
from PIL import Image

# Step 1: Create a JPEG with EXIF (or point jpeg_path at your own JPEG with EXIF)
jpeg_path = "exif_test.jpg"
img = Image.new('RGB', (400, 300), color='blue')
exif_dict = {"0th": {piexif.ImageIFD.Artist: u"Test Artist"}}
exif_bytes = piexif.dump(exif_dict)
img.save(jpeg_path, exif=exif_bytes)

# Step 2: Create a PDF and embed the JPEG as an image XObject
doc = fitz.open()
page = doc.new_page(width=600, height=800)
rect = fitz.Rect(100, 100, 500, 400)

with open(jpeg_path, "rb") as f:
    img_bytes = f.read()

# Embed the JPEG as-is (preserves EXIF)
page.insert_image(rect, stream=img_bytes)

doc.save("test_exif_embed.pdf")
print("PDF with embedded EXIF image generated: test_exif_embed.pdf")

Further comments

@changeset-bot

changeset-bot bot commented Aug 5, 2025

⚠️ No Changeset found

Latest commit: 64b1246

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@codecov

codecov bot commented Aug 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.69%. Comparing base (4e55ef4) to head (64b1246).
⚠️ Report is 7 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #36634      +/-   ##
===========================================
- Coverage    65.70%   65.69%   -0.01%     
===========================================
  Files         3199     3199              
  Lines       106892   106892              
  Branches     20343    20343              
===========================================
- Hits         70232    70227       -5     
- Misses       34015    34019       +4     
- Partials      2645     2646       +1     
Flag    Coverage Δ
e2e     56.95% <ø> (-0.03%) ⬇️
unit    71.28% <ø> (+<0.01%) ⬆️

@github-actions
Contributor

github-actions bot commented Aug 6, 2025

PR Preview Action v1.6.2

🚀 View preview at
https://RocketChat.github.io/Rocket.Chat/pr-preview/pr-36634/

Built to branch gh-pages at 2025-08-07 20:04 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@cardoso cardoso added this to the 7.10.0 milestone Aug 6, 2025
@dionisio-bot
Contributor

dionisio-bot bot commented Aug 6, 2025

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@cardoso
Member Author

cardoso commented Aug 10, 2025

Superseded by #36676

@cardoso cardoso closed this Aug 10, 2025