fix: pdf upload with exif data in images #36634

cardoso · 2025-08-05T23:59:21Z

Proposed changes (including videos or screenshots)

This can only be reproduced with the AWS S3 storage type, because of its recently introduced integrity checks:
https://aws.amazon.com/blogs/aws/introducing-default-data-integrity-protections-for-new-objects-in-amazon-s3/

The bug occurs because the Exif-stripping logic in exif-be-gone's `ExifTransformer._scrubOther` is applied to every "other" file type, including PDFs. PDF files do not carry EXIF, XMP, or FLIR markers the way images do, and removing bytes from them can corrupt their structure, in particular the xref table, which records the byte offset of each object.
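
To illustrate why removing bytes breaks a PDF, here is a minimal sketch in plain Python. It assembles a tiny PDF-like file by hand, then simulates a scrubber that drops an Exif-looking byte run from a stream without updating anything else. The helper names (`build_minimal_pdf`, `xref_is_consistent`) and the payload are made up for this illustration; it does not reproduce what `_scrubOther` does byte for byte.

```python
import re


def build_minimal_pdf(stream_payload: bytes) -> bytes:
    """Assemble a tiny PDF-like file whose xref table has correct offsets."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog >>\nendobj\n",
        b"2 0 obj\n<< /Length %d >>\nstream\n" % len(stream_payload)
        + stream_payload
        + b"\nendstream\nendobj\n",
        b"3 0 obj\n<< /Type /Pages /Count 0 /Kids [] >>\nendobj\n",
    ]
    body = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset where this object starts
        body += obj
    xref_pos = len(body)
    xref = b"xref\n0 %d\n" % (len(objects) + 1)
    xref += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_pos,
    )
    return body + xref + trailer


def xref_is_consistent(pdf: bytes) -> bool:
    """Check that startxref and every in-use xref entry point where they claim."""
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if not match:
        return False
    xref_pos = int(match.group(1))
    if pdf[xref_pos:xref_pos + 4] != b"xref":
        return False
    entries = re.findall(rb"(\d{10}) \d{5} n", pdf[xref_pos:])
    for number, raw_offset in enumerate(entries, start=1):
        if not pdf.startswith(b"%d 0 obj" % number, int(raw_offset)):
            return False
    return True


# A stream that looks like the start of a JPEG with an Exif segment.
payload = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00METADATA\xff\xd9"
original = build_minimal_pdf(payload)

# Simulated scrubber: drops bytes mid-file and fixes no offsets or lengths.
stripped = original.replace(b"Exif\x00\x00METADATA", b"")

print(xref_is_consistent(original))
print(xref_is_consistent(stripped))
```

Every offset recorded after the removed bytes is now stale, and the stream's `/Length` no longer matches its content, so a strict consumer has good reason to reject the file.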

To strip Exif from images embedded in a PDF, we would need to:

1. Parse the PDF structure.
2. Extract each embedded image.
3. Strip Exif from each image individually.
4. Re-embed the cleaned images back into the PDF.

This is a much more complex task and requires a dedicated PDF library to manipulate PDF internals. The current stream-based approach is not sufficient for this.

The fix for now is to prevent PDFs from going through this flow altogether.
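
The shape of such a guard can be sketched as follows. This is a Python illustration only: the actual change lives in Rocket.Chat's TypeScript code, and `should_strip_exif`, its parameters, and the choice to check both the MIME type and the `%PDF-` header bytes are assumptions made for this example.

```python
PDF_MAGIC = b"%PDF-"


def should_strip_exif(mime_type: str, head: bytes) -> bool:
    """Hypothetical guard: skip Exif stripping for anything that looks like a PDF.

    mime_type is the declared content type of the upload.
    head is the first few bytes of the file.
    """
    if mime_type.strip().lower() == "application/pdf":
        return False
    # Also catch PDFs uploaded with a generic or wrong content type.
    if head.startswith(PDF_MAGIC):
        return False
    return True


print(should_strip_exif("application/pdf", b"%PDF-1.7\n"))
print(should_strip_exif("application/octet-stream", b"%PDF-1.7\n"))
print(should_strip_exif("image/jpeg", b"\xff\xd8\xff\xe1"))
```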

Issue(s)

https://rocketchat.atlassian.net/browse/SUP-833

Steps to test or reproduce

I used the following Python script in Google Colab to generate a minimal PDF file that triggers the issue:

!pip install pillow piexif pymupdf
import fitz  # PyMuPDF
import piexif
from PIL import Image

# Step 1: Create a JPEG with EXIF (or use your own)
img = Image.new('RGB', (400, 300), color='blue')
exif_dict = {"0th": {piexif.ImageIFD.Artist: u"Test Artist"}}
exif_bytes = piexif.dump(exif_dict)
img.save("exif_test.jpg", exif=exif_bytes)
jpeg_path = "exif_test.jpg"  # Or use your own JPEG file with EXIF

# Step 2: Create a PDF and embed the JPEG as an image XObject
doc = fitz.open()
page = doc.new_page(width=600, height=800)
rect = fitz.Rect(100, 100, 500, 400)

with open(jpeg_path, "rb") as f:
    img_bytes = f.read()

# Embed the JPEG as-is (preserves EXIF)
page.insert_image(rect, stream=img_bytes)

doc.save("test_exif_embed.pdf")
print("PDF with embedded EXIF image generated: test_exif_embed.pdf")

Further comments

changeset-bot · 2025-08-05T23:59:26Z

⚠️ No Changeset found

Latest commit: 64b1246

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

codecov · 2025-08-06T00:05:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.69%. Comparing base (4e55ef4) to head (64b1246).
⚠️ Report is 7 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #36634      +/-   ##
===========================================
- Coverage    65.70%   65.69%   -0.01%     
===========================================
  Files         3199     3199              
  Lines       106892   106892              
  Branches     20343    20343              
===========================================
- Hits         70232    70227       -5     
- Misses       34015    34019       +4     
- Partials      2645     2646       +1

| Flag | Coverage Δ |
|------|------------|
| e2e | `56.95% <ø> (-0.03%)` ⬇️ |
| unit | `71.28% <ø> (+<0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.

github-actions · 2025-08-06T00:06:08Z

PR Preview Action v1.6.2
🚀 View preview at https://RocketChat.github.io/Rocket.Chat/pr-preview/pr-36634/
Built to branch `gh-pages` at 2025-08-07 20:04 UTC. Preview will be ready when the GitHub Pages deployment is complete.

dionisio-bot · 2025-08-06T18:09:44Z

Looks like this PR is not ready to merge, because of the following issues:

This PR is missing the 'stat: QA assured' label

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

cardoso · 2025-08-10T22:29:34Z

Superseded by #36676

fix: pdf upload with exif data in images

ad81888

cardoso added this to the 7.10.0 milestone Aug 6, 2025

cardoso added 3 commits August 6, 2025 11:04

Merge branch 'develop' of https://github.com/RocketChat/Rocket.Chat into fix/specific-pdf-fails-upload-SUP-833

5d4bbac

Merge branch 'develop' of https://github.com/RocketChat/Rocket.Chat into fix/specific-pdf-fails-upload-SUP-833

b283767

test: pdf with embedded exif data

0a3b1f8

cardoso added 3 commits August 6, 2025 16:52

Merge branch 'develop' into fix/specific-pdf-fails-upload-SUP-833

b7f5670

Merge branch 'develop' into fix/specific-pdf-fails-upload-SUP-833

2e0dbe2

Merge branch 'develop' into fix/specific-pdf-fails-upload-SUP-833

64b1246

cardoso closed this Aug 10, 2025
