fix: wrong file length after exif strip #36676

cardoso · 2025-08-10T21:11:31Z

Proposed changes (including videos or screenshots)

Updates the file size after stripping Exif data. The stale, pre-strip size seems to be causing issues with the S3 storage type when uploading files.
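To illustrate why the recorded size has to be refreshed, here is a minimal Python sketch. It is not the Rocket.Chat or exif-be-gone code: the helper `strip_exif_app1`, the helper `make_segment`, and the `record` dictionary are hypothetical names used only for this example, and the input is a synthetic JPEG-like byte string. The sketch drops the Exif APP1 segment and then re-measures the length.

```python
import struct

SOI = b"\xff\xd8"          # start-of-image marker
SOS = 0xDA                 # start-of-scan: entropy-coded data follows
APP1 = 0xE1                # segment type that carries Exif
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif_app1(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` without its Exif APP1 segments (hypothetical helper)."""
    if not jpeg.startswith(SOI):
        return jpeg  # not a JPEG: leave untouched
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == SOS:
            break  # the rest is image data; it is copied verbatim below
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        payload = jpeg[pos + 4:end]
        if not (marker == APP1 and payload.startswith(EXIF_HEADER)):
            out += jpeg[pos:end]
        pos = end
    out += jpeg[pos:]
    return bytes(out)


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG marker segment for the demo below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic JPEG-like input: SOI, Exif APP1, APP0, SOS + scan data + EOI.
exif_segment = make_segment(APP1, EXIF_HEADER + b"fake-tiff-data")
original = (
    SOI
    + exif_segment
    + make_segment(0xE0, b"JFIF\x00")
    + make_segment(SOS, b"\x00")
    + b"scan-bytes\xff\xd9"
)

record = {"name": "photo.jpg", "size": len(original)}  # size measured before stripping
stripped = strip_exif_app1(original)
stale_size = record["size"]
record["size"] = len(stripped)  # refresh the size so it matches the stored bytes
```

After stripping, `stale_size` is larger than `len(stripped)` by exactly the length of the removed segment, which is the kind of mismatch a size-validating storage backend can reject.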

Before

After

Issue(s)

https://rocketchat.atlassian.net/browse/SUP-833

Steps to test or reproduce

This can only be reproduced with AWS S3 storage type due to recently introduced integrity checks:
https://aws.amazon.com/blogs/aws/introducing-default-data-integrity-protections-for-new-objects-in-amazon-s3/

Additionally, S3 validates the entire file’s size and checksum when you call the CompleteMultipartUpload API.
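The failure mode can be sketched without AWS at all. The toy `SizeCheckingStore` below is a hypothetical stand-in, not the S3 API or any SDK: it only mimics the idea that a store rejects an upload whose declared length differs from the bytes it actually receives. The `upload_after_strip` function and its `refresh_size` flag are likewise assumptions made for illustration.

```python
class SizeMismatchError(Exception):
    """Raised when the declared size differs from the received bytes."""


class SizeCheckingStore:
    """Toy object store that validates the declared size (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, declared_size):
        if declared_size != len(body):
            raise SizeMismatchError(
                f"{key}: declared {declared_size} bytes, received {len(body)}"
            )
        self.objects[key] = body


def upload_after_strip(store, key, original, stripped, refresh_size):
    """Upload `stripped`, declaring either the stale or the refreshed size."""
    declared = len(stripped) if refresh_size else len(original)
    store.put(key, stripped, declared)


store = SizeCheckingStore()
original = b"header" + b"<metadata>" + b"payload"
stripped = original.replace(b"<metadata>", b"")  # stand-in for Exif stripping

try:
    upload_after_strip(store, "file.bin", original, stripped, refresh_size=False)
    stale_upload_ok = True
except SizeMismatchError:
    stale_upload_ok = False  # the stale, pre-strip size is rejected

# Declaring the post-strip size succeeds.
upload_after_strip(store, "file.bin", original, stripped, refresh_size=True)
```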

I used the following Python script in Google Colab to generate a minimal PDF file that triggers the issue:
test_exif_embed.pdf

!pip install pillow piexif pymupdf
import fitz  # PyMuPDF
import piexif
from PIL import Image

# Step 1: Create a JPEG with EXIF (or point jpeg_path at your own JPEG file with EXIF)
jpeg_path = "exif_test.jpg"
img = Image.new('RGB', (400, 300), color='blue')
exif_dict = {"0th": {piexif.ImageIFD.Artist: "Test Artist"}}
exif_bytes = piexif.dump(exif_dict)
img.save(jpeg_path, exif=exif_bytes)

# Step 2: Create a PDF and embed the JPEG as an image XObject
doc = fitz.open()
page = doc.new_page(width=600, height=800)
rect = fitz.Rect(100, 100, 500, 400)

with open(jpeg_path, "rb") as f:
    img_bytes = f.read()

# Embed the JPEG as-is (preserves EXIF)
page.insert_image(rect, stream=img_bytes)

doc.save("test_exif_embed.pdf")
print("PDF with embedded EXIF image generated: test_exif_embed.pdf")

Further comments

Initially, the bug surfaced because the Exif stripping logic in exif-be-gone’s ExifTransformer._scrubOther is applied to all "other" file types, including PDFs. However, PDF files do not contain EXIF, XMP, or FLIR markers in the same way as images, and stripping bytes from them can corrupt their structure, especially the xref table, which records the byte offset of each object in the file.
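A small Python sketch of why removing bytes breaks offset-based lookups. The layout below is a toy "PDF-like" byte string, not a valid PDF, and `build_pdf_like` and `offsets_valid` are hypothetical helpers. It only demonstrates that offsets recorded before a byte-level scrub no longer point at the right objects afterwards.

```python
import re


def build_pdf_like(objects):
    """Concatenate object bodies and record each object's byte offset."""
    data = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(data))
        data += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    return bytes(data), offsets


def offsets_valid(data, offsets):
    """Check whether each recorded offset still points at 'N 0 obj'."""
    return [
        re.match(rb"%d 0 obj\b" % number, data[offset:]) is not None
        for number, offset in enumerate(offsets, start=1)
    ]


objects = [
    b"<< /Type /Catalog >>",
    b"stream EXIF-BYTES endstream",
    b"<< /Type /Page >>",
]
pdf, xref_offsets = build_pdf_like(objects)

# Remove bytes from the middle of object 2 without updating the offsets,
# which is effectively what a byte-level scrubber does.
scrubbed = pdf.replace(b"EXIF-BYTES ", b"")

before = offsets_valid(pdf, xref_offsets)
after = offsets_valid(scrubbed, xref_offsets)
```

Objects that start before the removed bytes keep valid offsets, while every object after them is shifted, so `after` reports the third offset as stale.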

To strip Exif from images embedded in a PDF, we would need to:

1. Parse the PDF structure.
2. Extract each embedded image.
3. Strip Exif from each image individually.
4. Re-embed the cleaned images back into the PDF.

This is a much more complex task and requires a dedicated PDF library to manipulate PDF internals. The current stream-based approach is not sufficient for this.

Although we could exclude PDFs from Exif stripping, that would only hide this bug. In practice, most PDF software can still open a file even when its xref table is invalid.

dionisio-bot · 2025-08-10T21:11:35Z

Looks like this PR is ready to merge! 🎉
If you have any trouble, please check the PR guidelines

changeset-bot · 2025-08-10T21:11:36Z

⚠️ No Changeset found

Latest commit: 8ae3e55

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

github-actions · 2025-08-10T21:16:31Z

PR Preview Action v1.6.2
🚀 View preview at https://RocketChat.github.io/Rocket.Chat/pr-preview/pr-36676/
Built to branch `gh-pages` at 2025-08-10 23:13 UTC. Preview will be ready when the GitHub Pages deployment is complete.

fix: wrong file length after exif strip

8ae3e55

cardoso added this to the 7.10.0 milestone Aug 10, 2025

cardoso mentioned this pull request Aug 10, 2025

fix: pdf upload with exif data in images #36634

Closed

Create tame-stingrays-hug.md

5815143

cardoso marked this pull request as ready for review August 10, 2025 23:11

cardoso requested a review from a team as a code owner August 10, 2025 23:11

julio-rocketchat approved these changes Aug 11, 2025

View reviewed changes

debdutdeb approved these changes Aug 11, 2025

View reviewed changes

cardoso added the `stat: QA assured` label (means it has been tested and approved by a company insider) Aug 11, 2025

dionisio-bot bot added the `stat: ready to merge` label (PR tested and approved, waiting for merge) Aug 11, 2025

kodiakhq bot merged commit 61bca86 into develop Aug 11, 2025
87 of 89 checks passed

kodiakhq bot deleted the fix/exif-length branch August 11, 2025 16:16

rocketchat-github-ci mentioned this pull request Aug 20, 2025

Release 7.10.0 #36760

Merged
