-
Notifications
You must be signed in to change notification settings - Fork 13.1k
fix: pdf upload with exif data in images #36634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #36634 +/- ##
===========================================
- Coverage 65.70% 65.69% -0.01%
===========================================
Files 3199 3199
Lines 106892 106892
Branches 20343 20343
===========================================
- Hits 70232 70227 -5
- Misses 34015 34019 +4
- Partials 2645 2646 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
…nto fix/specific-pdf-fails-upload-SUP-833
…nto fix/specific-pdf-fails-upload-SUP-833
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
|
Superseded by #36676 |
Proposed changes (including videos or screenshots)
This can only be reproduced with AWS S3 storage type due to recently introduced integrity checks:
https://aws.amazon.com/blogs/aws/introducing-default-data-integrity-protections-for-new-objects-in-amazon-s3/
The bug is caused because the Exif stripping logic in exif-be-gone’s ExifTransformer._scrubOther is applied to all "other" file types, including PDFs. However, PDF files do not contain EXIF, XMP, or FLIR markers in the same way as images, and stripping bytes from them can corrupt their structure, especially the xref table.
To strip Exif from images embedded in a PDF, we would need to:
This is a much more complex task and requires a dedicated PDF library to manipulate PDF internals. The current stream-based approach is not sufficient for this.
The fix for now is preventing pdfs from going through this flow altogether.
Issue(s)
https://rocketchat.atlassian.net/browse/SUP-833
Steps to test or reproduce
I used the following python script in Google Colab to generate a minimal pdf file that triggers the issue:
Further comments