ROB: ignore faulty trailing newline during RLE decoding #3355
Found PDFs from Dalim software with multi-encoded streams: the inner stream is RLE-encoded and the outer stream is Flate-encoded. The inner stream contains a trailing newline character that breaks RLE decoding. It seems that some Dalim versions had a systematic error: they copied the inner stream's bytes directly from the raw PDF bytes, including the trailing newline before "endstream". This PR fixes the issue by ignoring the trailing newline and emitting a warning instead of raising an exception.
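To illustrate the failure mode, here is a minimal sketch of a RunLengthDecode decoder that tolerates the stray byte. It assumes that, once the outer Flate layer is removed, the faulty data is valid RLE data followed by the EOD marker (byte 128) and then a single `b"\n"`. The helper name `run_length_decode` and the use of `ValueError` and `warnings.warn` are hypothetical choices for illustration; this is not pypdf's actual implementation or API.

```python
import warnings
import zlib

EOD = 128  # RunLengthDecode end-of-data marker


def run_length_decode(data: bytes) -> bytes:
    """Hypothetical RLE decoder that warns on a single trailing newline after EOD."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == EOD:
            rest = data[i:]
            if rest == b"\n":
                # Faulty producer output: tolerate it, but tell the caller.
                warnings.warn("Ignoring trailing newline after RunLengthDecode EOD")
            elif rest:
                # Any other data after EOD is still treated as an error.
                raise ValueError("Early EOD in RunLengthDecode")
            return bytes(out)
        if length < 128:
            # Literal run: copy the next length + 1 bytes unchanged.
            chunk = data[i : i + length + 1]
            if len(chunk) != length + 1:
                raise ValueError("Truncated literal run")
            out += chunk
            i += length + 1
        else:
            # Repeat run: repeat the next byte 257 - length times.
            if i >= len(data):
                raise ValueError("Truncated repeat run")
            out += bytes([data[i]]) * (257 - length)
            i += 1
    warnings.warn("Missing EOD in RunLengthDecode")
    return bytes(out)


# Simulate the faulty stream: RLE data + EOD + stray newline,
# then Flate-compressed as the outer filter.
rle = bytes([2]) + b"abc" + bytes([254]) + b"x" + bytes([EOD]) + b"\n"
stream = zlib.compress(rle)
decoded = run_length_decode(zlib.decompress(stream))  # → b'abcxxx' (plus a warning)
```

The design choice mirrors the description: only the exact single-newline case is downgraded to a warning, so genuinely corrupt data after EOD still fails loudly.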
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3355      +/-   ##
==========================================
+ Coverage   96.76%   96.83%   +0.07%
==========================================
  Files          54       54
  Lines        9076     9094      +18
  Branches     1676     1677       +1
==========================================
+ Hits         8782     8806      +24
+ Misses        176      172       -4
+ Partials      118      116       -2
Thanks.
Again, it was a pleasure, and I appreciate getting a more robust PDF tool ;-)
## What's new

### New Features (ENH)
- Implement flattening for writer (#3312) by @PJBrs

### Bug Fixes (BUG)
- Unterminated object when using PdfWriter with incremental=True (#3345) by @m32

### Robustness (ROB)
- Resolve some image extraction edge cases (#3371) by @stefan6419846
- Ignore faulty trailing newline during RLE decoding (#3355) by @henningkoertelgmg
- Gracefully handle odd-length strings in parse_bfchar (#3348) by @stefan6419846

### Developer Experience (DEV)
- Modernize license specifiers (#3338) by @stefan6419846

### Maintenance (MAINT)
- Reduce max-complexity of tool.ruff.lint.mccabe (#3365) by @j-t-1
- Refactor text extraction code by @MartinThoma

[Full Changelog](5.7.0...5.8.0)