ENH: Process XRefStm #1297
Fixes py-pdf#1295; includes a test file adjustment.
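For context: in a hybrid-reference file, the trailer's /XRefStm entry gives the byte offset of a cross-reference stream. Once decompressed, that stream is a run of fixed-width binary rows whose layout is described by its /W (field widths) and /Index (first object number, count, ...) arrays. The sketch below decodes such rows. It is not the PyPDF2 implementation, the function name `iter_xref_stream_entries` is hypothetical, and it assumes the stream data has already been decoded and that /W has exactly three entries.

```python
from typing import Iterator, Sequence, Tuple


def iter_xref_stream_entries(
    data: bytes, widths: Sequence[int], index: Sequence[int]
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (object number, type, field2, field3) for each row of a decoded
    cross-reference stream laid out by its /W and /Index arrays.

    Hypothetical helper; assumes len(widths) == 3 and already-decompressed data.
    """
    row_len = sum(widths)
    pos = 0
    # /Index is a flat list of (first object number, count) pairs.
    for first, count in zip(index[0::2], index[1::2]):
        for objnum in range(first, first + count):
            row = data[pos:pos + row_len]
            if len(row) < row_len:
                return  # truncated stream: stop instead of raising
            pos += row_len
            values = []
            start = 0
            for w in widths:
                # Fields are big-endian; a zero-width field reads as 0.
                values.append(int.from_bytes(row[start:start + w], "big"))
                start += w
            # A zero-width type field means every row is type 1 (in use).
            entry_type = values[0] if widths[0] else 1
            yield objnum, entry_type, values[1], values[2]
```

Stopping quietly on a truncated row, rather than raising, mirrors the best-effort spirit of this PR; a stricter parser could raise instead.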
Codecov Report
@@            Coverage Diff             @@
##             main    #1297      +/-   ##
==========================================
- Coverage   95.07%   94.67%   -0.41%
==========================================
  Files          30       30
  Lines        4973     5106     +133
  Branches     1023     1052      +29
==========================================
+ Hits         4728     4834     +106
- Misses        139      157      +18
- Partials      106      115       +9
@MartinThoma, stand by.
Fixes py-pdf#1279 (Status_v1_Reviewers-Guide.pdf)
Fixes py-pdf#1294 and possibly others
Search the file for the object:
* if the chained xref/trailer sections are invalid, or
* if the object header ('id' 'gen' obj) does not match, or the object is not present in the xref table.
Fixes py-pdf#1273
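A minimal sketch of that fallback search, assuming the whole file is available as bytes. `find_object_offset` is a hypothetical helper, not PyPDF2's API. It is a plain regex heuristic, so it could in principle also match text that happens to appear inside an uncompressed stream.

```python
import re
from typing import Optional


def find_object_offset(data: bytes, idnum: int, generation: int) -> Optional[int]:
    """Return the byte offset of the last "idnum generation obj" header, or None.

    Hypothetical fallback used when the xref table is unusable or does not
    list the object.
    """
    # (?<!\d) stops "1 0 obj" from matching inside "11 0 obj";
    # \b after "obj" rejects longer tokens such as "objx".
    pattern = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (idnum, generation))
    offset = None
    for match in pattern.finditer(data):
        # In an incrementally updated file, a later definition supersedes an
        # earlier one, so keep the last match.
        offset = match.start()
    return offset
```

Keeping the last match rather than the first is a design choice for incrementally updated files. A real reader would then re-parse the object at that offset and check its header.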
tests/test_xmp.py
 reader.xmp_metadata
-assert exc.value.args[0].startswith("XML in XmpInformation was invalid")
+assert exc.value.args[0].startswith("Stream length not defined")
Why did this change? I guess reader.xmp_metadata isn't even touched, is it?
Before this PR, one could at least get the number of pages:
assert len(reader.pages) == 5
I guess with this PR it no longer works?
I had to modify the expected test result; I did not analyze it further.
Under analysis.
The PDF was corrupted: the XRef object had a corrupted /Length key. I've changed the code to discard the unreadable XRef object so that the reader can recover as much information as possible: you can now get the metadata 😊
Accessing the number of pages is (still?) possible.
Discard non-readable XRef object to recover as much as possible
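A sketch of the "discard and continue" idea. It assumes the xref sections are visited newest first (as when following /Prev from the last trailer) and that a hypothetical `read_section` callable raises `ValueError` on corrupt data. PyPDF2 itself raises `PdfReadError`, and its real code path differs. `merge_xref_sections` is likewise hypothetical.

```python
import logging
from typing import Callable, Dict, Iterable, Tuple

logger = logging.getLogger(__name__)

# (object number, generation) -> byte offset of the object
XrefTable = Dict[Tuple[int, int], int]


def merge_xref_sections(
    offsets: Iterable[int],
    read_section: Callable[[int], XrefTable],
) -> XrefTable:
    """Merge xref sections visited newest first, skipping unreadable ones."""
    table: XrefTable = {}
    for offset in offsets:
        try:
            section = read_section(offset)
        except ValueError as exc:  # assumption: the parser signals corruption this way
            logger.warning("Ignoring unreadable xref section at %d: %s", offset, exc)
            continue
        for key, value in section.items():
            # Newer sections were read first, so never overwrite their entries.
            table.setdefault(key, value)
    return table
```

Skipping a broken section loses only the entries it would have contributed. Objects defined elsewhere stay reachable, which is why the metadata becomes readable again.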
I had to merge iss_1292 to produce a single, combined PR. This PR is now complete.
Co-authored-by: Martin Thoma <info@martin-thoma.de>
5 sec before me 😝
I'll look into applying black automatically in the CI as an extra commit today 😄 Also, I want to make flake8 run in parallel with the tests, and mypy after pytest, so that I can still see mypy issues even when the run fails.
I don't think it's worth it.
It's a different test scenario.
Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
- Process XRefStm (#1297)
- Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
- Avoid scaling cropbox twice (#1314)

Robustness (ROB):
- Fix offset correction in revised PDF (#1318)
- Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
- MultiLine bfrange in cmap (#1299)
- Cope with 2 digit codes in bfchar (#1310)
- Accept '/annn' charset as ASCII code (#1316)
- Log errors during Float / NumberObject initialization (#1315)
- Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
- Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
- Creating a coverage report (#1319)
- Fix AnnotationBuilder.free_text example (#1311)
- Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
- Only run coverage for PyPDF2

Maintenance (MAINT):
- PdfReaderProtocol (#1303)
- Throw PdfReadError if Trailer can't be read (#1298)
- Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5
Fixes #1273
Fixes #1279
Fixes #1292
Fixes #1294
Fixes #1295
ROB: Cope with xref starting on \r\n
ROB: Escaped octal code followed by decimal int
ROB: Cope with some corrupted entries in xref table
ROB: Extend xref autorepair cases
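One plausible reading of "Escaped octal code followed by decimal int" is a literal string such as (\0612) or (\18). An octal escape takes at most three octal digits, so a following 2, 8 or 9 is an ordinary character. The sketch below decodes escapes in a literal-string body under that reading. `decode_literal_escapes` is hypothetical, not PyPDF2's parser, and line continuations (a backslash before an end-of-line) are not handled.

```python
OCTAL_DIGITS = b"01234567"
SIMPLE_ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
    ord("b"): b"\b", ord("f"): b"\f",
    ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}


def decode_literal_escapes(body: bytes) -> bytes:
    """Decode backslash escapes in the body of a PDF literal string (hypothetical)."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != 0x5C:  # not a backslash: copy as-is
            out.append(c)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # lone trailing backslash: ignore it
        c = body[i]
        if c in OCTAL_DIGITS:
            # Consume one to three octal digits; stop at any non-octal byte,
            # so a following 8 or 9 (or a fourth digit) stays literal.
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # high-order overflow ignored
            i = j
        elif c in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[c]
            i += 1
        else:
            out.append(c)  # unknown escape: the backslash is dropped
            i += 1
    return bytes(out)
```

For example, rb"\0612" decodes to b"12" and rb"\18" to b"\x018".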