Decode error when trying to get drawings #2468
Comments
This is weird. Your screen print points into the middle of a text object, which are completely ignored by At that version, a new dictionary key "layer" has been introduced. Potentially your PDF uses Optional Content layers, and one of the OCGs has a non-UTF8 name? To check this, go to the "layers" tab of your PDF viewer, for example Adobe Acrobat.
Thank you so much for the quick reply @JorjMcKie
Unfortunately I don't think it's related to the OCGs naming 😔 as I'm seeing the following (using
>>> print(doc.get_ocgs())
{5: {'name': 'Layer 1', 'intent': [], 'on': True, 'usage': None}}
Now that I think about it, the drawings in the document don't have any sensitive info since they're just SVG lines, so it's probably safe to share that part only. If I read the drawing content at xref=4 (using
which is what I was sharing earlier in my cropped screenshot. These are bytes, and indeed if you try to do
It's normal for binary data not to be convertible to strings, so that is no help.
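To illustrate the point above, here is a minimal Python sketch. It uses made-up bytes (an assumption, not the stream from the PDF in this issue) to show that strict UTF-8 decoding of arbitrary binary data raises `UnicodeDecodeError`, while `errors="replace"` always succeeds but loses information. The helper name `try_decode` is hypothetical.

```python
from typing import Optional

# Hypothetical sample: a few bytes that are not valid UTF-8.
# They stand in for binary content stream data.
sample = b"q 1 0 0 1 0 0 cm \xff\xfe\x9c BT"


def try_decode(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if the data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict decoding fails for the sample, so this is None.
strict = try_decode(sample)

# Lossy decoding never fails; invalid bytes become U+FFFD.
lossy = sample.decode("utf-8", errors="replace")
```

Either outcome only reflects how the bytes were decoded. It says nothing about whether the stream itself is a valid PDF content stream.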
Thanks for your reply! I just sent you an email containing the troublesome PDF after removing any sensitive data. I hope that will be helpful. Please let me know if you have any additional questions.
Thanks for the file - that did help!
Detail descriptions:
Fixing #2468: MuPDF now correctly provides the OC layer name. In PyMuPDF, a safeguard against invalid layer name strings has been implemented.
Fixing #2365: Combined "fill" and "stroke" paths ("fs") now correctly report dictionary keys from the sub-paths.
Fixing #2391: Checkbox "True" values were inconsistent between getting and setting. This value is now always set to "Yes".
Fixing #2400: Fixed by an internal MuPDF fix.
Fixing #2404: Fixed by an internal MuPDF fix.
Fixing #2430: We falsely reduced the reference count of the `Py_None` object when creating the dictionary `Font.infos`. This has been corrected.
Other changes:
* Support for "cloudy" annotation borders.
* Consistent setting / unsetting of RadioButtons within the same RB group. However, the RB group must be a PDF object: radio buttons with JavaScripts that simulate that behaviour are not supported.
* Adobe Photoshop images are now supported as input (Pixmaps and Documents).
* The /Locked key in OCProperties is now supported for getting / setting.
* Document method `set_layer_ui_config()` now also supports the OCG name as argument (was just the sequence number previously).
Awesome, thanks again!
Fixed in 1.22.5.
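For anyone who needs to guard against this in their own code, here is a rough sketch of a version check. It assumes that every release from 1.22.0 up to, but not including, 1.22.5 is affected; the thread only confirms 1.22.0 as failing and 1.22.5 as fixed. The helper names are hypothetical, and the parser only handles purely numeric dotted versions.

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as "1.22.5" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str) -> bool:
    """Assumption: versions in the range [1.22.0, 1.22.5) show the decode error."""
    return (1, 22, 0) <= version_tuple(version) < (1, 22, 5)


# With PyMuPDF installed, the installed version string should be available
# as fitz.VersionBind (check this against your own installation):
#
#   import fitz
#   if is_affected(fitz.VersionBind):
#       print("consider upgrading to 1.22.5 or later")
```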
Describe the bug (mandatory)
Starting with version 1.22.0, I'm seeing the following exception when calling page.get_drawings() on one of our PDF files.
But I do not get any error with previous versions like 1.21.1.
To Reproduce (mandatory)
I'm a bit stuck here as unfortunately I cannot share the PDF in question because it's sensitive, and I've been struggling to create a new PDF that would mimic the issue.
Is there any chance you could provide some guidance on how to isolate the drawing issue?
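One possible way to start narrowing this down, sketched here under assumptions rather than as an official recipe: call `get_drawings()` on each page separately and record which pages raise. The helper name `find_failing_pages` is hypothetical. It accepts any iterable of objects with a `get_drawings()` method, so a PyMuPDF document can be passed in directly.

```python
def find_failing_pages(pages):
    """Return (page_index, exception) pairs for pages where get_drawings() raises."""
    failures = []
    for index, page in enumerate(pages):
        try:
            page.get_drawings()
        except Exception as exc:  # broad on purpose: the goal is only to locate the failure
            failures.append((index, exc))
    return failures


# Usage with PyMuPDF ("input.pdf" is a placeholder file name):
#
#   import fitz
#   doc = fitz.open("input.pdf")
#   for index, exc in find_failing_pages(doc):
#       print(index, repr(exc))
```

Once a failing page is known, `page.get_contents()` lists the xrefs of its content streams and `doc.xref_stream(xref)` returns the decompressed bytes of one of them. Those bytes can then be inspected or reduced further.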
So far I tried to copy the failing drawing content stream to a new PDF using version 1.21.1, so that I can potentially post it here, but the newly created PDF has no issue with 1.22.0+....
Here is my script for copying the stream
Expected behavior (optional)
Since getting the drawings would pass for versions prior to 1.22.0, I would expect it to pass for newer versions as well.
Screenshots (optional)
Not sure if that can help, but here is a cropped screenshot of the drawing stream bytes:
Your configuration (mandatory)
For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).
Installed via pip install pymupdf==1.22.0