Different image format/data from Page.get_text("dict") and Fitz.get_page_images() #2290
Comments
Please provide data to reproduce the problem.
Which other method?
Ah, I did not read it thoroughly enough: you can send the file to my e-mail, if that is ok.
I sent the file to jorj dot x dot mckie at outlook dot de.
Found the problem, which will be fixed in the next version.
Added a check for compressed buffer existence after creating the image from pdf_obj. We were falsely assuming that a PNG image had to be created if the raw (compressed) stream could not be interpreted as an image. This assumption was wrong, at least in the case where two compression filters existed.
Fixed by commit cee1dda
Thanks a lot! :)
Please provide all mandatory information!
Describe the bug (mandatory)
I am trying to match (inline) images found via Page.get_text("dict") with the ones obtained by Fitz.get_page_images(), in order to assign the image name to the object obtained by the first method. I have a PDF document that seems to contain a single PNG image. I checked with a PDF editor, and from what I can read in the PDF source code there is only one image (not inline, but an object). Creating a PIL image from the data returned by the first method gives me a JPEG image type, while the data from the other method yields a PNG type. The underlying binary data is also different.
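For readers who want to check the format mismatch without going through PIL, here is a minimal sketch that sniffs the image type directly from the leading bytes. The helper name `sniff_image_format` is hypothetical and not part of PyMuPDF; it only assumes you already hold the raw image bytes from each method (for example the `image` value of an image block in the `get_text("dict")` output).

```python
# Hypothetical helper, not part of PyMuPDF: identify an image format
# from its leading "magic" bytes, using only the standard library.

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker


def sniff_image_format(data: bytes) -> str:
    """Return "png", "jpeg", or "unknown" based on the first bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def formats_differ(first: bytes, second: bytes) -> bool:
    """True if the two byte strings are detected as different formats."""
    return sniff_image_format(first) != sniff_image_format(second)
```

Calling `formats_differ(data_from_dict, data_from_xref)` on the two byte strings (both variable names are placeholders) shows whether the two methods really return different encodings of the same picture, independent of how PIL decodes them.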
To Reproduce (mandatory)
Your configuration (mandatory)
Additional context (optional)
I would like to share the file with you, but cannot do so publicly.