persistent get_text() formatting #2730
Comments
Here is sample PDF. When
Thanks for submitting this!
Fixed in 1.23.5.
Is your feature request related to a problem? Please describe.
I have a problem loading a document: the page text is extracted in the wrong order, so I added `sort=True` to fix this.
That resolved the ordering problem, but a new one appeared: some characters in the text were replaced with `<?>`.
I found some information about this behavior on this page: https://pymupdf.readthedocs.io/en/latest/recipes-common-issues-and-their-solutions.html#problem-unreadable-text
But I don't think this is the desired behaviour.
Describe the solution you'd like
I don't know how the package is implemented, but would it be possible to use the same text formatters from `get_text("text")` in `get_text("blocks")`?
It would resolve the inconsistent formatting when the `sort` argument is changed.
Additional context
I'm sorry, I cannot send you the PDF file. It is an internal company file.
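Since the file itself cannot be shared, here is a minimal sketch of the kind of manual ordering that could stand in for `sort=True`. It assumes each block is a tuple `(x0, y0, x1, y1, text, block_no, block_type)`, which is the layout `get_text("blocks")` is documented to return. The helper names `sort_blocks` and `blocks_to_text` and the sample data are hypothetical, and the sort key (top edge, then left edge) is an assumption, not necessarily what PyMuPDF does internally. Whether this avoids the `<?>` replacement on the real document is untested.

```python
from typing import List, Tuple

# Assumed layout of one entry from get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type)
Block = Tuple[float, float, float, float, str, int, int]


def sort_blocks(blocks: List[Block]) -> List[Block]:
    """Hypothetical helper: order blocks top-to-bottom, then left-to-right."""
    return sorted(blocks, key=lambda b: (b[1], b[0]))


def blocks_to_text(blocks: List[Block]) -> str:
    """Hypothetical helper: join text blocks (block_type == 0) in sorted order."""
    return "".join(b[4] for b in sort_blocks(blocks) if b[6] == 0)


# Hand-made sample data standing in for a real page (not from the internal PDF).
sample: List[Block] = [
    (72.0, 200.0, 300.0, 220.0, "second\n", 0, 0),
    (72.0, 100.0, 300.0, 120.0, "first\n", 1, 0),
    (72.0, 150.0, 300.0, 190.0, "<image>", 2, 1),  # image block, skipped
]

print(blocks_to_text(sample))
```

With PyMuPDF, the input list would come from `page.get_text("blocks")` called without `sort=True`, so the text is produced by the unsorted code path and only the ordering is done by hand.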