wrong encoding for "\Č" character when sort=True #2774

Closed
vorel99 opened this issue Oct 31, 2023 · 2 comments

vorel99 commented Oct 31, 2023

Describe the bug

I have a PDF and am trying to read its text, but special characters are encoded incorrectly when sort=True is set.
Text in my PDF: something \Č etc.

page.get_text()

returns something \\Č etc.

But when sort=True is set, the text comes back in a different form.

page.get_text(sort=True)

returns something \\\\u010c etc.

The problem is similar to #2730, but with a different character.
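
For anyone stuck on an affected version, here is a minimal workaround sketch. It assumes the values quoted above are Python reprs, so that the sort=True output actually holds two literal backslashes followed by `u010c` where `\Č` was expected. The helper name `repair_escaped` is hypothetical and not part of PyMuPDF; it only post-processes the returned string, and it would also rewrite text that legitimately contains such a sequence.

```python
import re

# Assumption: the damaged output contains two literal backslashes,
# then "u" and four hex digits (e.g. \\u010c instead of \Č).
_ESCAPED = re.compile(r"\\\\u([0-9a-fA-F]{4})")


def repair_escaped(text: str) -> str:
    """Turn each '\\\\uXXXX' run back into one backslash plus the character."""
    return _ESCAPED.sub(lambda m: "\\" + chr(int(m.group(1), 16)), text)
```

It would be applied to the result of page.get_text(sort=True); strings without the pattern pass through unchanged.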

@vorel99 changed the title from "wrong encoding for \Č" to "wrong encoding for "\Č" character when sort=True" on Oct 31, 2023
@JorjMcKie
Collaborator

This is a duplicate of #2553, which will be fixed in the next release.

@julian-smith-artifex-com
Collaborator

Fixed in 1.23.6.
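
A quick way to check whether an installed version already includes the fix is to compare its version string against 1.23.6. The sketch below is only illustrative: `has_fix` is a hypothetical helper, and it assumes the version string starts with dotted integers such as "1.23.6".

```python
import re


def has_fix(version: str, fixed_in: tuple = (1, 23, 6)) -> bool:
    """Return True if the dotted version string is at least `fixed_in`."""
    # Keep the first three integer groups; suffixes like "rc1" are ignored.
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:3])
    return parts >= fixed_in
```

Pass in whatever version string your installation reports (for PyMuPDF this is assumed to be `fitz.VersionBind`).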
