ENH: Extract superscripts (x² instead of x2) #2045
Comments
Given the two examples from the issue, why do the extraction results differ? LibreOffice-Writer.pdf -> The square of x is denoted by x², the cube by x³. That is perfect :) But pdflatex-x-square.pdf -> x2= 9 means x∈{3,−3}.
I haven't analyzed it yet, but I guess that LibreOffice uses the Unicode superscript characters. In contrast, LaTeX changes the font size and position of a normal "2".
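To illustrate the guess above, here is a minimal sketch of how a raised, smaller "2" could be turned into "²" after extraction. It is only a heuristic under stated assumptions, not how pypdf works internally: the function `join_with_superscripts`, the `(text, font_size, baseline_y)` fragment format, and the `size_ratio` threshold are all hypothetical names chosen for this example.

```python
# Map ordinary characters to their Unicode superscript counterparts.
SUPERSCRIPTS = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")


def join_with_superscripts(fragments, size_ratio=0.8):
    """Join (text, font_size, baseline_y) fragments into one string.

    Hypothetical helper: a fragment counts as a superscript when it is
    noticeably smaller than the last base fragment AND its baseline
    sits higher on the page (larger y in PDF coordinates).
    """
    out = []
    base_size = None
    base_y = None
    for text, font_size, y in fragments:
        is_super = (
            base_size is not None
            and font_size < size_ratio * base_size
            and y > base_y
        )
        if is_super:
            # Replace characters that have a Unicode superscript form.
            out.append(text.translate(SUPERSCRIPTS))
        else:
            out.append(text)
            # Remember the size and baseline of the regular text.
            base_size, base_y = font_size, y
    return "".join(out)


# Made-up fragments resembling what LaTeX emits for "x^2 = 9".
fragments = [("x", 10.0, 100.0), ("2", 7.0, 103.6), (" = 9", 10.0, 100.0)]
result = join_with_superscripts(fragments)
print(result)
```

In practice the fragments would have to come from the PDF itself, for example by collecting text, font size, and position in a `visitor_text` callback passed to `extract_text`. Characters without a Unicode superscript form are left unchanged by this sketch.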
This is my example, but the extracted text is empty: Screenshot 2023-07-30 at 20.07.16.pdf. Is the text too large, or is it a pixelation issue?
Since it is a screenshot, I thought I might need Tesseract OCR instead? Does it have a Python interface?
@miriam-z Please ask your questions in https://github.com/py-pdf/pypdf/discussions/categories/q-a
Explanation
Superscripts are common in math, especially squares (e.g. x²) and cubes (e.g. x³).
Code Example
How would your feature be used? (Remove this if it is not applicable.)
Examples with the expected output: