Italic info in hocr output #1371
Comments
That's a missing feature of the new LSTM engine: it does not support attributes like bold, italic and more. Tesseract 4 still supports the old OCR engine as long as you use traineddata files that include the necessary information. The files from https://github.com/tesseract-ocr/tessdata will work for you.
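A minimal sketch of how the legacy engine could be selected when producing hOCR, assuming a Tesseract 4 build that accepts `--oem 0` (legacy engine only) and `--tessdata-dir`. The helper name `legacy_hocr_command`, the file names, and the tessdata path are hypothetical and only serve as illustration; the function builds the command line without running it.

```python
import shlex


def legacy_hocr_command(image_path, output_base, lang="eng", tessdata_dir=None):
    """Build a tesseract command line that requests the legacy engine and hOCR output.

    Hypothetical helper: it only assembles the argument list, it does not run tesseract.
    """
    # "--oem 0" asks Tesseract 4 for the legacy (non-LSTM) engine.
    cmd = ["tesseract", image_path, output_base, "--oem", "0", "-l", lang]
    if tessdata_dir is not None:
        # Point at a checkout of the tessdata repository, whose files
        # still contain the legacy model data.
        cmd += ["--tessdata-dir", tessdata_dir]
    # "hocr" is the config file name that switches the output format to hOCR.
    cmd.append("hocr")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with real ones before running the printed command.
    print(shlex.join(legacy_hocr_command("page.png", "page", tessdata_dir="/path/to/tessdata")))
```

The resulting list can be passed to `subprocess.run` once the paths point at real files. Whether italic markup then shows up depends on the traineddata file actually containing the legacy model.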
@stweil: thx for the info :)
Who knows? I don't. Maybe @theraysmith has plans to enhance the LSTM engine in that direction.
I cannot find any italic info in the hOCR output of Tesseract 4.00.00alpha. Tesseract 3.x included this info via the `em` tag. It would be very helpful if this could be added again in some way.
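To check whether a given hOCR file carries italic info at all, something like the following sketch could be used. It assumes italic words are wrapped in `em` elements, as described for Tesseract 3.x; the class and function names and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ItalicWordCollector(HTMLParser):
    """Collect text that appears inside <em> elements of an hOCR document."""

    def __init__(self):
        super().__init__()
        self.em_depth = 0       # how many <em> elements are currently open
        self.italic_text = []   # text fragments seen while inside <em>

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.em_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.em_depth > 0:
            self.em_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.em_depth > 0 and text:
            self.italic_text.append(text)


def italic_words(hocr):
    """Return the list of text fragments marked italic via <em> in the hOCR string."""
    parser = ItalicWordCollector()
    parser.feed(hocr)
    parser.close()
    return parser.italic_text


# Made-up hOCR fragment: one italic word, one plain word.
SAMPLE = (
    "<span class='ocrx_word' id='word_1_1' title='bbox 0 0 10 10'><em>Hello</em></span> "
    "<span class='ocrx_word' id='word_1_2' title='bbox 12 0 30 10'>world</span>"
)

if __name__ == "__main__":
    print(italic_words(SAMPLE))
```

An empty result for a page that visibly contains italic text would match the behaviour reported here for the 4.00.00alpha output.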