Inconsistent information between getText("dict")['blocks'] and getText("html") #956
Hello, I really appreciate this great repo!
Thanks
Replies: 1 comment 4 replies
I am afraid this will not work.
HTML, XHTML and XML extraction options are based on original MuPDF functions and as such must be accepted as they are.
The other options are of my own making.
Over time and upon request, I have added corrective code to "my" functions where errors were reported, and introduced some extended features, such as reduced glyph heights or restricting the extracted text to a given clip rectangle.
So when you see a zero bbox in the *ML output, there is nothing I can do. Such things go back to inconsistent or erroneous PDF or font information. Any corrective code I may be using in my functions cannot be carried over to the *ML functions.
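For anyone who wants to check whether the "dict" output is also affected on a given page, here is a minimal sketch. It assumes the usual layout of the "dict" result (blocks, then lines, then spans, each with a `bbox` of four numbers). The helper name `zero_area_spans`, the `eps` tolerance and the sample data are made up for illustration and are not part of PyMuPDF.

```python
def zero_area_spans(page_dict, eps=1e-6):
    """Return (block_no, line_no, span_no, text) for spans whose bbox has no area.

    `page_dict` is assumed to have the shape of a "dict" text extraction:
    {"blocks": [{"type": 0, "lines": [{"spans": [{"bbox": ..., "text": ...}]}]}]}
    """
    hits = []
    for b_no, block in enumerate(page_dict.get("blocks", [])):
        if block.get("type", 0) != 0:  # skip non-text (e.g. image) blocks
            continue
        for l_no, line in enumerate(block.get("lines", [])):
            for s_no, span in enumerate(line.get("spans", [])):
                x0, y0, x1, y1 = span["bbox"]
                # A bbox is degenerate if its width or height is (almost) zero.
                if abs(x1 - x0) < eps or abs(y1 - y0) < eps:
                    hits.append((b_no, l_no, s_no, span.get("text", "")))
    return hits


# Hypothetical sample data, shaped like a "dict" extraction result.
sample = {
    "blocks": [
        {
            "type": 0,
            "lines": [
                {
                    "spans": [
                        {"bbox": (10.0, 10.0, 50.0, 22.0), "text": "a"},
                        {"bbox": (50.0, 10.0, 50.0, 10.0), "text": "b"},
                    ]
                }
            ]
        },
        {"type": 1, "bbox": (0.0, 0.0, 100.0, 100.0)},  # image block, ignored
    ]
}

print(zero_area_spans(sample))
```

With a real document you would pass in the result of `page.getText("dict")` (spelled `get_text` in newer PyMuPDF versions) instead of `sample`. If this list is empty while the HTML output still shows zero boxes for the same text, the difference comes from the corrective code that exists only in the non-*ML options.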