White space BBOX is Wrong #823
Found the reason:
```
{
    "size":10.201054573059082,
    "flags":0,
    "font":"ArialMT",
    "color":0,
    "ascender":1.0750000476837158,
    "descender":-0.29899999499320984,
    "text":"Total:",
    "origin":[
        129.0,
        228.2010040283203
    ],
    "bbox":[
        129.0,
        217.96078491210938,
        156.18267822265625,
        231.04920959472656
    ]
},
{
    "size":29.90174102783203,  # <=== look at this! Compare to bbox height!
    "flags":0,
    "font":"ArialMT",
    "color":0,
    "ascender":1.0750000476837158,
    "descender":-0.29899999499320984,
    "text":" ",
    "origin":[
        156.0,
        228.2010040283203
    ],
    "bbox":[  # but the bbox height is just over 13, less than 50% of fontsize!
        156.0,
        217.98611450195312,
        182.1584930419922,
        231.0421600341797
    ]
},
```
Whatever the reason for this may be, all one can do is reject the "reduced" span bbox if its height is not smaller than that of the original bbox.
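The rejection rule described above can be sketched in plain Python on span dicts shaped like the output shown. This is a minimal illustration under stated assumptions, not PyMuPDF's actual code: `reduced_span_bbox` is a hypothetical helper, and it assumes a horizontal span whose "reduced" bbox is derived from the baseline (`origin[1]`), `size`, `ascender` and `descender`, with y growing downward as in PyMuPDF page coordinates.

```python
# Span dict copied from the "text": " " entry above (only the keys used here).
WHITESPACE_SPAN = {
    "size": 29.90174102783203,
    "ascender": 1.0750000476837158,
    "descender": -0.29899999499320984,
    "origin": [156.0, 228.2010040283203],
    "bbox": [156.0, 217.98611450195312, 182.1584930419922, 231.0421600341797],
}


def reduced_span_bbox(span):
    """Hypothetical helper (not PyMuPDF's implementation).

    Recompute a horizontal span's bbox from the font's ascender/descender
    and keep it only if it is strictly less tall than the original bbox.
    """
    x0, y0, x1, y1 = span["bbox"]
    baseline = span["origin"][1]
    size = span["size"]
    # y grows downward: the top edge lies ascender * size above the baseline,
    # the bottom edge lies -descender * size below it (descender is negative).
    new_y0 = baseline - span["ascender"] * size
    new_y1 = baseline - span["descender"] * size
    if new_y1 - new_y0 < y1 - y0:
        return (x0, new_y0, x1, new_y1)
    # The "reduced" bbox is not smaller than the original: reject it.
    return (x0, y0, x1, y1)


# With size 29.9 the recomputed bbox would be about 41 units tall versus
# roughly 13 for the original, so the original bbox is kept.
print(reduced_span_bbox(WHITESPACE_SPAN))
```

Keeping the original bbox as the fallback means an implausible font size in the PDF can only make the result no worse than what was already extracted.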
Actually, this ridiculous font size of 29.9 is contained in the PDF itself, so ABBYY is to blame, not (Py)MuPDF. One must even say that it is a pretty good job to come up with a reasonable bbox under these circumstances!
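One way to spot such implausible font sizes in extracted text is to compare each span's bbox height with its `size`, as done by eye in the output above. The sketch below is a hypothetical check, not a PyMuPDF feature; the `min_ratio=0.5` threshold is an assumption borrowed from the "less than 50% of fontsize" remark.

```python
# The two spans from the output above, reduced to the keys used here.
SPANS = [
    {"text": "Total:", "size": 10.201054573059082,
     "bbox": [129.0, 217.96078491210938, 156.18267822265625, 231.04920959472656]},
    {"text": " ", "size": 29.90174102783203,
     "bbox": [156.0, 217.98611450195312, 182.1584930419922, 231.0421600341797]},
]


def suspicious_spans(spans, min_ratio=0.5):
    """Hypothetical check: return (text, height / size) for every span whose
    bbox height is below min_ratio times its font size."""
    flagged = []
    for span in spans:
        _, y0, _, y1 = span["bbox"]
        ratio = (y1 - y0) / span["size"]
        if ratio < min_ratio:
            flagged.append((span["text"], round(ratio, 3)))
    return flagged


print(suspicious_spans(SPANS))  # [(' ', 0.437)]
```

"Total:" has a height-to-size ratio of about 1.28 and passes; only the whitespace span with the 29.9 font size is flagged.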
Fixed by v1.18.7, currently being uploaded.
Thanks.
On Wed, 3 Feb 2021, 4:26 AM, Jorj X. McKie wrote:
> Fixed by v1.18.7, currently being uploaded.
Hi,
The whitespace span's bbox is wrong. I have even used the ascender/descender values to get the actual ymin and ymax.
I have attached the input and output (span chunks are outlined in red).
FYI: this input PDF was created with ABBYY OCR.
Configurations:
Thanks
cheesecake-20191221_003.pdf