Closed
Description
Describe the bug (mandatory)
get_texttrace returns an incorrect character bbox.
To Reproduce (mandatory)
盐城高新区投资集团有限公司2023年度第四期超短期融资券募集说明书.pdf
import fitz

d = fitz.Document('temp/盐城高新区投资集团有限公司2023年度第四期超短期融资券募集说明书.pdf')
page = d[250]
# Print every traced character tuple whose character is '民'.
for span in page.get_texttrace():
    for char in span['chars']:
        if chr(char[0]) == '民':
            print(char)
The code above prints:
(27665, 8775, (114.0, 634.72998046875), (114.0, 633.2604370117188, 124.44999694824219, 643.71044921875))
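The origin y (634.72998046875) is only about 1.47 points below bbox y0 (633.2604370117188), although the bbox is about 10.45 points tall. The origin lies on the baseline, so it should sit near the bottom of the bbox rather than near the top. This bbox is clearly wrong.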
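The check described above can be expressed as a small script. This is only a sketch: the helper names `baseline_ratio` and `is_suspicious` and the 0.5 threshold are my own assumptions, not part of PyMuPDF. It takes a character tuple in the `(unicode, glyph, origin, bbox)` layout printed above and reports how far down the bbox the origin sits.

```python
def baseline_ratio(char):
    """Return the origin's vertical position inside the bbox.

    0.0 means the origin is on the top edge (y0), 1.0 on the bottom edge (y1).
    `char` is a tuple shaped like the ones printed above:
    (unicode, glyph, (origin_x, origin_y), (x0, y0, x1, y1)).
    """
    _ucs, _gid, origin, bbox = char
    _x0, y0, _x1, y1 = bbox
    height = y1 - y0
    if height <= 0:
        raise ValueError("bbox has no height")
    return (origin[1] - y0) / height


def is_suspicious(char, min_ratio=0.5):
    """Flag characters whose origin lies in the upper part of the bbox.

    min_ratio is an arbitrary illustrative threshold, not a PyMuPDF constant.
    """
    return baseline_ratio(char) < min_ratio


# The tuple reported in this issue.
reported = (27665, 8775, (114.0, 634.72998046875),
            (114.0, 633.2604370117188, 124.44999694824219, 643.71044921875))

print(chr(reported[0]), round(baseline_ratio(reported), 3), is_suspicious(reported))
```

For the reported character the ratio comes out at roughly 0.14, so the origin sits in the top seventh of the bbox. The same helper could be applied to every tuple in `span['chars']` to find other affected characters on the page.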
Screenshot of the incorrect character bbox (red).
Screenshot code:
page.get_pixmap(matrix=fitz.Matrix(2, 2), alpha=False, clip=fitz.Rect(114.0, 633.2604370117188, 124.44999694824219, 643.71044921875)).save('test.png')
Your configuration (mandatory)
print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
win32
PyMuPDF 1.22.5: Python bindings for the MuPDF 1.22.2 library.
Version date: 2023-06-21 00:00:01.
Built for Python 3.7 on win32 (64-bit).