Phantom "f" character appears in page.get_textbox() output

(Apologies if this is user error)

I'm working on extracting text from a series of PDF customer orders. When I call page.get_textbox([x0, y0, x1, y1]), the result contains phantom "f" characters.

First, I look at the page's text blocks:

        blocks = page.get_text("blocks")
        for b in blocks:
            print(b)

There is this one block:
(116.70000457763672, 65.65986633300781, 201.9755859375, 74.73955535888672, 'PURCHASE ORDER for\n', 5, 0)
...

Now, when I try to extract the same area with get_textbox:

        text = page.get_textbox([116, 65, 202, 75]).encode("utf8")
        print(text)

I get the following:

b'PURCHASE ORDER for\nf\nf'

So two phantom "f" characters appear after the expected text.

This is not limited to this one block; it happens with all the other blocks as well. It seems that wherever there is a \n, the function returns \nf.
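
For cross-checking, here is a minimal sketch of a way to pull the text of a rectangle without get_textbox, by filtering word tuples shaped like the output of page.get_text("words"), i.e. (x0, y0, x1, y1, word, block_no, line_no, word_no). The helper name text_in_rect and the sample word coordinates are made up for illustration; only the block rectangle comes from the output above. It does not explain where the "f" comes from, it only gives a second result to compare against.

```python
def text_in_rect(words, rect):
    """Join the words whose bounding boxes lie fully inside rect.

    words: tuples shaped like page.get_text("words") output,
           (x0, y0, x1, y1, word, block_no, line_no, word_no).
    rect:  (x0, y0, x1, y1) in the same page coordinates.
    """
    rx0, ry0, rx1, ry1 = rect
    lines = {}
    for x0, y0, x1, y1, word, block_no, line_no, word_no in words:
        # Keep only words that are completely contained in the rectangle.
        if x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1:
            lines.setdefault((block_no, line_no), []).append((word_no, word))
    # One output line per (block, line) pair, words in reading order.
    return "\n".join(
        " ".join(word for _, word in sorted(items))
        for _, items in sorted(lines.items())
    )


# Hypothetical word tuples; the x-coordinates of the individual words
# are invented, only the overall block rectangle matches the report.
sample_words = [
    (116.7, 65.66, 166.0, 74.74, "PURCHASE", 5, 0, 0),
    (168.5, 65.66, 190.0, 74.74, "ORDER", 5, 0, 1),
    (192.0, 65.66, 201.98, 74.74, "for", 5, 0, 2),
    (300.0, 65.66, 340.0, 74.74, "Invoice", 6, 0, 0),  # outside the rectangle
]

print(text_in_rect(sample_words, (116, 65, 202, 75)))
```

On a real page the call would presumably be text_in_rect(page.get_text("words"), (116, 65, 202, 75)). If that returns clean text while get_textbox still appends "f", the extra characters are more likely added by get_textbox than stored in the PDF.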

I'm running:
- Windows 10
- Python 3.8.2 (64-bit)
- PyMuPDF 1.18.14 (wheel)

Many thanks for the support. Your work is amazing.

Phantom "f" character appears #1078