Bounding boxes for extracted text #137

Answered by JorjMcKie
simonschoe asked this question in Q&A
But before you run away in frustration, you could try the parameter `extract_words=True`, which enforces `page_chunks=True`. Each page dictionary will then contain a list of word tuples `(x0, y0, x1, y1, "wordstring", ...)`, just like the ones you would get from `page.get_text("words")`, but in the same sequence as the markdown text.
This is (and probably always will be) the maximum we can provide.
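
For readers who want to try this, here is a minimal sketch of reading the word tuples back out. It assumes the parameters above belong to `pymupdf4llm.to_markdown` and that each page dictionary stores the tuples under a `"words"` key; neither detail is spelled out in the answer, so check both against your installed version. The names `WordBox`, `iter_word_boxes` and `extract_word_boxes` are made up for illustration.

```python
from typing import Any, Iterable, Iterator, NamedTuple


class WordBox(NamedTuple):
    """One extracted word with its bounding box (hypothetical helper type)."""
    page_index: int  # 0-based position of the page dictionary in the list
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    text: str


def iter_word_boxes(page_chunks: Iterable[dict[str, Any]]) -> Iterator[WordBox]:
    """Yield words in the order they appear in the page dictionaries.

    Assumption: each dictionary keeps its word tuples under the key "words",
    shaped like (x0, y0, x1, y1, "wordstring", ...). Pages without that key
    are skipped.
    """
    for page_index, chunk in enumerate(page_chunks):
        for word in chunk.get("words", []):
            # Only the first five fields are used; trailing fields are ignored.
            x0, y0, x1, y1, text = word[:5]
            yield WordBox(page_index, (x0, y0, x1, y1), text)


def extract_word_boxes(pdf_path: str) -> list[WordBox]:
    """Run the conversion with extract_words=True and collect the word boxes."""
    # Imported lazily so iter_word_boxes stays usable without the library.
    import pymupdf4llm

    # Per the answer above, extract_words=True enforces page_chunks=True,
    # so the result is a list of page dictionaries, not one markdown string.
    chunks = pymupdf4llm.to_markdown(pdf_path, extract_words=True)
    return list(iter_word_boxes(chunks))
```

Calling `extract_word_boxes("input.pdf")` (with a placeholder path) should then give one `WordBox` per word, in the same sequence as the markdown text.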

Answer selected by simonschoe
This discussion was converted from issue #136 on September 13, 2024 12:00.