Page.searchFor returns a separate hit rectangle for each marked content item #575
Hi @hujunyao - I have some intermediate results and explanations. The surprise here is that a similar thing happens for different marked content items, even if they are on the same line. I understand that you would prefer not to be bothered by such technical details. So far I have found no place where this behaviour can be controlled ... For now, I am removing the bug label and labelling this issue as "question".
There is no (obvious) way to prevent the above from happening.
I have researched a little more: It is possible to detect such a situation by checking the following criteria:
With this intermediate (and admittedly not fully satisfying) result, I suggest closing the issue.
Thanks to JorjMcKie for the detailed reply. I think adding a parameter may be a solution: a parameter that merges overlapping rectangles with the same y1, so that the results are combined into one.
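To illustrate the suggestion above, here is a minimal sketch of such a merge step. It is not PyMuPDF code: the function name `merge_same_line_rects` and the `tol` parameter are hypothetical, and rectangles are plain `(x0, y0, x1, y1)` tuples rather than `fitz.Rect` objects. It assumes that hits on the same line share (almost) the same y1 and that pieces of one hit overlap or touch horizontally.

```python
def merge_same_line_rects(rects, tol=1e-3):
    """Join rectangles that share a bottom edge (y1) and overlap horizontally.

    rects: iterable of (x0, y0, x1, y1) tuples (hypothetical input format).
    tol:   tolerance for comparing coordinates (assumed value).
    """
    merged = []
    # Sort by bottom edge, then left edge, so merge candidates are adjacent.
    for x0, y0, x1, y1 in sorted(rects, key=lambda r: (r[3], r[0])):
        if merged:
            mx0, my0, mx1, my1 = merged[-1]
            same_line = abs(my1 - y1) <= tol
            overlaps = x0 <= mx1 + tol
            if same_line and overlaps:
                # Replace the previous rectangle by the union of both.
                merged[-1] = (min(mx0, x0), min(my0, y0),
                              max(mx1, x1), max(my1, y1))
                continue
        merged.append((x0, y0, x1, y1))
    return merged


# Two overlapping pieces on one line plus an unrelated hit further down.
hits = [
    (90.0, 74.0, 120.0, 85.0),
    (118.0, 74.0, 137.0, 85.0),
    (90.0, 100.0, 120.0, 111.0),
]
combined = merge_same_line_rects(hits)
```

With the sample input, the two pieces on the first line are joined into one rectangle and the hit on the other line is left alone. Rectangles on the same line that do not overlap stay separate, which seems to be the desired behaviour for distinct occurrences of the search text.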
The next version 1.18.2 will finally resolve your issue and join overlapping rectangles if they are on the same line (will not work if ...
>>> doc = fitz.open("4.pdf")
>>> page = doc[0]
>>> needle = "申请人:("
>>> page.searchFor(needle)
[Rect(90.02400207519531, 74.36576080322266, 137.4199981689453, 85.36920166015625)]
>>> # only one rect returned
Version 1.18.2 is being uploaded to PyPI right now.
Describe the bug (mandatory)
I use searchFor to search for text in a PDF and get a repeated result for specific words. A sample PDF is attached.
To Reproduce (mandatory)
Sample code:
import fitz
pdfpath = "4.pdf"  # the attached sample file
file_handle = fitz.open(pdfpath)
page = file_handle.loadPage(0)
# key_word = "申请人:"  # correct
# key_word = "(羲和指马)"  # correct
key_word = "申请人:("  # returns a repeated result
co_list = page.searchFor(key_word, hit_max=16, quads=False, flags=1)
Expected behavior (optional)
searchFor should return ONE result.
4.pdf
Your configuration (mandatory)