add_highlight_annot using clip generates "A Number is Out of Range" error in PDF #2322

TheCapybaraClub · 2023-04-06T01:39:29Z

Describe the bug (mandatory)

I am trying to use page.add_highlight_annot with the clip option. The highlighting is placed as expected, but the resulting PDF triggers an "A Number is Out of Range" error when opened. The clip is built from the results of page.get_text("words", textpage=textpage), so I am not sure how my clip could be illegal. If this is not a bug, what am I doing wrong?

To Reproduce (mandatory)

test.pdf

Import Fitz and read PDF

import fitz
pdfDoc = fitz.open('./test.pdf')

Get Text and Do Text Stuff with it (here we find the index of target)

page = pdfDoc[0]
textpage = page.get_textpage(clip=page.mediabox)
page_text_words = page.get_text("words", textpage=textpage)

# xi = list index of key word
for xi, x in enumerate(page_text_words):
    if x[4]=='pellentesque,':
        target_idx = xi
        print(xi, x)

# results seem reasonable
# 88 (242.8954315185547, 157.6929473876953, 308.8439025878906, 172.07373046875, 'pellentesque,', 3, 6, 5)

Get context around this target using the list index

context_span = 10
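# clamp both ends so a negative index cannot wrap around and the slice stays inside the list
start_idx = max(0, target_idx - context_span)
end_idx = min(len(page_text_words) - 1, target_idx + context_span)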

context_text = " ".join([x[4] for x in page_text_words[start_idx:(end_idx+1)] ])

# results seem reasonable
# 'et maximus urna. Nullam posuere feugiat orci non ullamcorper. Proin pellentesque, odio id facilisis mollis, sem risus suscipit ex, non aliquet'

Build a clip for this context text

clip_rect = list(page_text_words[start_idx][:4])
for xi, x in enumerate(page_text_words[start_idx:(end_idx+1)]):
    if x[0]<clip_rect[0]:
        clip_rect[0]=x[0]
    if x[1]<clip_rect[1]:
        clip_rect[1]=x[1]
    if x[2]>clip_rect[2]:
        clip_rect[2]=x[2]
    if x[3]>clip_rect[3]:
        clip_rect[3]=x[3]

# results seem reasonable, even though the method is pretty ugly
# [72.02400207519531, 143.41297912597656, 540.11474609375, 186.353759765625]
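
The loop above can be written more compactly. The following is a minimal sketch in plain Python, assuming each entry follows the (x0, y0, x1, y1, word, block_no, line_no, word_no) layout seen in the printed result earlier. `words_bbox` is a hypothetical helper name, not part of PyMuPDF, and the sample tuples are made up.

```python
def words_bbox(words):
    """Return [x0, y0, x1, y1] enclosing every word tuple in `words`."""
    if not words:
        raise ValueError("need at least one word to build a bounding box")
    return [
        min(w[0] for w in words),  # leftmost x0
        min(w[1] for w in words),  # topmost y0
        max(w[2] for w in words),  # rightmost x1
        max(w[3] for w in words),  # bottommost y1
    ]


# Two made-up word tuples on neighbouring lines (illustrative values only).
sample = [
    (100.0, 50.0, 140.0, 62.0, "lorem", 0, 0, 0),
    (72.0, 64.0, 110.0, 76.0, "ipsum", 0, 1, 0),
]
print(words_bbox(sample))  # [72.0, 50.0, 140.0, 76.0]
```

In the reproduction this would be called as words_bbox(page_text_words[start_idx:end_idx + 1]).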

Use the clip to add a highlight annotation

x0,y0,x1,y1 = clip_rect
rect = fitz.Rect(x0,y0,x1,y1)
highlight = page.add_highlight_annot(quads=None, clip=rect)
highlight.update()

Save PDF

pdfDoc.save("./test_out.pdf", garbage=4, clean=True, deflate=True, deflate_images=True, deflate_fonts=True)
print("Info: Saved Annotated PDF ./test_out.pdf")

# opening this PDF shows the highlighting as expected but also pops up "A Number is Out of Range" error

Expected behavior (optional)

I would expect not to get the "A Number is Out of Range" error

Screenshots (optional)

At first the highlighting doesn't show, only the error. But once you click 'Ok' the error goes away and highlighting shows. Any scrolling brings the error prompt back up.

After clicking 'Ok' and then scrolling

Your configuration (mandatory)

3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)] 
 win32 
 
PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.10 on win32 (64-bit).

Additional context (optional)

As always, thank you for the support!

JorjMcKie · 2023-04-06T10:30:34Z

This is not a bug, but incorrect use of the method - actually amazing that something was highlighted at all:
If you set quads to None then start and stop must not be None. Probably a few plausibility checks should be inserted into the method and the documentation be updated.

The clip parameter likewise only makes sense, and is only intended to be used, when start and stop are both given (not None).

JorjMcKie · 2023-04-06T10:33:06Z

As an aside: Because of quads=None, coordinates of MuPDF's infinite rectangle are inserted - these are the numbers the PDF viewer does not like.
You simply should have used your rectangle as the highlight rectangle.
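
A plausibility check of the kind mentioned above could look roughly like this. It is only a sketch in plain Python and not PyMuPDF's actual fix: `rect_is_plausible` is a hypothetical helper, the page size is an assumed US Letter page, and the very large coordinates merely stand in for an infinite rectangle.

```python
import math


def rect_is_plausible(rect, page_rect, tol=1.0):
    """True if `rect` is finite, non-empty and lies within `page_rect` (+/- tol).

    Both arguments are (x0, y0, x1, y1) sequences in PDF points.
    """
    x0, y0, x1, y1 = rect
    px0, py0, px1, py1 = page_rect
    if not all(math.isfinite(v) for v in (x0, y0, x1, y1)):
        return False
    if x1 <= x0 or y1 <= y0:  # empty or inverted rectangle
        return False
    return (
        x0 >= px0 - tol and y0 >= py0 - tol
        and x1 <= px1 + tol and y1 <= py1 + tol
    )


page_rect = (0, 0, 612, 792)  # assumed US Letter page, for illustration
clip = (72.024, 143.413, 540.115, 186.354)  # rounded clip from the report
huge = (-2.0e9, -2.0e9, 2.0e9, 2.0e9)  # stand-in for an "infinite" rectangle

print(rect_is_plausible(clip, page_rect))  # True
print(rect_is_plausible(huge, page_rect))  # False
```

A check like this could be applied to the rectangle of each new annotation before saving, so that out-of-range coordinates are caught before a PDF viewer complains about them.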

TheCapybaraClub · 2023-04-06T10:58:15Z

I do not understand what the proper use of the method should be. As you say, If you set quads to None then start and stop must not be None, so I tried these... but got the same issue.

highlight = page.add_highlight_annot(quads=None, clip=rect, start=start_context_clip, stop=end_context_clip)

highlight = page.add_highlight_annot(quads=None, start=start_context_clip, stop=end_context_clip)

In your follow up, you said I simply should have used my rectangle as the highlight rectangle. Do you mean to say I should use the method this way, where I set quads equal to rect? I agree this will avoid the error, but doesn't accomplish the same goal of "to highlight consecutive lines between the points start and stop" and rather just throws one highlight across all rows.

highlight = page.add_highlight_annot(quads=rect, start=start_context_clip, stop=end_context_clip)

I would like to understand how to duplicate the example shown in the second note within the documentation
https://pymupdf.readthedocs.io/en/latest/page.html#Page.add_highlight_annot

JorjMcKie · 2023-04-06T11:16:51Z

This is correct: page.add_highlight_annot(quads=None, start=start_context_clip, stop=end_context_clip). Using clip (in this scenario only) is also correct.

If using quads, all remaining parameters must be None.

Both ways are mutually exclusive.

Currently however, the method internally may generate infinite rectangles / quads which are the reason for the PDF viewer's complaint. This is fixed in the next release.
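
To make the start/stop variant concrete, here is a minimal sketch of how the two points might be derived from the word list in the original report. It assumes that start should be the top-left corner of the first word and stop the bottom-right corner of the last word. `highlight_points` is a hypothetical helper, and the word tuples are made up.

```python
def highlight_points(words, start_idx, end_idx):
    """Return ((sx, sy), (ex, ey)) for a run of words.

    `words` uses the page.get_text("words") layout:
    (x0, y0, x1, y1, word, block_no, line_no, word_no).
    """
    if not 0 <= start_idx <= end_idx < len(words):
        raise IndexError("start_idx/end_idx outside the word list")
    first, last = words[start_idx], words[end_idx]
    start = (first[0], first[1])  # top-left corner of the first word
    stop = (last[2], last[3])  # bottom-right corner of the last word
    return start, stop


# Made-up words spanning two lines (illustrative values only).
words = [
    (300.0, 100.0, 340.0, 112.0, "Proin", 0, 0, 0),
    (345.0, 100.0, 420.0, 112.0, "pellentesque,", 0, 0, 1),
    (72.0, 114.0, 100.0, 126.0, "odio", 0, 1, 0),
]
start, stop = highlight_points(words, 0, 2)
print(start, stop)

# With PyMuPDF the points would then be passed on roughly like this
# (not executed here, since it needs an open document):
#   page.add_highlight_annot(quads=None,
#                            start=fitz.Point(*start),
#                            stop=fitz.Point(*stop))
```

Wrapping the tuples in fitz.Point is a cautious choice here, in case the method expects point objects rather than plain tuples.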

JorjMcKie added the not a bug not a bug / user error / unable to reproduce label Apr 6, 2023

JorjMcKie self-assigned this Apr 6, 2023

JorjMcKie added duplicate Fixed in next release and removed not a bug not a bug / user error / unable to reproduce labels Apr 6, 2023

julian-smith-artifex-com added a commit that referenced this issue Apr 14, 2023

changes.txt: added #2322

41c2a31

julian-smith-artifex-com removed the Fixed in next release label Apr 14, 2023

julian-smith-artifex-com closed this as completed Apr 14, 2023
