IndexError in page.get_links() #4404

Open
@robvandijk

Description of the bug

When the attached file is processed, calling page.get_links() leads to an IndexError for page 14.

How to reproduce the bug

I traced this to the following lines in src/__init__.py:

    for i, v in enumerate(array.replace("null", "0").split()[1:]):
        t[i] = float(v)

For page 14 the array variable contains

/XYZ 116.00001 745.92 0 34 0 R/XYZ 116.00001 745.92 0 40 0 R/XYZ 116.00001 745.92 0 47 0 R/XYZ 116.00001 745.92 0 56 0 R/XYZ 116.00001 745.92 0 64 0 R/XYZ 116.00001 745.92 0

leading to the following array being enumerated in the loop:

['116.00001', '745.92', '0', '34', '0', 'R/XYZ', '116.00001', '745.92', '0', '40', '0', 'R/XYZ', '116.00001', '745.92', '0', '47', '0', 'R/XYZ', '116.00001', '745.92', '0', '56', '0', 'R/XYZ', '116.00001', '745.92', '0', '64', '0', 'R/XYZ', '116.00001', '745.92', '0']

which leads to the IndexError.
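
For illustration, the failure can be reproduced without PyMuPDF or the attached file. The sketch below copies the reported string and the quoted loop. It assumes that `t` is pre-allocated as a three-element list (left, top, zoom); that is a guess based on the `/XYZ` destination format, not something confirmed from the source. `parse_xyz_original` and `parse_xyz_guarded` are hypothetical names used only here, and the guarded variant is one possible workaround, not the actual fix.

```python
# Destination string reported for page 14 (copied from this issue).
array = (
    "/XYZ 116.00001 745.92 0 34 0 R/XYZ 116.00001 745.92 0 40 0 R"
    "/XYZ 116.00001 745.92 0 47 0 R/XYZ 116.00001 745.92 0 56 0 R"
    "/XYZ 116.00001 745.92 0 64 0 R/XYZ 116.00001 745.92 0"
)


def parse_xyz_original(array):
    """Mimic the quoted loop from src/__init__.py."""
    t = [0.0, 0.0, 0.0]  # ASSUMPTION: three slots for left, top, zoom
    for i, v in enumerate(array.replace("null", "0").split()[1:]):
        t[i] = float(v)  # fails as soon as i reaches 3
    return t


def parse_xyz_guarded(array):
    """Hypothetical workaround: read at most three numbers after the first /XYZ."""
    t = [0.0, 0.0, 0.0]
    tokens = array.replace("null", "0").split()[1:4]  # never more than 3 tokens
    for i, v in enumerate(tokens):
        try:
            t[i] = float(v)
        except ValueError:
            break  # stop at the first non-numeric token, e.g. "R/XYZ"
    return t


try:
    parse_xyz_original(array)
except IndexError as exc:
    print("original loop:", type(exc).__name__, exc)

print("guarded loop:", parse_xyz_guarded(array))
```

With the string above, the original loop stores the first three values and then fails on the fourth token (`'34'`), because index 3 is outside the assumed three-element list. The guarded variant only looks at the first destination in the string; whether the remaining `/XYZ` entries should be handled differently is a question for the maintainers.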

index_error.pdf

PyMuPDF version

Built from source

Operating system

Linux

Python version

3.12

Labels

bugfix developed, release schedule to be determined
