Error referencing a non-existent page destination when writing PDF #2842

@ssjkamei

Description

PdfWriter.append() raises an IndexError when the source PDF contains named destinations that reference a page that does not exist.

Environment

Which environment were you using when you encountered the problem?

> python -m platform
Windows-10-10.0.22631-SP0

> python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('cryptography', '41.0.5'), PIL=10.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter


def test_write_pdf():
    filepath = r"C:\test.pdf"
    with open(filepath, "rb") as f:
        pdf_writer = PdfWriter()
        # the second positional argument of PdfReader is `strict`
        pdf_reader = PdfReader(f, strict=True)
        print(pdf_reader.metadata)
        print(pdf_reader.named_destinations)
        # raises IndexError: sequence index out of range
        pdf_writer.append(pdf_reader)

Sorry, we are unable to provide the PDF at this time.
We are checking whether we can create a PDF that reproduces the problem and can be published.

Traceback

This is the complete traceback I see:

venv\venv\Lib\site-packages\pypdf\_writer.py:2365: in append
    self.merge(
venv\venv\Lib\site-packages\pypdf\_writer.py:2474: in merge
    p = reader.pages[dest["/Page"]]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pypdf._page._VirtualList object at 0x0000025E7C033650>, index = 1

    def __getitem__(
        self, index: Union[int, slice]
    ) -> Union[PageObject, Sequence[PageObject]]:
        if isinstance(index, slice):
            indices = range(*index.indices(len(self)))
            cls = type(self)
            return cls(indices.__len__, lambda idx: self[indices[idx]])
        if not isinstance(index, int):
            raise TypeError("sequence indices must be integers")
        len_self = len(self)
        if index < 0:
            # support negative indexes
            index = len_self + index
        if index < 0 or index >= len_self:
>           raise IndexError("sequence index out of range")
E           IndexError: sequence index out of range
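
The bounds check at the end of the traceback can be reproduced in isolation. The following is a minimal sketch, not pypdf code: `normalize_index` and `is_valid_page_index` are hypothetical helpers that only mirror the integer branch of the `__getitem__` logic shown above. It illustrates why index 1 fails for a one-page document.

```python
def normalize_index(index: int, length: int) -> int:
    """Mirror the integer branch of the bounds check shown in the traceback."""
    if not isinstance(index, int):
        raise TypeError("sequence indices must be integers")
    if index < 0:
        # negative indexes count from the end of the sequence
        index = length + index
    if index < 0 or index >= length:
        raise IndexError("sequence index out of range")
    return index


def is_valid_page_index(index: int, length: int) -> bool:
    """Return True if `index` would pass the bounds check for `length` pages."""
    try:
        normalize_index(index, length)
    except IndexError:
        return False
    return True


# A one-page document: page 0 exists, page 1 does not.
print(is_valid_page_index(0, 1))  # True
print(is_valid_page_index(1, 1))  # False
```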

The following may be the cause: the catalog references 46 0 obj (/AcroForm) and 20 0 obj (/Dests), but neither object exists in the file.
I tried to fix this in Adobe Acrobat, but could not figure out how to remove these references.

30 0 obj
<</AcroForm 46 0 R/Dests 20 0 R/Extensions<</ADBE<</BaseVersion/1.7/ExtensionLevel 8>>>>/Metadata 5 0 R/Names 47 0 R/OCProperties<</D<</OFF[]/Order[]/RBGroups[]>>/OCGs[48 0 R 49 0 R 50 0 R]>>/Pages 18 0 R/StructTreeRoot 14 0 R/Type/Catalog>>
endobj

When the file is read with PdfReader, named_destinations contains the following, even though the PDF has only one page:

{'/__WKANCHOR_2': {'/Title': '/__WKANCHOR_2', '/Page': 0, '/Type': '/XYZ', '/Left': 36, '/Top': 754, '/Zoom': 0.0},
 '/__WKANCHOR_4': {'/Title': '/__WKANCHOR_4', '/Page': 0, '/Type': '/XYZ', '/Left': 305, '/Top': 754, '/Zoom': 0.0},
 '/__WKANCHOR_6': {'/Title': '/__WKANCHOR_6', '/Page': 0, '/Type': '/XYZ', '/Left': 36, '/Top': 454, '/Zoom': 0.0},
 '/__WKANCHOR_8': {'/Title': '/__WKANCHOR_8', '/Page': 1, '/Type': '/XYZ', '/Left': 61, '/Top': 802, '/Zoom': 0.0},
 '/__WKANCHOR_a': {'/Title': '/__WKANCHOR_a', '/Page': 1, '/Type': '/XYZ', '/Left': 36, '/Top': 425, '/Zoom': 0.0},
 '/__WKANCHOR_c': {'/Title': '/__WKANCHOR_c', '/Page': 2, '/Type': '/XYZ', '/Left': 36, '/Top': 814, '/Zoom': 0.0},
 '/__WKANCHOR_e': {'/Title': '/__WKANCHOR_e', '/Page': 2, '/Type': '/XYZ', '/Left': 36, '/Top': 703, '/Zoom': 0.0}}
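
Destinations like these can be checked before merging. The following is a rough sketch, assuming each destination behaves like a dict with an integer `/Page` entry, as in the output above. `find_dangling_destinations` is a hypothetical helper, not part of pypdf, and it ignores destinations whose `/Page` is not an integer.

```python
from typing import Any, Dict, List


def find_dangling_destinations(
    named_dests: Dict[str, Dict[str, Any]], page_count: int
) -> List[str]:
    """Return the names of destinations whose integer /Page is out of range."""
    dangling = []
    for name, dest in named_dests.items():
        page = dest.get("/Page")
        # Only integer page numbers are checked here; other values are skipped.
        if isinstance(page, int) and not 0 <= page < page_count:
            dangling.append(name)
    return dangling


# Reduced sample based on the output above (only /Page is kept).
sample = {
    "/__WKANCHOR_2": {"/Page": 0},
    "/__WKANCHOR_8": {"/Page": 1},
    "/__WKANCHOR_c": {"/Page": 2},
}
print(find_dangling_destinations(sample, page_count=1))
# ['/__WKANCHOR_8', '/__WKANCHOR_c']
```

With a real file, the arguments would presumably be `pdf_reader.named_destinations` and `len(pdf_reader.pages)`.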

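I was able to avoid the error by adding the check if len(reader.pages) > dest["/Page"]: on the PdfWriter side. The first snippet below is the current code; the second is the version with the check added.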

pypdf/pypdf/_writer.py

Lines 2471 to 2482 in 8f62120

            elif isinstance(dest["/Page"], int):
                # the page reference is a page number normally not a PDF Reference
                # page numbers as int are normally accepted only in external goto
                p = reader.pages[dest["/Page"]]
                assert p.indirect_reference is not None
                try:
                    arr[NumberObject(0)] = NumberObject(
                        srcpages[p.indirect_reference.idnum].page_number
                    )
                    self.add_named_destination_array(dest["/Title"], arr)
                except KeyError:
                    pass

            elif isinstance(dest["/Page"], int):
                # the page reference is a page number normally not a PDF Reference
                # page numbers as int are normally accepted only in external goto
                if len(reader.pages) > dest["/Page"]:
                    p = reader.pages[dest["/Page"]]
                    assert p.indirect_reference is not None
                    try:
                        arr[NumberObject(0)] = NumberObject(
                            srcpages[p.indirect_reference.idnum].page_number
                        )
                        self.add_named_destination_array(dest["/Title"], arr)
                    except KeyError:
                        pass

With this change, the resulting PDF contains the following Dests entry. Only the three destinations that point to page 0 are kept.

15 0 obj
<<
/Dests 16 0 R
>>
endobj
16 0 obj
<<
/Names [ (\057\137\137WKANCHOR\1372) [ 0 /XYZ 36 754 0.0 ] (\057\137\137WKANCHOR\1374) [ 0 /XYZ 305 754 0.0 ] (\057\137\137WKANCHOR\1376) [ 0 /XYZ 36 454 0.0 ] ]
>>
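
The names in the /Names array are written with octal escapes, which makes them hard to compare with the named_destinations output. The following is a small sketch for decoding them. `decode_octal_escapes` is a hypothetical helper that handles only `\ddd` octal escapes (one to three octal digits), not the other escape sequences allowed in PDF literal strings.

```python
import re

# A backslash followed by one to three octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def decode_octal_escapes(raw: str) -> str:
    """Replace each \\ddd octal escape with the character it encodes."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8) & 0xFF), raw)


# "\1372" is the escape "\137" (underscore) followed by the literal digit "2".
print(decode_octal_escapes(r"\057\137\137WKANCHOR\1372"))  # /__WKANCHOR_2
```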

Labels: PdfWriter (the PdfWriter component is affected); is-robustness-issue (from a user's perspective, this is about robustness)