BUG: UnboundLocalError when iterating on pages of malformed pdf (with strict=True) #2617

@farjasju

Description

Accessing the pages of a malformed PDF with strict=True (for example, calling len() on PdfReader.pages) raises UnboundLocalError: local variable 'generation' referenced before assignment.

Environment

$ python -m platform
Linux-5.4.0-173-generic-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.2.0, crypt_provider=('pycryptodome', '3.20.0'), PIL=10.3.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
with open('malformed_pdf.pdf', 'rb') as f:
    doc = PdfReader(f, strict=True)
    len(doc.pages)

The malformed PDF (from https://www.columbia.edu/~aw2951/Nations.pdf):
malformed_pdf.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/_page.py", line 2208, in __len__
    return self.length_function()
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/_doc_common.py", line 353, in get_num_pages
    self._flatten()
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/_doc_common.py", line 1122, in _flatten
    self._flatten(obj, inherit, **addt)
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/_doc_common.py", line 1119, in _flatten
    obj = page.get_object()
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/generic/_base.py", line 284, in get_object
    return self.pdf.get_object(self)
  File "/home/jules/.local/lib/python3.10/site-packages/pypdf/_reader.py", line 416, in get_object
    f"({idnum} {generation})."
UnboundLocalError: local variable 'generation' referenced before assignment
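
The traceback suggests a common failure pattern: a variable that is only bound on one branch is later used to build an error message on another branch. The sketch below reproduces that pattern in isolation. It is not pypdf's actual code. The names find_object_buggy, find_object_fixed, and LookupFailed, the dict-based table, and the exact message wording are all hypothetical stand-ins chosen for illustration.

```python
class LookupFailed(Exception):
    """Hypothetical stand-in for a library-specific read error."""


def find_object_buggy(table, idnum, strict):
    """Look up idnum in table (a dict mapping id -> generation)."""
    if idnum in table:
        # 'generation' is only bound on this branch.
        generation = table[idnum]
        return idnum, generation
    if strict:
        # Bug: 'generation' was never assigned on this path, so formatting
        # the message raises UnboundLocalError before LookupFailed is built.
        raise LookupFailed(f"Could not find object ({idnum} {generation}).")
    return None


def find_object_fixed(table, idnum, requested_generation, strict):
    """Same lookup, but the error message uses only names that are always bound."""
    if idnum in table:
        return idnum, table[idnum]
    if strict:
        # Report the values the caller asked for instead of a branch-local name.
        raise LookupFailed(
            f"Could not find object ({idnum} {requested_generation})."
        )
    return None
```

With the buggy variant, a strict lookup of a missing id surfaces as UnboundLocalError and hides the intended error. The fixed variant raises the intended LookupFailed. Whether the actual fix in pypdf takes this shape is an assumption.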

Labels

is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
is-robustness-issue: From a user's perspective, this is about robustness
