Using PdfReader causes a crash #2761

@Avgor46

Hi!

I've been fuzzing PdfReader with sydr-fuzz via the langchain project and found a few errors. The question is: should the user handle Python errors raised by the pypdf library, or is this a bug in pypdf? The information needed to reproduce one of them is provided below.

Environment

$ python3 -m platform
Linux-5.15.0-56-generic-x86_64-with-glibc2.31

$ python3 -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.0, crypt_provider=('cryptography', '3.1'), PIL=none

Code + PDF

This is a minimal, complete example that shows the issue:

#! /usr/bin/env python3

import pypdf
from pypdf.errors import EmptyFileError, PdfReadError, PdfStreamError
import sys

def TestOneInput(fname):
    try:
        pdf_reader = pypdf.PdfReader(fname)
        for page_number, page in enumerate(pdf_reader.pages):
            page.extract_text()
    except (EmptyFileError, PdfReadError, PdfStreamError):
        pass  # expected pypdf errors are ignored; anything else propagates

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit(1)
    TestOneInput(sys.argv[1])

PoC

crash-b26d05712a29b241ac6f9dc7fff57428ba2d1a04.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/fuzz/./reproducer.py", line 18, in <module>
    TestOneInput(sys.argv[1])
  File "/fuzz/./reproducer.py", line 10, in TestOneInput
    for page_number, page in enumerate(pdf_reader.pages):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2296, in __iter__
    for i in range(len(self)):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2227, in __len__
    return self.length_function()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 353, in get_num_pages
    self._flatten()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1125, in _flatten
    self._flatten(obj, inherit, **addt)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1125, in _flatten
    self._flatten(obj, inherit, **addt)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1125, in _flatten
    self._flatten(obj, inherit, **addt)
  [Previous line repeated 986 more times]
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1122, in _flatten
    obj = page.get_object()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/generic/_base.py", line 284, in get_object
    return self.pdf.get_object(self)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 383, in get_object
    retval = self.cache_get_indirect_object(
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 545, in cache_get_indirect_object
    return self.resolved_objects.get((generation, idnum))
RecursionError: maximum recursion depth exceeded in comparison
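
The repeated `_flatten` frames are consistent with a page tree whose /Kids entries point back to an ancestor, although the PoC file has not been analyzed here, so that is an assumption. As a rough illustration of how a reader could reject such input with a library-level error instead of a RecursionError, here is a minimal sketch on a toy page tree. The dict layout and the `count_pages` helper are hypothetical and are not pypdf's data model or API.

```python
# Toy model of a PDF page tree: object id -> node dict.
# Hypothetical structure for illustration only, not pypdf internals.

def count_pages(objects, root_id):
    """Count leaf pages iteratively; raise ValueError on a revisited node."""
    visited = set()
    stack = [root_id]
    pages = 0
    while stack:
        obj_id = stack.pop()
        if obj_id in visited:
            # In a valid page tree every node has exactly one parent,
            # so seeing a node twice means the structure is malformed.
            raise ValueError(f"malformed page tree: object {obj_id} visited twice")
        visited.add(obj_id)
        node = objects[obj_id]
        if node["Type"] == "Page":
            pages += 1
        else:
            stack.extend(node.get("Kids", []))
    return pages


well_formed = {
    1: {"Type": "Pages", "Kids": [2, 3]},
    2: {"Type": "Page"},
    3: {"Type": "Pages", "Kids": [4]},
    4: {"Type": "Page"},
}

# Object 2 lists its own parent as a child, forming a cycle.
cyclic = {
    1: {"Type": "Pages", "Kids": [2]},
    2: {"Type": "Pages", "Kids": [1]},
}

print(count_pages(well_formed, 1))
try:
    count_pages(cyclic, 1)
except ValueError as exc:
    print("rejected:", exc)
```

Using an explicit stack plus a visited set bounds the work by the number of objects, so neither a cycle nor a very deep but valid tree can exhaust the interpreter's recursion limit.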


Labels

PdfMerger: The PdfMerger component is affected
is-robustness-issue: From a user's perspective, this is about robustness
