Recursion error when using clone_from of PdfWriter on PDF 2.0 specification #2839

@stefan6419846

Environment

$ python -m platform
Linux-6.8.0-100039-tuxedo-x86_64-with-glibc2.35

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.3.0

The version is effectively the latest code from main.

Code + PDF

This is a minimal, complete example that shows the issue:

>>> from pypdf import PdfWriter
>>> writer = PdfWriter(clone_from='ISO_32000-2-2020_sponsored.pdf')

Opening the same file with PdfReader and extracting the text of every page does not fail.
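
For reference, the PdfReader check can be sketched roughly as follows. This is only a minimal sketch: the helper name `extract_all_text` is hypothetical and exists just for this illustration, while `PdfReader`, `reader.pages` and `page.extract_text()` are the regular pypdf calls.

```python
def extract_all_text(path):
    """Open the file with PdfReader and return the extracted text of each page."""
    # Imported inside the function so the helper can be defined
    # even where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # One entry per page; this path does not go through PdfWriter cloning.
    return [page.extract_text() for page in reader.pages]
```

Calling `extract_all_text('ISO_32000-2-2020_sponsored.pdf')` corresponds to the check described above, which completes without an error.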

I cannot share the document (1003 pages) here, as it is the non-public copy of the PDF 2.0 specification, available for free at https://pdfa.org/sponsored-standards/

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_writer.py", line 233, in __init__
    self.clone_document_from_reader(clone_from)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_writer.py", line 1150, in clone_document_from_reader
    self.clone_reader_document_root(reader)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_writer.py", line 1119, in clone_reader_document_root
    self._root_object = reader.root_object.clone(self)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 258, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 369, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 274, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 258, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 369, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 274, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 258, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 369, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
[...]
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 129, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 274, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 258, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 369, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 129, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 274, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 258, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 369, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 129, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 266, in clone
    obj = self.get_object()
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 286, in get_object
    return self.pdf.get_object(self)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_reader.py", line 381, in get_object
    retval = self._get_object_from_stream(indirect_reference)  # type: ignore
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_reader.py", line 315, in _get_object_from_stream
    obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object()  # type: ignore
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_base.py", line 286, in get_object
    return self.pdf.get_object(self)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/_reader.py", line 442, in get_object
    retval = read_object(self.stream, self)  # type: ignore
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 1305, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "/home/stefan/tmp/pypdf/pypdf_upstream/pypdf/generic/_data_structures.py", line 562, in read_from_stream
    if isinstance(length, IndirectObject):
  File "/usr/lib/python3.10/typing.py", line 1503, in __instancecheck__
    issubclass(instance.__class__, cls)):
RecursionError: maximum recursion depth exceeded
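
The repeating `clone` / `_clone` frames suggest that cloning uses one or more Python call frames per nesting level of the object graph, so a sufficiently deep structure can exhaust the default recursion limit. The following toy sketch illustrates that general effect with plain dicts and lists. It is not pypdf's code, it does not handle cyclic references, and all names in it (`build_nested`, `clone_recursive`, `clone_iterative`, `chain_length`, `recursion_limit`) are hypothetical.

```python
import sys
from contextlib import contextmanager


def build_nested(depth):
    """Build a chain of dicts nested `depth` levels deep: {"Next": {"Next": ...}}."""
    root = {}
    node = root
    for _ in range(depth):
        child = {}
        node["Next"] = child
        node = child
    return root


def chain_length(node):
    """Count nesting levels by following "Next" keys in a loop (no recursion)."""
    length = 0
    while "Next" in node:
        node = node["Next"]
        length += 1
    return length


def clone_recursive(obj):
    """Naive recursive copy: the call stack grows with the nesting depth."""
    if isinstance(obj, dict):
        return {key: clone_recursive(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [clone_recursive(item) for item in obj]
    return obj


def clone_iterative(obj):
    """Copy using an explicit work stack, so nesting depth does not use the call stack."""
    if not isinstance(obj, (dict, list)):
        return obj
    root = {} if isinstance(obj, dict) else []
    stack = [(obj, root)]
    while stack:
        src, dst = stack.pop()
        items = src.items() if isinstance(src, dict) else enumerate(src)
        for key, value in items:
            if isinstance(value, (dict, list)):
                # Create the empty container now and fill it in a later iteration.
                copy = {} if isinstance(value, dict) else []
                stack.append((value, copy))
            else:
                copy = value
            if isinstance(dst, dict):
                dst[key] = copy
            else:
                dst.append(copy)
    return root


@contextmanager
def recursion_limit(limit):
    """Temporarily change the interpreter's recursion limit and restore it afterwards."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)
```

With the default limit, `clone_recursive(build_nested(5000))` raises `RecursionError`, while `clone_iterative` copies the same structure. As a diagnostic, the `PdfWriter(clone_from=...)` call could be wrapped in `with recursion_limit(...)`; whether a higher limit lets the clone complete for this document is an untested assumption, and very high limits risk crashing the interpreter.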

Labels

PdfWriter: The PdfWriter component is affected
