Skip to content

ROB: reading PDF with multiple forms is really slow #2643

@farjasju

Description

@farjasju

When reading a specific PDF file with a lot of forms, the get_fields() seems to loop for a very long time (one hour and a half on my setup).

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-173-generic-x86_64-with-glibc2.29

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.2.0, crypt_provider=('pycryptodome', '3.20.0'), PIL=10.3.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
with open('inheritance.pdf', 'rb') as f:
    doc = PdfReader(f)
    doc.get_fields()

The PDF file (coming from https://www.nj.gov/treasury/taxation/pdf/other_forms/inheritance/itrbk.pdf):
inheritance.pdf

Traceback

The get_fields() function seems to make a lot of recursive _check_kids() calls

Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1115, 0, 133206044573808), IndirectObject(1865, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1107, 0, 133206044573808), IndirectObject(1866, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1107, 0, 133206044573808), IndirectObject(1866, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1071, 0, 133206044573808), IndirectObject(1864, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1115, 0, 133206044573808), IndirectObject(1865, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1107, 0, 133206044573808), IndirectObject(1866, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1107, 0, 133206044573808), IndirectObject(1866, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1115, 0, 133206044573808), IndirectObject(1865, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1107, 0, 133206044573808), IndirectObject(1866, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(942, 0, 133206044573808), IndirectObject(922, 0, 133206044573808)]
Recursing down 5 kids: [IndirectObject(1120, 0, 133206044573808), IndirectObject(1867, 0, 133206044573808)]

Metadata

Metadata

Assignees

No one assigned

    Labels

    workflow-formsFrom a users perspective, forms is the affected feature/workflow

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions