Closed
Labels
PdfReader (the PdfReader component is affected), is-robustness-issue (from a user's perspective, this is about robustness)
Description
Hi!
Here we are again :) Another broken PDF file and the stderr output are provided below.
P.S. Issue #2841 may be similar, but my PDF is not a Trojan (I hope).
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.15.0-56-generic-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('cryptography', '3.1'), PIL=none
commit 9d54f63
Code + PDF
This is a minimal, complete example that shows the issue:
#! /usr/bin/env python3

import pypdf
from pypdf.errors import EmptyFileError, PdfReadError, PdfStreamError
import sys

def TestOneInput(fname):
    try:
        pdf_reader = pypdf.PdfReader(fname)
        for page_number, page in enumerate(pdf_reader.pages):
            page.extract_text()
    except (EmptyFileError, PdfReadError, PdfStreamError):
        pass

if __name__ == "__main__":
    if len(sys.argv) < 2:
        exit(1)
    TestOneInput(sys.argv[1])
PoC
crash-2347912aa2a6f0fab5df4ebc8a424735d5d0d128.pdf
Traceback
This is the complete stderr I see:
PdfReadError("Invalid Elementary Object starting with b']' @739: b' %\\n ] %\\n>> '")
Traceback (most recent call last):
  File "/fuzz/./poc.py", line 18, in <module>
    TestOneInput(sys.argv[1])
  File "/fuzz/./poc.py", line 10, in TestOneInput
    for page_number, page in enumerate(pdf_reader.pages):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2447, in __iter__
    for i in range(len(self)):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2372, in __len__
    return self.length_function()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 352, in get_num_pages
    self._flatten(self._readonly)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1166, in _flatten
    for page in cast(ArrayObject, pages[PA.KIDS]):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/generic/_data_structures.py", line 436, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Kids'
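
The last frames show a page-tree node without a /Kids entry surfacing as a bare KeyError, which is not one of the pypdf.errors exceptions the PoC catches. Below is a minimal stdlib-only sketch of that failure mode, and of one way such a lookup could be wrapped so the caller sees a library-specific error instead. ToyDictionaryObject, ToyReadError and count_pages are hypothetical names used only for illustration; this is not pypdf's code and not necessarily how the issue was or should be fixed.

```python
# Toy model only: hypothetical stand-ins, NOT pypdf's implementation.


class ToyDictionaryObject(dict):
    """Dict subclass whose lookup raises KeyError for a missing key,
    mirroring the dict.__getitem__ call in the final traceback frame."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key)


class ToyReadError(Exception):
    """Hypothetical stand-in for a library-specific read error."""


def count_pages(node):
    """Recursively count leaf pages in a toy page tree.

    A node that is not a leaf page and has no /Kids entry is reported
    as ToyReadError instead of leaking a bare KeyError to the caller.
    """
    if node.get("/Type") == "/Page":
        return 1
    try:
        kids = node["/Kids"]
    except KeyError as exc:
        raise ToyReadError("page tree node has no /Kids entry") from exc
    return sum(count_pages(kid) for kid in kids)


# A well-formed toy tree with one leaf page.
good = ToyDictionaryObject(
    {"/Type": "/Pages", "/Kids": [ToyDictionaryObject({"/Type": "/Page"})]}
)

# A malformed toy tree: the /Pages node is missing /Kids.
broken = ToyDictionaryObject({"/Type": "/Pages"})
```

With this shape, a harness that already catches the library's read errors would also cover the malformed tree: count_pages(broken) raises ToyReadError, with the original KeyError preserved as its cause.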