Closed
Labels
generic: The generic submodule is affected
is-robustness-issue: From a user's perspective, this is about robustness
Description
pypdf version: 4.2.0
platform: Linux-6.5.0-1018-oem-x86_64-with-glibc2.35
Python: 3.10.12
Calling extract_text() on a page of the attached PDF raises a ValueError. Traceback:
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/_page.py", line 2083, in extract_text
    return self._extract_text(
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/_page.py", line 1804, in _extract_text
    for operands, operator in content.operations:
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1245, in operations
    self._parse_content_stream(BytesIO(b_(self._data)))
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1135, in _parse_content_stream
    operands.append(read_object(stream, None, self.forced_encoding))
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1286, in read_object
    return read_hex_string_from_stream(stream, forced_encoding)
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_utils.py", line 29, in read_hex_string_from_stream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: b'F:'
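The last frame suggests what goes wrong: the hex-string reader hands a two-byte chunk, b'F:', to int(..., base=16), and ':' is not a hexadecimal digit. The sketch below reproduces the same ValueError without pypdf and shows one possible lenient decoder. Note that decode_hex_lenient is a hypothetical helper written for illustration, not a pypdf function, and dropping stray non-hex bytes is an assumption about how such input could be tolerated, not how pypdf handles it.

```python
import string

# Byte values of the characters 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits.encode("ascii"))


def decode_hex_lenient(payload: bytes) -> str:
    """Decode the body of a PDF hex string (the bytes between '<' and '>').

    Hypothetical helper: bytes that are not hex digits (whitespace, or stray
    characters such as ':') are dropped instead of raising ValueError.
    """
    digits = bytes(b for b in payload if b in HEX_DIGITS)
    if len(digits) % 2:
        # An odd number of digits is padded with a trailing 0,
        # as the PDF specification describes for hex strings.
        digits += b"0"
    return "".join(
        chr(int(digits[i:i + 2], 16)) for i in range(0, len(digits), 2)
    )


# Reproduce the failure from the traceback in isolation.
try:
    int(b"F:", base=16)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(repr(decode_hex_lenient(b"48656C6C6F")))
print(repr(decode_hex_lenient(b"48 65:6C")))
```

Whether silently skipping invalid bytes is the right behaviour is a design question; logging a warning and skipping only the malformed operand may be preferable.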
Below is the Python script that triggers the error:
from pypdf import PdfReader

reader = PdfReader("biology/lebo102.pdf")

# Extract and print the text of the first three pages.
for index in range(3):
    page = reader.pages[index]
    print(page.extract_text())
The PDF file is attached:
lebo102.pdf