ValueError: invalid literal for int() with base 16: b'F:' #2598

@sureshkvl

pypdf version: 4.2.0
platform: Linux-6.5.0-1018-oem-x86_64-with-glibc2.35
Python: 3.10.12

Traceback:

  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/_page.py", line 2083, in extract_text
    return self._extract_text(
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/_page.py", line 1804, in _extract_text
    for operands, operator in content.operations:
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1245, in operations
    self._parse_content_stream(BytesIO(b_(self._data)))
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1135, in _parse_content_stream
    operands.append(read_object(stream, None, self.forced_encoding))
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1286, in read_object
    return read_hex_string_from_stream(stream, forced_encoding)
  File "/home/suresh/venv-lanchain/lib/python3.10/site-packages/pypdf/generic/_utils.py", line 29, in read_hex_string_from_stream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: b'F:'
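
For context, the failing line converts two-character chunks of a PDF hex string into characters with int(x, base=16), so any chunk containing a non-hex byte such as ':' raises. The sketch below reproduces that failure outside pypdf and shows one possible lenient alternative. Both function names are hypothetical and written for illustration only; this is not pypdf's implementation or its fix. The lenient variant assumes that dropping non-hex bytes is acceptable, and pads an odd-length string with a trailing 0, as the PDF specification prescribes for hex strings.

```python
import string

# Byte values of the ASCII hex digits 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits.encode("ascii"))


def decode_hex_strict(data: bytes) -> str:
    """Decode consecutive two-byte hex pairs; raises ValueError on any non-hex byte."""
    return "".join(
        chr(int(data[i:i + 2], base=16)) for i in range(0, len(data), 2)
    )


def decode_hex_lenient(data: bytes) -> str:
    """Hypothetical helper: drop non-hex bytes, pad an odd length with '0', then decode."""
    cleaned = bytes(b for b in data if b in HEX_DIGITS)
    if len(cleaned) % 2:
        cleaned += b"0"
    return decode_hex_strict(cleaned)
```

Calling decode_hex_strict(b"F:") raises the same ValueError as in the traceback, while decode_hex_lenient(b"48:65") skips the stray ':' and returns "He". Whether silently dropping bytes is the right behavior for a malformed file is a design choice; logging a warning may be preferable.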


Below is the Python script that triggers the error:

from pypdf import PdfReader

reader = PdfReader("biology/lebo102.pdf")

# Extract and print the text of the first three pages.
for index in range(3):
    page = reader.pages[index]
    print(page.extract_text())

The PDF file is attached: lebo102.pdf
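
Until the parser tolerates such input, one possible workaround is to catch the error per page so that the remaining pages can still be extracted. The sketch below is only an illustration: extract_pages_safely is a hypothetical helper, and it assumes each page object exposes an extract_text() method, as pypdf page objects do.

```python
from typing import Iterable, List, Optional


def extract_pages_safely(pages: Iterable) -> List[Optional[str]]:
    """Return one entry per page: the extracted text, or None if extraction raised ValueError."""
    results: List[Optional[str]] = []
    for number, page in enumerate(pages):
        try:
            results.append(page.extract_text())
        except ValueError as error:
            # Report the failing page and keep going with the rest.
            print(f"page {number}: extraction failed: {error}")
            results.append(None)
    return results
```

With pypdf this could be called as extract_pages_safely(PdfReader("biology/lebo102.pdf").pages). It only hides the symptom; the affected pages still yield no text.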

Labels

generic: The generic submodule is affected
is-robustness-issue: From a user's perspective, this is about robustness
