AttributeError: 'int' object has no attribute 'isspace' #1983
Comments
@michelcrypt4d4mus |
Hm, interesting. That works fine:
|
I think the missing feature #1989 might be the issue here. It's just a bit hidden. |
There is something else behind this too, on page 10 |
i linked to this code in my package; let me know if that's not enough. it's simple - just iterate over all pages + all images and extract text. edit: the actual extraction is done with |
What I would like is a standalone script that focuses directly on the page and image producing the error. It would simplify our analysis, since we would not have to check everything.
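A minimal sketch of how such a standalone check could look, assuming pypdf is installed: it calls `extract_text()` page by page and records which pages raise, so a report can name the exact page. The helper names `find_failing_items` and `find_failing_pages` are hypothetical and not part of pypdf. The image step is left out because the tool used for it is not named in this thread.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_items(items: Iterable, extract: Callable) -> List[Tuple[int, Exception]]:
    """Call extract(item) on each item and collect (1-based index, exception) for failures."""
    failures = []
    for number, item in enumerate(items, start=1):
        try:
            extract(item)
        except Exception as exc:  # broad on purpose: we are hunting for any crash
            failures.append((number, exc))
    return failures


def find_failing_pages(pdf_path: str) -> List[Tuple[int, Exception]]:
    """Report the pages of a PDF on which pypdf's extract_text() raises."""
    # Imported lazily so that find_failing_items has no third-party dependency.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return find_failing_items(reader.pages, lambda page: page.extract_text())
```

Running `find_failing_pages` on the attached PDF and printing each `(page number, exception)` pair should be enough to point at the page that triggers the `AttributeError`.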
I've finally found the issue. A PR has been proposed. |
Please note that this is potentially backwards-incompatible! This also fixes a bug. Closes #1983
if line.is_space(): Please let me know how to fix it |
@AfifaYousaf |
Tried to extract text from attached PDF.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
macOS-13.4.1-arm64-arm-64bit
$ python -c "import pypdf;print(pypdf.__version__)"
3.12.1
Code + PDF
The code is here
PDF is attached. It's public and can be used for tests etc.
New Jersey Coinbase staking securities charges 2023-0606_Coinbase-Penalty-and-C-D.pdf
Traceback