Closed
Bug description
Calling Page.getText('blocks') on PDFs that contain invalid Python escape sequences (e.g. '\ ') results in the following warning:
../fitz/fitz.py:5404: DeprecationWarning: invalid escape sequence '\ '
return _fitz.TextPage_extractBLOCKS(self, lines)
This is a warning now but may or may not be an error in Python 3.10.
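For anyone unfamiliar with this class of warning, here is a minimal standalone sketch of how Python reports an invalid escape sequence. It is not PyMuPDF code and does not claim to show where the warning originates inside the bindings. The helper name compile_and_collect is made up for illustration. The warning category depends on the interpreter version (DeprecationWarning on the 3.7 used here, SyntaxWarning on newer releases).

```python
import warnings


def compile_and_collect(source):
    """Compile `source` and return every warning raised while doing so.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record all warnings, including ones that are hidden by default.
        warnings.simplefilter("always")
        compile(source, "<demo>", "exec")
    return caught


# The source text below is:  s = '\ '
# i.e. a backslash followed by a space inside a normal string literal.
caught = compile_and_collect("s = '\\ '")

for w in caught:
    # Prints the category name and the "invalid escape sequence" message.
    print(w.category.__name__, w.message)
```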
To Reproduce (mandatory)
- Create the following test script and save as test.py:

      import sys
      import fitz

      pdf = fitz.open(sys.argv[1])
      for page in pdf.pages():
          page.getText('blocks')

- Save the attached file locally
- Run the script against the file with deprecation warnings enabled:

      PYTHONWARNINGS=d python3 test.py test_aafigure.pdf
Expected behavior (optional)
The strings should be marked as raw strings (e.g. r'\ ') internally, or the backslashes should be escaped.
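To show that the two options are equivalent, here is a small sketch in plain Python (again not PyMuPDF code, and assuming the problematic text is a backslash followed by a space). Both spellings produce the same two-character string, and neither triggers the warning.

```python
# Raw string literal: the backslash is kept as-is.
raw = r'\ '

# Normal string literal with the backslash escaped.
escaped = '\\ '

print(raw == escaped)  # True
print(len(raw))        # 2
print(list(raw))       # a backslash character followed by a space
```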
Screenshots (optional)
N/A
Your configuration (mandatory)
- Fedora 31 (64 bit)
- Python 3.7.6
- PyMuPDF 1.16.16, wheel
3.7.6 (default, Jan 30 2020, 09:44:41)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
linux
PyMuPDF 1.16.16: Python bindings for the MuPDF 1.16.0 library.
Version date: 2020-03-29 09:44:30.
Built for Python 3.7 on linux (64-bit).
Additional context (optional)
I did try to fix this myself, but I haven't worked with SWIG (or Python bindings to a C lib) before and got lost. Sorry 😞