
DeprecationWarning: invalid escape sequence #482

Closed
@stephenfin

Bug description

Calling Page.getText('blocks') on PDFs whose text contains invalid Python escape sequences (e.g. '\ ', a backslash followed by a space) results in warnings such as the following:

../fitz/fitz.py:5404: DeprecationWarning: invalid escape sequence '\ '
  return _fitz.TextPage_extractBLOCKS(self, lines)

This is only a warning for now, but it may become an error in a future Python release (possibly 3.10).
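One way this kind of warning can surface at runtime, rather than when a .py file is compiled, is when text is decoded with Python's unicode_escape codec: since Python 3.6 that codec emits a DeprecationWarning for unrecognized escapes such as '\ '. Whether PyMuPDF's C code actually does this internally is an assumption, not something this report confirms. The sketch below uses a hypothetical helper, decode_escapes, and a hypothetical input string to show the warning being triggered, and to show that doubling the backslash before decoding avoids it.

```python
import codecs
import warnings
from typing import Tuple


def decode_escapes(data: bytes) -> Tuple[str, list]:
    """Decode data with the unicode_escape codec and record any warnings.

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally hide
        # or show only once.
        warnings.simplefilter("always")
        text = codecs.decode(data, "unicode_escape")
    return text, list(caught)


# Hypothetical input containing a backslash followed by a space.
text, caught = decode_escapes(b"path\\ to")
print(repr(text))
print([w.category.__name__ for w in caught])

# Escaping the backslash first (doubling it) yields the same text
# without triggering the invalid-escape warning.
safe_text, safe_caught = decode_escapes(b"path\\ to".replace(b"\\", b"\\\\"))
print(repr(safe_text))
print([w.category.__name__ for w in safe_caught])
```

Both calls return the same string (the backslash is kept verbatim), but only the first records a DeprecationWarning.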

To Reproduce (mandatory)

  1. Create the following test script and save it as test.py:

     import sys
     import fitz
    
     pdf = fitz.open(sys.argv[1])
     for page in pdf.pages():
         page.getText('blocks')
    
  2. Save the attached file (test_aafigure.pdf) locally.

  3. Run the script against the file with deprecation warnings enabled:

     PYTHONWARNINGS=d python3 test.py test_aafigure.pdf
    

Expected behavior (optional)

The strings should be handled internally as raw strings (e.g. r'\ ') or have their backslashes escaped, so that no warning is emitted.
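For ordinary Python string literals, the two suggested fixes can be illustrated directly: a raw string (r'\ ') and an escaped backslash ('\\ ') denote the same two characters, and neither triggers the invalid-escape warning when compiled, whereas the plain literal '\ ' does. compile_warnings below is a hypothetical helper for illustration only. Since Python 3.12 this compile-time warning is a SyntaxWarning rather than a DeprecationWarning.

```python
import warnings
from typing import List


def compile_warnings(source: str) -> List[type]:
    """Compile Python source and return the categories of any warnings.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of applying the default filters.
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category for w in caught]


# Backslash + space as a raw string and as an escaped backslash:
# both spell the same two-character string.
raw_literal = r"\ "
escaped_literal = "\\ "
print(raw_literal == escaped_literal, len(raw_literal))

# Only the plain (non-raw, unescaped) literal triggers the warning.
print(compile_warnings("s = '\\ '"))    # source text: s = '\ '
print(compile_warnings("s = r'\\ '"))   # source text: s = r'\ '
print(compile_warnings("s = '\\\\ '"))  # source text: s = '\\ '
```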

Screenshots (optional)

N/A

Your configuration (mandatory)

  • Fedora 31 (64 bit)
  • Python 3.7.6
  • PyMuPDF 1.16.16, wheel
3.7.6 (default, Jan 30 2020, 09:44:41) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] 
 linux 
PyMuPDF 1.16.16: Python bindings for the MuPDF 1.16.0 library.
Version date: 2020-03-29 09:44:30.
Built for Python 3.7 on linux (64-bit).

Additional context (optional)

I did try to fix this myself, but I haven't worked with SWIG (or Python bindings to a C lib) before and got lost. Sorry 😞
