Escaping the repl argument to re.sub(), re.subn() #128138

Open
@finite-state-machine

Documentation

It's not immediately obvious how to escape the repl (replacement) argument to re.sub() and re.subn() when repl is chosen by a potentially hostile actor. re.escape() isn't the answer: it is designed for patterns, so it escapes far too much, and the extra backslashes end up in the output.
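To illustrate the over-escaping, here is a minimal sketch. The sample strings are made up for illustration; the only calls used are re.escape() and re.sub().

```python
import re

raw = 'a.b'  # illustrative replacement text containing no backslashes

# re.escape() targets pattern syntax, so it backslash-escapes the '.'.
over_escaped = re.escape(raw)
assert over_escaped == 'a\\.b'

# In a replacement template, an unknown escape such as '\.' is left alone,
# so the backslash added by re.escape() leaks into the result.
result = re.sub('TARGET', over_escaped, 'BEFORE:TARGET:AFTER')
assert result == 'BEFORE:a\\.b:AFTER'
```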

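The right answer seems to be escaped_repl = raw_repl.replace(bslash, bslash*2) where bslash = '\\', i.e. doubling every backslash. That should be sufficient because backslash escapes are the only syntax processed in repl. It might be worth adding this to the documentation.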
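As a small sketch of what the doubling buys you, assuming a hypothetical helper named escape_repl and a made-up untrusted string:

```python
import re

def escape_repl(repl: str) -> str:
    # Hypothetical helper name: doubles each backslash so that re.sub()
    # inserts the string as literal text instead of parsing it as a template.
    return repl.replace('\\', '\\\\')

untrusted = r'\g<0>\n'  # made-up input containing template syntax

# Unescaped: \g<0> expands to the matched text and \n becomes a newline.
unescaped_result = re.sub('TARGET', untrusted, 'a TARGET b')
assert unescaped_result == 'a TARGET\n b'

# Escaped: the untrusted text is inserted verbatim.
escaped_result = re.sub('TARGET', escape_repl(untrusted), 'a TARGET b')
assert escaped_result == 'a \\g<0>\\n b'

# Alternative: a callable repl, whose return value is used without
# any template processing.
callable_result = re.sub('TARGET', lambda m: untrusted, 'a TARGET b')
assert callable_result == escaped_result
```

Passing a callable avoids the escaping question entirely, at the cost of one function call per match; which of the two is preferable probably depends on the call site.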

Here's the code I used to empirically validate the "right answer" given above (checked on Python 3.8 & 3.12):

from __future__ import annotations
import re, sys

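def escape_re_sub_repl(repl: str) -> str:
    """Escape repl so that re.sub() and re.subn() insert it literally."""
    # Doubling each backslash turns every escape sequence into literal text.
    return repl.replace('\\', '\\\\')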

def test_escape_re_sub_repl() -> None:
    """Check that every escaped one- and two-character repl round-trips."""
    backslash = '\\'
    assert len(backslash) == 1

    base_regex = 'TARGET'
    assert base_regex == re.escape(base_regex)
    base_prefix = 'BEFORE:'
    base_suffix = ':AFTER'
    base_input = f'{base_prefix}{base_regex}{base_suffix}'

    # Every code point on its own, then every code point preceded by a backslash.
    base_chars = tuple(chr(p) for p in range(sys.maxunicode + 1))
    escaped_chars = tuple(f'{backslash}{c}' for c in base_chars)
    test_cases = base_chars + escaped_chars
    assert {len(f) for f in test_cases} == {1, 2}

    for raw in test_cases:
        repl = escape_re_sub_repl(raw)
        got, change_count = re.subn(base_regex, repl, base_input)
        assert change_count == 1
        assert got == f'{base_prefix}{raw}{base_suffix}'
