It's not immediately obvious how to escape the `repl` (replacement for matches) argument to `re.sub()` and `re.subn()` if `repl` is chosen by a potentially hostile actor. Obviously, `re.escape()` isn't the answer, as that escapes far too much.

The right answer seems to be `escaped_repl = raw_repl.replace(bslash, bslash*2)`, where `bslash = '\\'`. It might be worth adding this to the documentation.
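To make the difference concrete, here is a minimal sketch comparing the three options on a single input. The helper name `escape_repl` and the sample strings are illustrative assumptions, not part of the `re` module.

```python
import re

def escape_repl(raw_repl: str) -> str:
    # Hypothetical helper: double every backslash so that the replacement
    # template parser treats each one as a literal backslash.
    return raw_repl.replace('\\', '\\\\')

raw = r'a\nb.c'          # untrusted text: a, backslash, n, b, dot, c
text = 'x TARGET y'

# Unescaped: the template parser turns backslash-n into a real newline.
unescaped = re.sub('TARGET', raw, text)

# re.escape(): the backslash is handled, but the dot gains a stray backslash.
over_escaped = re.sub('TARGET', re.escape(raw), text)

# Doubling backslashes only: the replacement is inserted verbatim.
correct = re.sub('TARGET', escape_repl(raw), text)
```

Only the last call inserts the untrusted text unchanged, which appears to match the claim that backslashes are the only characters needing escaping in `repl`.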
Here's the code I used to empirically validate the "right answer" given above (checked on Python 3.8 & 3.12):
    from __future__ import annotations
    import re
    import sys

    def escape_re_sub_repl(repl: str) -> str:
        # Double every backslash; nothing else is special in a repl string.
        return repl.replace('\\', '\\\\')

    def test_escape_re_sub_repl() -> None:
        backslash = '\\'
        assert len(backslash) == 1
        base_regex = 'TARGET'
        # The pattern contains no regex metacharacters.
        assert base_regex == re.escape(base_regex)
        base_prefix = 'BEFORE:'
        base_suffix = ':AFTER'
        base_input = f'{base_prefix}{base_regex}{base_suffix}'
        # Every code point on its own, and every code point preceded by a backslash.
        base_chars = tuple(chr(p) for p in range(sys.maxunicode + 1))
        escaped_chars = tuple(f'{backslash}{c}' for c in base_chars)
        test_cases = base_chars + escaped_chars
        assert {len(f) for f in test_cases} == {1, 2}
        for raw in test_cases:
            repl = escape_re_sub_repl(raw)
            got, change_count = re.subn(base_regex, repl, base_input)
            assert change_count == 1
            # The raw replacement must appear verbatim in the output.
            assert got == f'{base_prefix}{raw}{base_suffix}'
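
A possible alternative that avoids escaping altogether: when `repl` is a callable, `re.sub()` uses its return value as-is, with no backslash processing. The sketch below assumes a hypothetical wrapper named `sub_literal`; it is not an existing `re` function.

```python
import re

def sub_literal(pattern: str, raw_repl: str, text: str) -> str:
    # Hypothetical helper: passing a function as repl means its return
    # value is inserted without template (backslash) processing.
    return re.sub(pattern, lambda _match: raw_repl, text)

# Text that would be misread as group references or escapes in a template.
raw = r'\1 \g<0> \n'
result = sub_literal('TARGET', raw, 'BEFORE:TARGET:AFTER')
```

This trades the explicit escaping step for a small lambda; which form is clearer probably depends on the call site.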