Description
Is it possible for the `Replace` normalizer to support capture-group references (such as `\1` and `\2`) in the replacement string?
In the following dummy example, I want to add a space between the letters `l` and `e`. It works with the `re` package but not with `normalizers.Replace`.
```python
import re
from tokenizers import normalizers, Regex

pattern = r"(l)(e)"
replacement = r"\1 \2"
text = "le travail est totalement pénible"

text1 = normalizers.Replace(Regex(pattern), replacement).normalize_str(text)
text2 = re.sub(pattern, replacement, text)

print(f"{text = }")
print(f"{text1 = }")
print(f"{text2 = }")
```
Execution result:
```
text = 'le travail est totalement pénible'
text1 = '\\1 \\2 travail est tota\\1 \\2ment pénib\\1 \\2'
text2 = 'l e travail est total ement pénibl e'
```
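The `text1` output looks as if `Replace` inserts the replacement string literally instead of expanding `\1` and `\2`. The sketch below uses only the standard `re` module (no tokenizers API is involved) to contrast the two behaviors: returning the template from a callable inserts it verbatim and reproduces `text1`, while `Match.expand` substitutes the captured groups and reproduces `text2`. It illustrates the semantics being requested, not how tokenizers implements `Replace` internally.

```python
import re

pattern = r"(l)(e)"
replacement = r"\1 \2"
text = "le travail est totalement pénible"

# Literal substitution: a callable's return value is inserted as-is,
# so the backslash sequences "\1" and "\2" are NOT expanded (matches text1).
literal = re.sub(pattern, lambda m: replacement, text)

# Template expansion: Match.expand replaces \1 and \2 with the captured
# groups, which is what re.sub does with a string replacement (matches text2).
expanded = re.sub(pattern, lambda m: m.expand(replacement), text)

print(f"{literal = }")
print(f"{expanded = }")
```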