
Commit eaf13bb

committed Oct 13, 2017
[3.6] bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (pythonGH-3872)
The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters. The straightforward solution would be to use the `IGNORECASE | ASCII` flags. However, users may subclass `Template` and override only `idpattern`, so changing `Template.flags` would also change how their own patterns are compiled; we want to avoid that. Instead, this commit uses the local flag `-i` in `idpattern` and changes `[a-z]` to `[a-zA-Z]`. (cherry picked from commit b22273e)
1 parent fdf151b commit eaf13bb
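The problem described in the commit message can be reproduced by compiling the old and new `idpattern` with `re.IGNORECASE` and testing a few characters. This is a sketch that assumes a recent CPython. The four code points are the ones the `re` module documentation lists as matching `[a-z]` under `IGNORECASE` (U+0130, U+0131, U+017F, U+212A). The names `OLD`, `NEW` and `EXTRA` are illustrative only.

```python
import re

# Non-ASCII letters that the re docs say [a-z] matches under IGNORECASE:
# LATIN CAPITAL LETTER I WITH DOT ABOVE, DOTLESS I, LONG S, KELVIN SIGN.
EXTRA = ["\u0130", "\u0131", "\u017f", "\u212a"]

# Illustrative names: the idpattern before and after this commit,
# both compiled with the flag Template uses by default.
OLD = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE)
NEW = re.compile(r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)", re.IGNORECASE)

for ch in EXTRA:
    # The old pattern accepts each character. Inside the (?-i:...) group,
    # case-insensitive matching is switched off, so only ASCII letters match.
    print(f"U+{ord(ch):04X}", bool(OLD.fullmatch(ch)), bool(NEW.fullmatch(ch)))
```

Each line should report `True` for the old pattern and `False` for the new one, while ordinary identifiers such as `Who_2` still match `NEW`.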


4 files changed, +25 -3 lines changed


Doc/library/string.rst

+12 -2
@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
 
 * *idpattern* -- This is the regular expression describing the pattern for
 non-braced placeholders (the braces will be added automatically as
-appropriate). The default value is the regular expression
-``[_a-z][_a-z0-9]*``.
+appropriate). The default value is the regular expression
+``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+.. note::
+
+Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+subclassing. It's simple way to avoid unexpected match like above example.
+
 
 * *flags* -- The regular expression flags that will be applied when compiling
 the regular expression used for recognizing substitutions. The default value
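The note's suggestion of overriding *flags* in a subclass might look like the following sketch. The class names `AsciiTemplate` and `CaseSensitiveTemplate` and the helper `rejects_non_ascii` are hypothetical. The `flags = 0` variant also spells out its own `idpattern`, an assumption made so the example does not depend on the exact default `idpattern` shipped with a given Python version.

```python
import re
from string import Template


class AsciiTemplate(Template):
    # Hypothetical subclass: keep case-insensitive matching, but re.ASCII
    # limits it (and \w, \d, \s) to ASCII characters.
    flags = re.IGNORECASE | re.ASCII


class CaseSensitiveTemplate(Template):
    # Hypothetical subclass: no IGNORECASE at all, so the identifier
    # pattern lists both letter cases explicitly.
    flags = 0
    idpattern = r"[_a-zA-Z][_a-zA-Z0-9]*"


def rejects_non_ascii(cls):
    """Return True if cls treats "$" followed by DOTLESS I as invalid."""
    try:
        cls("$who likes $\u0131").substitute(who="tim")
    except ValueError:
        return True
    return False


print(AsciiTemplate("$Who likes $what").substitute(Who="tim", what="ham"))
print(rejects_non_ascii(AsciiTemplate), rejects_non_ascii(CaseSensitiveTemplate))
```

Both subclasses still accept ASCII placeholders in either case and raise `ValueError` for the dotless-i placeholder, mirroring the test added in this commit.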

Lib/string.py

+5 -1
@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""
 
     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     flags = _re.IGNORECASE
 
     def __init__(self, template):
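The backward-compatibility argument in the code comment can be illustrated with a hypothetical subclass that overrides only `idpattern` to accept Unicode identifiers. Because the inherited *flags* stay at `re.IGNORECASE`, `\w` keeps its Unicode meaning; if `re.ASCII` had been added to `Template.flags`, the same pattern would stop matching. `UnicodeTemplate` and its pattern are illustrative assumptions, not part of the commit.

```python
import re
from string import Template


class UnicodeTemplate(Template):
    # Hypothetical subclass that overrides only idpattern: a word character
    # that is not a digit, followed by word characters. \w is
    # Unicode-aware unless re.ASCII is in effect.
    idpattern = r"[^\W\d]\w*"


# Works because the inherited flags (re.IGNORECASE) leave \w Unicode-aware.
print(UnicodeTemplate("$ñame").substitute({"ñame": "x"}))

# The same idpattern under the current flags, then under the flags the
# commit chose not to use (the second call returns None).
print(re.fullmatch(UnicodeTemplate.idpattern, "ñame", re.IGNORECASE))
print(re.fullmatch(UnicodeTemplate.idpattern, "ñame", re.IGNORECASE | re.ASCII))
```

Scoping the restriction with a local `(?-i:...)` group keeps the fix inside the default `idpattern`, so subclasses like this one are unaffected.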

Lib/test/test_string.py

+6
@@ -271,6 +271,12 @@ def test_invalid_placeholders(self):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131") # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))
 
     def test_idpattern_override(self):
         class PathPattern(Template):
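The added assertions cover DOTLESS I and LATIN CAPITAL LETTER I WITH DOT ABOVE. A slightly broader regression test could loop over all four non-ASCII letters that the `re` documentation lists for `[a-z]` under `IGNORECASE`. The sketch below is a standalone `unittest` version under that assumption; the class name `NonAsciiPlaceholderTest` is hypothetical and the test is not part of `Lib/test/test_string.py`.

```python
import unittest
from string import Template


class NonAsciiPlaceholderTest(unittest.TestCase):
    # Letters that [a-z] matches under re.IGNORECASE, per the re docs.
    CHARS = ("\u0130", "\u0131", "\u017f", "\u212a")

    def test_non_ascii_placeholder_is_invalid(self):
        for ch in self.CHARS:
            # subTest reports each failing code point separately.
            with self.subTest(codepoint=f"U+{ord(ch):04X}"):
                s = Template("$who likes $" + ch)
                with self.assertRaises(ValueError):
                    s.substitute(who="tim")

    def test_ascii_placeholders_still_work(self):
        # Case-insensitive flags still apply outside the identifier check,
        # and both ASCII cases remain valid identifiers.
        s = Template("$Who likes ${what}")
        self.assertEqual(s.substitute(Who="tim", what="ham"), "tim likes ham")
```

It can be run with `python -m unittest` against the module that contains it.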
@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
