-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-31672: string: Use re.A | re.I
flag for identifier pattern
#3872
Conversation
re.A | re.I
flag for identifier pattern
As documented, identifier should be ASCII. Since we forgot re.A flag, it matched to some non ASCII characters. For backward compatibility, we need to remove re.A flag after pattern is compiled.
b056576
to
c43ec46
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a little weird, but with comments I think it could be okay. I guess I'm a +0 so I'd like to get some other opinions.
Lib/string.py
Outdated
@@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass): | |||
delimiter = '$' | |||
idpattern = r'[_a-z][_a-z0-9]*' | |||
braceidpattern = None | |||
flags = _re.IGNORECASE | |||
flags = _re.IGNORECASE | _re.ASCII |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment here too, since a person reading the code may not notice the restored flag below.
Lib/string.py
Outdated
@@ -157,6 +157,10 @@ def convert(mo): | |||
return self.pattern.sub(convert, self.template) | |||
|
|||
|
|||
# We use re.I | re.A when compiling Template.idpattern, but restore old flag | |||
# for backward compatibility. | |||
Template.flags = _re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
We use re.I | re.A while compiling Template.idpattern in the metaclass above, but since
flags is part of the public API, we restore its original documented value for backward
compatibility.
?
Do you think it's worth waiting to see how bpo-31690 resolves? I'd definitely prefer the inline 'a' flag approach instead. |
I agree with you. Pending this pull request until bpo-31690. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a NEWS entry instead of a skip-news label, to pass the bevedere test. Other than that, this looks great, thanks!
@warsaw I added NEWS entry and updated the doc. Would you review it too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! I noticed a few misspellings, and made some documentation change suggestions.
Also, I'm noticing the tests are failing.
Doc/library/string.rst
Outdated
``[_a-z][_a-z0-9]*``. If this is given and *braceidpattern* is ``None`` | ||
this pattern will also apply to braced placeholders. | ||
``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``. Since default *flags* is | ||
``re.IGNORECASE``, ``[a-z]``Without local flag ``-i``, is used to avoid to match with non ASCII characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a space before the W
, but I also feel like the sentence starting with "Since default..." should just be moved to, and consolidated with, the note::
section that just follows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed removing this sentence after writing note section.
Doc/library/string.rst
Outdated
.. note:: | ||
|
||
Default *flags* is ``re.IGNORECASE``. So the pattern ``[a-z]`` can match | ||
with some non ASCII characters. That's why We use local ``-i`` flag here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"non-ASCII"
Also s/We/we/
Doc/library/string.rst
Outdated
with some non ASCII characters. That's why We use local ``-i`` flag here. | ||
|
||
When overrinding this class, please consider overriding *flags* with ``0`` | ||
or ``re.IGNORECASE | re.ASCII``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: "When subclassing, please..."
Why? Or in other words, can you add a short explanation of why they should consider this?
@@ -0,0 +1,2 @@ | |||
``idpattern`` in ``string.Template`` matched some non ASCII characters. Now | |||
it uses ``-i`` regular expression local flag to avoid non ASCII characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"non-ASCII" in two places.
Lib/test/test_string.py
Outdated
@@ -270,6 +270,10 @@ def test_invalid_placeholders(self): | |||
raises(ValueError, s.substitute, dict(who='tim')) | |||
s = Template('$who likes $100') | |||
raises(ValueError, s.substitute, dict(who='tim')) | |||
# Template.idpattern should match to only ASCII characters. | |||
# https://bugs.python.org/issue31672 | |||
s = Template("$who likes $ı") # (0x131, DOTLESS I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test also 'İ'
(0x130). 'İ'.lower() == 'i'
. In older Python versions [a-z]
didn't match 'ı'
, but matched 'İ'
.
Doc/library/string.rst
Outdated
``None`` this pattern will also apply to braced placeholders. | ||
|
||
.. note:: | ||
|
||
Default *flags* is ``re.IGNORECASE``. So the pattern ``[a-z]`` can match | ||
with some non ASCII characters. That's why We use local ``-i`` flag here. | ||
with some non-ASCII characters. That's why We use local ``-i`` flag here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We -> we
Should I backport this to 3.6? It seems minor enough to not backport. |
We have spent so much time on this issue only to find 3.6-compatible solution. I think that in 3.7-only solution we could break backward compatibility and remove the re.IGNORECASE flag if not found more compatible solution. |
Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.6. |
Sorry, @methane, I could not cleanly backport this to |
…dentifiers (pythonGH-3872) Pattern `[a-z]` with `IGNORECASE` flag can match to some non-ASCII characters. Straightforward solution for this is using `IGNORECASE | ASCII` flag. But users may subclass `Template` and override only `idpattern`. So we want to avoid changing `Template.flags`. So this commit uses local flag `-i` for `idpattern` and change `[a-z]` to `[a-zA-Z]`.. (cherry picked from commit b22273e)
GH-3982 is a backport of this pull request to the 3.6 branch. |
…iers (GH-3872) Pattern `[a-z]` with `IGNORECASE` flag can match to some non-ASCII characters. Straightforward solution for this is using `IGNORECASE | ASCII` flag. But users may subclass `Template` and override only `idpattern`. So we want to avoid changing `Template.flags`. So this commit uses local flag `-i` for `idpattern` and change `[a-z]` to `[a-zA-Z]`. (cherry picked from commit b22273e)
As documented, identifier should be ASCII.
Since we forgot re.A flag, it matched to some non ASCII characters.
For backward compatibility, we need to remove re.A flag after
pattern is compiled.
https://bugs.python.org/issue31672