-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-29995: re.escape() now escapes only special characters. #1007
Conversation
@serhiy-storchaka, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @tiran and @ezio-melotti to be potential reviewers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good to me, as random third party that has written a function like this in the past.
Bunch of teeny inline comments, none of which should block merge.
The automated coverage report failure is bogus, seems to be percentage change from deleting code plus random noise.
# SPECIAL_CHARS | ||
# closing ')', '}' and ']' | ||
# '-' (a range in character set) | ||
# '#' (comment) and WHITESPACE (ignored) in verbose mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own reference, re.VERBOSE
does only look at ascii whitespace:
>>> re.compile("a \f\n\r\v\t\u3000", re.DEBUG|re.VERBOSE)
LITERAL 97
LITERAL 12288
re.compile('a \x0c\n\r\x0b\t\u3000', re.VERBOSE|re.DEBUG)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPECIAL_CHARS
and WHITESPACE
are constants in the sre_parse
module. WHITESPACE
contains only ascii whitespace.
Lib/test/test_re.py
Outdated
self.assertMatch(re.escape(p), p) | ||
for c in '-.]{}': | ||
self.assertEqual(re.escape(c)[:1], '\\') | ||
literal_chars = (string.ascii_letters + string.digits + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This redefines literal_chars
to the same as it was before. Either delete, or split the test, or hoist to a suite level constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I copied this definition rather than move!
Misc/NEWS
Outdated
@@ -303,6 +303,8 @@ Extension Modules | |||
Library | |||
------- | |||
|
|||
- bpo-29995: re.escape() now escapes only special characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To "now escapes only regex special characters" just for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/test_re.py
Outdated
for i in b'-.]{}': | ||
b = bytes([i]) | ||
self.assertEqual(re.escape(b)[:1], b'\\') | ||
literal_chars = ((string.ascii_letters + string.digits).encode() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To literal_bytes = self.LITERAL_CHARS.encode()
if hoisting above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this looks a good idea.
Doc/library/re.rst
Outdated
@@ -786,13 +786,17 @@ form. | |||
|
|||
.. function:: escape(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To escape(pattern)
to match implementation naming?
Doc/library/re.rst
Outdated
@@ -786,13 +786,17 @@ form. | |||
|
|||
.. function:: escape(string) | |||
|
|||
Escape all the characters in pattern except ASCII letters, numbers and ``'_'``. | |||
Escape special characters in a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the style on naming for functions that take either str or bytes is, but can see argument for "...characters in string or bytes pattern".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid the word "pattern". The argument of escape()
is not a pattern, it is an arbitrary string, and escape()
makes a pattern from it.
The name "string" is often used for text and byte strings in this module. It is the name of parameters in many functions and the name of the attribute in the match object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normal way to write this would be "escape special characters in pattern", since pattern is the formal argument name in the source. That is, I agree with your previous comment that string should be replaced by pattern. The existing docs even use 'pattern' in the body of the text, so I'm not sure how we ended up with 'string' as the formal argument name in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened bpo-30045 for renaming the parameter to "string".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #1048 the string
parameter name was replaced by pattern
and it is referred as *pattern*
in the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me.
Mentioning bpo-29995 to create a link to it. |
This change is likely implicated in pypa/setuptools#1015. |
@serhiy-storchaka Is the intention that tests like these that are asserting a certain result from |
I think that it would be better to not depend on exact result of
Or test the pattern indirectly. For example, it should match Note that compiling the pattern |
No description provided.