Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29995: re.escape() now escapes only special characters. #1007

Merged
merged 6 commits into from
Apr 13, 2017

Conversation

serhiy-storchaka
Copy link
Member

No description provided.

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Apr 5, 2017
@mention-bot
Copy link

@serhiy-storchaka, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @tiran and @ezio-melotti to be potential reviewers.

Copy link

@bz2 bz2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good to me, as random third party that has written a function like this in the past.

Bunch of teeny inline comments, none of which should block merge.

The automated coverage report failure is bogus, seems to be percentage change from deleting code plus random noise.

# SPECIAL_CHARS
# closing ')', '}' and ']'
# '-' (a range in character set)
# '#' (comment) and WHITESPACE (ignored) in verbose mode
Copy link

@bz2 bz2 Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own reference, re.VERBOSE does only look at ascii whitespace:

>>> re.compile("a \f\n\r\v\t\u3000", re.DEBUG|re.VERBOSE)
LITERAL 97
LITERAL 12288
re.compile('a \x0c\n\r\x0b\t\u3000', re.VERBOSE|re.DEBUG)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPECIAL_CHARS and WHITESPACE are constants in the sre_parse module. WHITESPACE contains only ascii whitespace.

self.assertMatch(re.escape(p), p)
for c in '-.]{}':
self.assertEqual(re.escape(c)[:1], '\\')
literal_chars = (string.ascii_letters + string.digits +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This redefines literal_chars to the same as it was before. Either delete, or split the test, or hoist to a suite level constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I copied this definition rather than move!

Misc/NEWS Outdated
@@ -303,6 +303,8 @@ Extension Modules
Library
-------

- bpo-29995: re.escape() now escapes only special characters.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To "now escapes only regex special characters" just for clarity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for i in b'-.]{}':
b = bytes([i])
self.assertEqual(re.escape(b)[:1], b'\\')
literal_chars = ((string.ascii_letters + string.digits).encode() +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To literal_bytes = self.LITERAL_CHARS.encode() if hoisting above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this looks a good idea.

@@ -786,13 +786,17 @@ form.

.. function:: escape(string)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To escape(pattern) to match implementation naming?

@@ -786,13 +786,17 @@ form.

.. function:: escape(string)

Escape all the characters in pattern except ASCII letters, numbers and ``'_'``.
Escape special characters in a string.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the style on naming for functions that take either str or bytes is, but can see argument for "...characters in string or bytes pattern".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid the word "pattern". The argument of escape() is not a pattern, it is an arbitrary string, and escape() makes a pattern from it.

The name "string" is often used for text and byte strings in this module. It is the name of parameters in many functions and the name of the attribute in the match object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normal way to write this would be "escape special characters in pattern", since pattern is the formal argument name in the source. That is, I agree with your previous comment that string should be replaced by pattern. The existing docs even use 'pattern' in the body of the text, so I'm not sure how we ended up with 'string' as the formal argument name in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened bpo-30045 for renaming the parameter to "string".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #1048 the string parameter name was replaced by pattern and it is referred as *pattern* in the description.

Copy link

@bz2 bz2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

@serhiy-storchaka serhiy-storchaka merged commit 5908300 into python:master Apr 13, 2017
@serhiy-storchaka serhiy-storchaka deleted the re-escape branch April 13, 2017 18:13
@jaraco
Copy link
Member

jaraco commented Apr 15, 2017

Mentioning bpo-29995 to create a link to it.

@jaraco
Copy link
Member

jaraco commented Apr 15, 2017

This change is likely implicated in pypa/setuptools#1015.

@jaraco
Copy link
Member

jaraco commented Apr 15, 2017

@serhiy-storchaka Is the intention that tests like these that are asserting a certain result from re.escape should adapt their expectations based on Python version?

@serhiy-storchaka
Copy link
Member Author

I think that it would be better to not depend on exact result of re.escape. Either use re.escape for calculating the part of the expected result that is out of your full control:

assert get_pattern(l('foo/bar')) == l(re.escape('foo/bar') + r'\Z(?ms)')

Or test the pattern indirectly. For example, it should match 'foo/bar', but not 'foo/bar\n'.

Note that compiling the pattern r'foo/bar\Z(?ms)' emits a deprecation warning in 3.6+ since flags are not at the start of the expression. It is worth to rewrite get_pattern() until it become an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants