-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-11874 argparse assertion failure #1826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@wimglenn, thanks for your PR! By analyzing the history of the files in this pull request, we identified @benjaminp, @bethard and @bitdancer to be potential reviewers. |
Lib/argparse.py
Outdated
r'\(.*?\)+$', | ||
r'\[.*?\]+$', | ||
r'\S+', | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: you can use automatic string concatenation (`'ab' 'c' == 'abc'), or a multiline regex with the re.M flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the '|'.join([...])
is OK because it shows the parts of the regex like this OR that OR the_other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:
part_regexp = (
r'\(.*?\)+\s|'
r'\[.*?\]+\s|'
r'\(.*?\)+$|'
r'\[.*?\]+$|'
r'\S+'
)
@ncoghlan Hi Nick, this is the PR I mentioned at PyCon. It seems to have the "awaiting core review" tag already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR (and it was lovely to meet you at PyCon!). The new test case looks good to me, but I think it may be possible to avoid the new calls to str.strip() by adjusting the way the regex is defined.
Lib/argparse.py
Outdated
r'\(.*?\)+$', | ||
r'\[.*?\]+$', | ||
r'\S+', | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:
part_regexp = (
r'\(.*?\)+\s|'
r'\[.*?\]+\s|'
r'\(.*?\)+$|'
r'\[.*?\]+$|'
r'\S+'
)
@@ -4831,7 +4831,7 @@ def test_nargs_None_metavar_length0(self): | |||
self.do_test_exception(nargs=None, metavar=tuple()) | |||
|
|||
def test_nargs_None_metavar_length1(self): | |||
self.do_test_no_exception(nargs=None, metavar=("1")) | |||
self.do_test_no_exception(nargs=None, metavar=("1",)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear how these string -> tuple changes relate to the rest of the PR. They give the impression that tests needed to be changed in order to avoid test failures (which I don't believe to be the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I've isolated them in a separate commit - the tests were bogus, not testing what they claimed to be testing, and actually 4 of the assertions were opposite of what they should have been. Just cleaning up whilst I was touching this file anyway.
Lib/argparse.py
Outdated
opt_usage = format(optionals, groups) | ||
pos_usage = format(positionals, groups) | ||
opt_parts = _re.findall(part_regexp, opt_usage) | ||
pos_parts = _re.findall(part_regexp, pos_usage) | ||
opt_parts = [part.strip() for part in opt_parts] | ||
pos_parts = [part.strip() for part in pos_parts] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be desirable to be able to avoid these strip calls by tweaking the way that part_regexp
is defined to look for the trailing whitespace, but not actually capture it. I've added a suggestion above for how to do that with a lookahead assertion.
Lib/argparse.py
Outdated
@@ -325,11 +325,19 @@ def _format_usage(self, usage, actions, groups, prefix): | |||
if len(prefix) + len(usage) > text_width: | |||
|
|||
# break usage into wrappable parts | |||
part_regexp = r'\(.*?\)+|\[.*?\]+|\S+' | |||
part_regexp = '|'.join([ | |||
r'\(.*?\)+\s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my comment on the strip calls below, it would be desirable to use a lookahead assertion here so you can require trailing whitespace (or the end of the string) without capturing it in the result. Something like:
r'\(.*?\)+(?=\s|$)'
(Note: I haven't tested that specific regex, but it's hopefully close enough to be a useful starting point)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Regarding the bedevere/news check, note that this will need a bugfix entry in |
Signed-off-by: Wim Glenn <jump@wimglenn.com>
…ues 24089, 14046, 25058, 11874)
Thanks for making the requested changes! @ncoghlan: please review the changes made to this pull request. |
Travis failure was unrelated, so I've restarted the job. (Note for when merging the commit: due to the squash-based workflow, we'll only have one commit in the actual repo, but given the explanation in the review comments, I'm OK with just noting the extra test fixes in the commit message) |
@ncoghlan Is this ready to merge? |
@ned-deily Indeed - thanks for the reminder! |
GH-7527 is a backport of this pull request to the 3.7 branch. |
GH-7528 is a backport of this pull request to the 3.6 branch. |
GH-7530 is a backport of this pull request to the 3.7 branch. |
(Something odd happened on GH-7527 that led to miss-islington deleting the 3.7 backport branch without merging it first, so I readded the 3.7 label to try again) |
GH-7553 is a backport of this pull request to the 2.7 branch. |
The script above triggers assertion errors from argparse library code (specifically, from here).
This bug that has been floating around for years, and has already bitten several people before it bit me. See for example:
http://bugs.python.org/issue25058
http://bugs.python.org/issue14046
http://bugs.python.org/issue24089
http://bugs.python.org/issue11874
There are a few ways we could deal with it:
(
,[
,)
,]
in metavar strings.part_regexp
pattern.1 - The only consequence here would be that wrapped usage messages might be jacked-up if people use characters like
)
and]
in their metavar strings. But this seems like a sloppy and poor resolution.2 - a sound approach, but it's not backwards compat. Using those troublesome characters in metavars only causes problems when there is line-wrapping, so users may have existing scripts which don't currently hit the issue, and those scripts would suddenly start failing. That's a deal-breaker.
4 - probably the most ideal approach, but it requires rewriting a large chunk of
argparse
code. Some other guys tried this in issue11874. Their patches have just been sitting there for 5 years with little interest, so I'm hesitant to go down that route again.Went with 3 in this PR. The tightened up regex also matches a space (or end) after the closing bracket. In fact it is not possible to solve the underlying issue exactly with regex, since metavar can theoretically be any arbitrary string. After my patch, the
AssertionError
can still fire if a metavar contains a substring like] -v [
. That would be a much rarer/pathological case, unlike the current failure mode which users can and will bump into fairly easily.argparse.py
code is not pretty but the test coverage is quite good (1500+ tests) so I'm confident that improving the regex at least doesn't cause any extra issues.This is the "practicality beats purity" approach.
https://bugs.python.org/issue11874