Skip to content

bpo-11874 argparse assertion failure #1826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 8, 2018
Merged

bpo-11874 argparse assertion failure #1826

merged 3 commits into from
Jun 8, 2018

Conversation

wimglenn
Copy link
Contributor

@wimglenn wimglenn commented May 27, 2017

import argparse

parser = argparse.ArgumentParser('prog'*8)
parser.add_argument('--proxy', metavar='<http[s]://example:1234>')

args = parser.parse_args(['--help'])

The script above triggers assertion errors from argparse library code (specifically, from here).

This bug that has been floating around for years, and has already bitten several people before it bit me. See for example:

http://bugs.python.org/issue25058
http://bugs.python.org/issue14046
http://bugs.python.org/issue24089
http://bugs.python.org/issue11874

There are a few ways we could deal with it:

  1. Just remove the asserts, which are bogus.
  2. Disallow using characters (, [, ), ] in metavar strings.
  3. Tighten up the part_regexp pattern.
  4. Don't use regex.

1 - The only consequence here would be that wrapped usage messages might be jacked-up if people use characters like ) and ] in their metavar strings. But this seems like a sloppy and poor resolution.
2 - a sound approach, but it's not backwards compat. Using those troublesome characters in metavars only causes problems when there is line-wrapping, so users may have existing scripts which don't currently hit the issue, and those scripts would suddenly start failing. That's a deal-breaker.
4 - probably the most ideal approach, but it requires rewriting a large chunk of argparse code. Some other guys tried this in issue11874. Their patches have just been sitting there for 5 years with little interest, so I'm hesitant to go down that route again.

Went with 3 in this PR. The tightened up regex also matches a space (or end) after the closing bracket. In fact it is not possible to solve the underlying issue exactly with regex, since metavar can theoretically be any arbitrary string. After my patch, the AssertionError can still fire if a metavar contains a substring like ] -v [. That would be a much rarer/pathological case, unlike the current failure mode which users can and will bump into fairly easily. argparse.py code is not pretty but the test coverage is quite good (1500+ tests) so I'm confident that improving the regex at least doesn't cause any extra issues.

This is the "practicality beats purity" approach.

https://bugs.python.org/issue11874

@mention-bot
Copy link

@wimglenn, thanks for your PR! By analyzing the history of the files in this pull request, we identified @benjaminp, @bethard and @bitdancer to be potential reviewers.

@wimglenn wimglenn changed the title Argparse assertion failure issue11874 argparse assertion failure May 27, 2017
@wimglenn wimglenn changed the title issue11874 argparse assertion failure bpo-11874 argparse assertion failure May 27, 2017
Lib/argparse.py Outdated
r'\(.*?\)+$',
r'\[.*?\]+$',
r'\S+',
])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: you can use automatic string concatenation (`'ab' 'c' == 'abc'), or a multiline regex with the re.M flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the '|'.join([...]) is OK because it shows the parts of the regex like this OR that OR the_other

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:

part_regexp = (
               r'\(.*?\)+\s|'
               r'\[.*?\]+\s|'
               r'\(.*?\)+$|'
               r'\[.*?\]+$|'
               r'\S+'
           )

@wimglenn
Copy link
Contributor Author

@ncoghlan Hi Nick, this is the PR I mentioned at PyCon. It seems to have the "awaiting core review" tag already.

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR (and it was lovely to meet you at PyCon!). The new test case looks good to me, but I think it may be possible to avoid the new calls to str.strip() by adjusting the way the regex is defined.

Lib/argparse.py Outdated
r'\(.*?\)+$',
r'\[.*?\]+$',
r'\S+',
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:

part_regexp = (
               r'\(.*?\)+\s|'
               r'\[.*?\]+\s|'
               r'\(.*?\)+$|'
               r'\[.*?\]+$|'
               r'\S+'
           )

@@ -4831,7 +4831,7 @@ def test_nargs_None_metavar_length0(self):
self.do_test_exception(nargs=None, metavar=tuple())

def test_nargs_None_metavar_length1(self):
self.do_test_no_exception(nargs=None, metavar=("1"))
self.do_test_no_exception(nargs=None, metavar=("1",))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear how these string -> tuple changes relate to the rest of the PR. They give the impression that tests needed to be changed in order to avoid test failures (which I don't believe to be the case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I've isolated them in a separate commit - the tests were bogus, not testing what they claimed to be testing, and actually 4 of the assertions were opposite of what they should have been. Just cleaning up whilst I was touching this file anyway.

Lib/argparse.py Outdated
opt_usage = format(optionals, groups)
pos_usage = format(positionals, groups)
opt_parts = _re.findall(part_regexp, opt_usage)
pos_parts = _re.findall(part_regexp, pos_usage)
opt_parts = [part.strip() for part in opt_parts]
pos_parts = [part.strip() for part in pos_parts]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be desirable to be able to avoid these strip calls by tweaking the way that part_regexp is defined to look for the trailing whitespace, but not actually capture it. I've added a suggestion above for how to do that with a lookahead assertion.

Lib/argparse.py Outdated
@@ -325,11 +325,19 @@ def _format_usage(self, usage, actions, groups, prefix):
if len(prefix) + len(usage) > text_width:

# break usage into wrappable parts
part_regexp = r'\(.*?\)+|\[.*?\]+|\S+'
part_regexp = '|'.join([
r'\(.*?\)+\s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given my comment on the strip calls below, it would be desirable to use a lookahead assertion here so you can require trailing whitespace (or the end of the string) without capturing it in the result. Something like:

r'\(.*?\)+(?=\s|$)'

(Note: I haven't tested that specific regex, but it's hopefully close enough to be a useful starting point)

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ncoghlan
Copy link
Contributor

Regarding the bedevere/news check, note that this will need a bugfix entry in Misc/NEWS.d. See https://devguide.python.org/committing/#what-s-new-and-news-entries for more info on how to generate that.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ncoghlan: please review the changes made to this pull request.

@ncoghlan
Copy link
Contributor

Travis failure was unrelated, so I've restarted the job. (Note for when merging the commit: due to the squash-based workflow, we'll only have one commit in the actual repo, but given the explanation in the review comments, I'm OK with just noting the extra test fixes in the commit message)

@ned-deily
Copy link
Member

@ncoghlan Is this ready to merge?

@ncoghlan ncoghlan merged commit 66f02aa into python:master Jun 8, 2018
@miss-islington
Copy link
Contributor

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 8, 2018

@ned-deily Indeed - thanks for the reminder!

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 8, 2018
…GH-1826)

- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
@bedevere-bot
Copy link

GH-7527 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 8, 2018
…GH-1826)

- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
@bedevere-bot
Copy link

GH-7528 is a backport of this pull request to the 3.6 branch.

miss-islington added a commit that referenced this pull request Jun 8, 2018
- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
@miss-islington
Copy link
Contributor

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 8, 2018
…GH-1826)

- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
@bedevere-bot
Copy link

GH-7530 is a backport of this pull request to the 3.7 branch.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 8, 2018

(Something odd happened on GH-7527 that led to miss-islington deleting the 3.7 backport branch without merging it first, so I readded the 3.7 label to try again)

miss-islington added a commit that referenced this pull request Jun 8, 2018
- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
@wimglenn wimglenn deleted the argparse_assertion_failure branch June 8, 2018 16:09
@miss-islington
Copy link
Contributor

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-7553 is a backport of this pull request to the 2.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 9, 2018
…GH-1826)

- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
miss-islington added a commit that referenced this pull request Jun 9, 2018
- bugfix and test for fragile metavar handling in argparse (see
  bpo-24089, bpo-14046, bpo-25058, bpo-11874)
- also fixes some incorrect tests that did not make 1-element tuples correctly
(cherry picked from commit 66f02aa)

Co-authored-by: wim glenn <wim.glenn@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants