bpo-11874 argparse assertion failure #1826

wimglenn · 2017-05-27T00:51:36Z

import argparse

parser = argparse.ArgumentParser('prog'*8)
parser.add_argument('--proxy', metavar='<http[s]://example:1234>')

args = parser.parse_args(['--help'])

The script above triggers assertion errors from argparse library code (specifically, from here).

This bug has been floating around for years and has already bitten several people before it bit me. See for example:

http://bugs.python.org/issue25058
http://bugs.python.org/issue14046
http://bugs.python.org/issue24089
http://bugs.python.org/issue11874

There are a few ways we could deal with it:

1. Just remove the asserts, which are bogus.
2. Disallow using characters (, [, ), ] in metavar strings.
3. Tighten up the part_regexp pattern.
4. Don't use regex.

1 - The only consequence here would be that wrapped usage messages might be jacked-up if people use characters like ) and ] in their metavar strings. But this seems like a sloppy and poor resolution.
2 - A sound approach, but it's not backwards compatible. Using those troublesome characters in metavars only causes problems when there is line-wrapping, so users may have existing scripts which don't currently hit the issue, and those scripts would suddenly start failing. That's a deal-breaker.
4 - probably the most ideal approach, but it requires rewriting a large chunk of argparse code. Some other guys tried this in issue11874. Their patches have just been sitting there for 5 years with little interest, so I'm hesitant to go down that route again.

Went with 3 in this PR. The tightened up regex also matches a space (or end) after the closing bracket. In fact it is not possible to solve the underlying issue exactly with regex, since metavar can theoretically be any arbitrary string. After my patch, the AssertionError can still fire if a metavar contains a substring like ] -v [. That would be a much rarer/pathological case, unlike the current failure mode which users can and will bump into fairly easily. argparse.py code is not pretty but the test coverage is quite good (1500+ tests) so I'm confident that improving the regex at least doesn't cause any extra issues.

This is the "practicality beats purity" approach.
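To illustrate the tightening described above, here is a minimal sketch that applies the old pattern and the alternatives from this PR's diff to the optionals part of the usage string for the reproducer script. The `split_usage` helper and the hand-written `opt_usage` string are assumptions made for illustration; argparse builds that string internally and does not expose a helper by this name.

```python
import re

# Old pattern from _format_usage (the line removed in this PR's diff).
OLD_PART_REGEXP = r'\(.*?\)+|\[.*?\]+|\S+'

# Tightened alternatives as they appear in this PR's diff: a closing bracket
# only ends a part when whitespace or the end of the string follows it.
NEW_PART_REGEXP = '|'.join([
    r'\(.*?\)+\s',
    r'\[.*?\]+\s',
    r'\(.*?\)+$',
    r'\[.*?\]+$',
    r'\S+',
])


def split_usage(part_regexp, usage):
    """Hypothetical helper: split a usage string into wrappable parts."""
    return [part.strip() for part in re.findall(part_regexp, usage)]


# Assumed optionals usage for the reproducer script, written out by hand.
opt_usage = '[-h] [--proxy <http[s]://example:1234>]'

old_parts = split_usage(OLD_PART_REGEXP, opt_usage)
new_parts = split_usage(NEW_PART_REGEXP, opt_usage)

# The old pattern cuts the metavar at its first ']', so re-joining the parts
# with single spaces no longer reproduces the usage string.
print(' '.join(old_parts) == opt_usage)
print(' '.join(new_parts) == opt_usage)
```

With the old pattern the metavar is split in two at `http[s]`, which is the kind of mismatch the asserts trip over; with the tightened alternatives the whole bracketed option stays in one part.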

https://bugs.python.org/issue11874

mention-bot · 2017-05-27T00:51:39Z

@wimglenn, thanks for your PR! By analyzing the history of the files in this pull request, we identified @benjaminp, @bethard and @bitdancer to be potential reviewers.

merwok · 2017-09-27T19:00:11Z

Lib/argparse.py

+                    r'\(.*?\)+$',
+                    r'\[.*?\]+$',
+                    r'\S+',
+                ])


Minor: you can use automatic string concatenation (`'ab' 'c' == 'abc'`), or a multiline regex with the re.X (verbose) flag.

I think the '|'.join([...]) is OK because it shows the parts of the regex like this OR that OR the_other

I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:

part_regexp = (
    r'\(.*?\)+\s|'
    r'\[.*?\]+\s|'
    r'\(.*?\)+$|'
    r'\[.*?\]+$|'
    r'\S+'
)
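As a rough sketch of the options discussed in this thread, the snippet below builds the same pattern three ways: with `'|'.join`, with adjacent string literals, and as a single literal compiled with `re.VERBOSE`. The `sample` string is made up for illustration, and the verbose variant is only one assumed way of writing the pattern across several lines.

```python
import re

# Alternatives taken from this PR's diff.
joined = '|'.join([
    r'\(.*?\)+\s',
    r'\[.*?\]+\s',
    r'\(.*?\)+$',
    r'\[.*?\]+$',
    r'\S+',
])

# Adjacent string literals are concatenated when the source is compiled,
# so no join call runs when the function executes.
concatenated = (
    r'\(.*?\)+\s|'
    r'\[.*?\]+\s|'
    r'\(.*?\)+$|'
    r'\[.*?\]+$|'
    r'\S+'
)

# With re.VERBOSE, unescaped whitespace in the pattern is ignored, so the
# alternatives can sit on separate lines inside one literal.
verbose = re.compile(r'''
    \(.*?\)+\s |
    \[.*?\]+\s |
    \(.*?\)+$  |
    \[.*?\]+$  |
    \S+
''', re.VERBOSE)

sample = '[-h] (a | b) foo'  # made-up usage string for illustration
print(joined == concatenated)
print(re.findall(joined, sample) == verbose.findall(sample))
```

All three keep the "this OR that OR the_other" layout visible; they differ only in when the pattern string gets assembled.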

wimglenn · 2018-05-16T16:38:15Z

@ncoghlan Hi Nick, this is the PR I mentioned at PyCon. It seems to have the "awaiting core review" tag already.

ncoghlan

Thanks for the PR (and it was lovely to meet you at PyCon!). The new test case looks good to me, but I think it may be possible to avoid the new calls to str.strip() by adjusting the way the regex is defined.

ncoghlan · 2018-05-20T03:37:08Z

Lib/argparse.py

+                    r'\(.*?\)+$',
+                    r'\[.*?\]+$',
+                    r'\S+',
+                ])


I believe Ezio's point was that you can get that benefit without the runtime cost of the join by writing it as:

part_regexp = (
    r'\(.*?\)+\s|'
    r'\[.*?\]+\s|'
    r'\(.*?\)+$|'
    r'\[.*?\]+$|'
    r'\S+'
)

ncoghlan · 2018-05-20T03:40:35Z

Lib/test/test_argparse.py

@@ -4831,7 +4831,7 @@ def test_nargs_None_metavar_length0(self):
        self.do_test_exception(nargs=None, metavar=tuple())

    def test_nargs_None_metavar_length1(self):
-        self.do_test_no_exception(nargs=None, metavar=("1"))
+        self.do_test_no_exception(nargs=None, metavar=("1",))


It's not clear how these string -> tuple changes relate to the rest of the PR. They give the impression that tests needed to be changed in order to avoid test failures (which I don't believe to be the case).

That's why I've isolated them in a separate commit - the tests were bogus, not testing what they claimed to be testing, and actually 4 of the assertions were opposite of what they should have been. Just cleaning up whilst I was touching this file anyway.
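For context on why the old tests were not testing what they claimed, a minimal sketch of the pitfall: parentheses alone only group an expression, so `("1")` is a plain string, and it is the trailing comma that makes a 1-element tuple. The variable names are illustrative only.

```python
# Parentheses alone only group an expression; the comma makes the tuple.
not_a_tuple = ("1")
one_tuple = ("1",)

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))
```

So a test passing `metavar=("1")` exercises the string-metavar path rather than the tuple-of-length-1 path named in the test.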

ncoghlan · 2018-05-20T03:49:54Z

Lib/argparse.py

                opt_usage = format(optionals, groups)
                pos_usage = format(positionals, groups)
                opt_parts = _re.findall(part_regexp, opt_usage)
                pos_parts = _re.findall(part_regexp, pos_usage)
+                opt_parts = [part.strip() for part in opt_parts]
+                pos_parts = [part.strip() for part in pos_parts]


It would be desirable to be able to avoid these strip calls by tweaking the way that part_regexp is defined to look for the trailing whitespace, but not actually capture it. I've added a suggestion above for how to do that with a lookahead assertion.

ncoghlan · 2018-05-20T03:54:39Z

Lib/argparse.py

@@ -325,11 +325,19 @@ def _format_usage(self, usage, actions, groups, prefix):
            if len(prefix) + len(usage) > text_width:

                # break usage into wrappable parts
-                part_regexp = r'\(.*?\)+|\[.*?\]+|\S+'
+                part_regexp = '|'.join([
+                    r'\(.*?\)+\s',


Given my comment on the strip calls below, it would be desirable to use a lookahead assertion here so you can require trailing whitespace (or the end of the string) without capturing it in the result. Something like:

r'\(.*?\)+(?=\s|$)'

(Note: I haven't tested that specific regex, but it's hopefully close enough to be a useful starting point)
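A small sketch of the idea, assuming the lookahead is applied to both the parenthesised and the bracketed alternatives (the comment above only spells out the first). It compares the whitespace-consuming alternatives from the diff, followed by `strip()`, against a lookahead version that needs no stripping. The `usage` string is made up for illustration.

```python
import re

# Alternatives from this PR's diff: trailing whitespace is consumed, so each
# bracketed part comes back with a trailing space and needs strip().
CONSUMING = r'\(.*?\)+\s|\[.*?\]+\s|\(.*?\)+$|\[.*?\]+$|\S+'

# Sketch of the suggested alternative: (?=\s|$) requires whitespace or the
# end of the string after the bracket without including it in the match.
LOOKAHEAD = r'\(.*?\)+(?=\s|$)|\[.*?\]+(?=\s|$)|\S+'

usage = '[-h] [--foo FOO] (-a | -b) bar'  # made-up usage string

stripped = [part.strip() for part in re.findall(CONSUMING, usage)]
direct = re.findall(LOOKAHEAD, usage)

print(stripped == direct)
```

Because the lookahead is zero-width, the two `$` alternatives also fold into the same expression, which shortens the pattern as a side effect.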

bedevere-bot · 2018-05-20T03:58:06Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

ncoghlan · 2018-05-20T04:00:42Z

Regarding the bedevere/news check, note that this will need a bugfix entry in Misc/NEWS.d. See https://devguide.python.org/committing/#what-s-new-and-news-entries for more info on how to generate that.

bedevere-bot · 2018-05-23T05:33:39Z

Thanks for making the requested changes!

@ncoghlan: please review the changes made to this pull request.

ncoghlan · 2018-05-23T12:40:07Z

Travis failure was unrelated, so I've restarted the job. (Note for when merging the commit: due to the squash-based workflow, we'll only have one commit in the actual repo, but given the explanation in the review comments, I'm OK with just noting the extra test fixes in the commit message)

ned-deily · 2018-06-07T04:39:19Z

@ncoghlan Is this ready to merge?

miss-islington · 2018-06-08T10:12:52Z

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

ncoghlan · 2018-06-08T10:13:05Z

@ned-deily Indeed - thanks for the reminder!

…GH-1826) - bugfix and test for fragile metavar handling in argparse (see bpo-24089, bpo-14046, bpo-25058, bpo-11874) - also fixes some incorrect tests that did not make 1-element tuples correctly (cherry picked from commit 66f02aa) Co-authored-by: wim glenn <wim.glenn@gmail.com>

bedevere-bot · 2018-06-08T10:14:00Z

GH-7527 is a backport of this pull request to the 3.7 branch.

bedevere-bot · 2018-06-08T10:14:59Z

GH-7528 is a backport of this pull request to the 3.6 branch.

miss-islington · 2018-06-08T11:10:16Z

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

bedevere-bot · 2018-06-08T11:11:23Z

GH-7530 is a backport of this pull request to the 3.7 branch.

ncoghlan · 2018-06-08T11:11:38Z

(Something odd happened on GH-7527 that led to miss-islington deleting the 3.7 backport branch without merging it first, so I readded the 3.7 label to try again)

miss-islington · 2018-06-09T01:19:11Z

Thanks @wimglenn for the PR, and @ncoghlan for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

bedevere-bot · 2018-06-09T01:19:25Z

GH-7553 is a backport of this pull request to the 2.7 branch.

the-knights-who-say-ni added the CLA signed label May 27, 2017

wimglenn changed the title ~~Argparse assertion failure~~ issue11874 argparse assertion failure May 27, 2017

wimglenn changed the title ~~issue11874 argparse assertion failure~~ bpo-11874 argparse assertion failure May 27, 2017

wimglenn mentioned this pull request Jun 6, 2017

bpo-30583: s/datetuil/dateutil #1972

Merged

merwok reviewed Sep 27, 2017

bedevere-bot added the awaiting core review label Sep 27, 2017

ncoghlan requested changes May 20, 2018

bedevere-bot added awaiting changes and removed awaiting core review labels May 20, 2018

ncoghlan added needs backport to 3.6 labels May 20, 2018

wimglenn and others added 3 commits May 23, 2018 00:30

fix some incorrect tests that did not make 1-element tuples correctly

ab9cd7f

Signed-off-by: Wim Glenn <jump@wimglenn.com>

bugfix and test for fragile asserts in argparse library code (see iss…

d536c8f

…ues 24089, 14046, 25058, 11874)

Addressed reviewer comments. Added the NEWS.d entry

0fc1cad

bedevere-bot removed the awaiting changes label May 23, 2018

bedevere-bot added the awaiting change review label May 23, 2018

ncoghlan approved these changes May 23, 2018

bedevere-bot added awaiting merge and removed awaiting change review labels May 23, 2018

ncoghlan merged commit 66f02aa into python:master Jun 8, 2018

bedevere-bot removed the awaiting merge label Jun 8, 2018

bedevere-bot removed the needs backport to 3.7 label Jun 8, 2018

bedevere-bot removed the needs backport to 3.6 label Jun 8, 2018

ncoghlan added the needs backport to 3.7 label Jun 8, 2018

bedevere-bot removed the needs backport to 3.7 label Jun 8, 2018

wimglenn deleted the argparse_assertion_failure branch June 8, 2018 16:09

ncoghlan added the needs backport to 2.7 label Jun 9, 2018

bedevere-bot removed the needs backport to 2.7 label Jun 9, 2018
