Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32222: Fix pygettext skipping docstrings for funcs with arg typehints #4745

Merged

Conversation

Tobotimus
Copy link
Contributor

@Tobotimus Tobotimus commented Dec 7, 2017

This changes the behaviour of the pygettext TokenEater when it sees a function definition; instead of searching for the colon at the end of the definition, it skips over everything until it sees the closing parenthesis at the end of the parameter list, then looks for the final colon.

https://bugs.python.org/issue32222

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@@ -5,7 +5,7 @@
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 J�rgen Hermann <jh@web.de>
# 2002-11-22 J�rgen Hermann <jh@web.de>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unwanted changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now reverted.

return
if ttype == tokenize.NAME and tstring in opts.keywords:
self.__state = self.__keywordseen

def __funcseen(self, ttype, tstring, lineno):
# ignore anything until we see the closing parenthesis
if ttype == tokenize.OP and tstring == ')':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the following example?

def foo(bar=()):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point. I will rethink my approach

@serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Dec 7, 2017
@@ -5,7 +5,7 @@
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 J�rgen Hermann <jh@web.de>
# 2002-11-22 Jürgen Hermann <jh@web.de>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that the change here is a re-encoding from Latin-1 to UTF-8. If that is so, please remove the encoding line at the top of the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to not include any unrelated changes. If you want to change the encoding of the file, it should be done in a separate issue. But non-default encoding may be a part of testing.

Copy link
Contributor Author

@Tobotimus Tobotimus Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it was a change of encoding, thanks for pointing that out; my text editor seems to save files in UTF-8 by default. I've reverted the encoding to ISO 8859-1.

@Tobotimus
Copy link
Contributor Author

I've made a change to account for the following example, provided by serhiy:

def foo(bar=()):

This involves counting the number of nested parentheses until we see the closing parenthesis for the outermost pair (i.e. the parameter list parens).

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Could you please add a test?

Any chance to support return annotations?

@Tobotimus
Copy link
Contributor Author

@serhiy-storchaka I am more than happy to add some tests, I'd just like to note that I cannot see any existing tests for pygettext's actual parsing behaviour in Lib/test/test_tools/test_i18n.py, so I believe these will be the first tests for this behaviour.

From my own tests, return annotations were (and are, following this PR) already supported. There are only two cases where return annotations would cause problems, albeit I think they are very uncommon:

# returning a sliced list
def foo(bar) -> List[1:2]:
    ...
# returning a lambda function
def foo(bar) -> lambda x: x:

In my opinion both of the above examples, despite being valid syntax, don't really have a use case.

@serhiy-storchaka
Copy link
Member

Currently pygettext testing is fragmentary. All existing test were added just as regression tests for fixed bugs. There is the same case.

We can handle most cases by counting parenthesis, brackets and spaces.

def foo(bar) -> List[1:2]:
def foo(bar) -> {1: 2}:
def foo(bar) -> T(lambda x: x):

Only the case with a lambda at upper level is a problem. But there is a simple workaround for this -- parenthesis around a lambda.

@Tobotimus
Copy link
Contributor Author

I have added some tests, and also changed up the logic a bit.

Now, when skipping over a function declaration, it will look for the first colon which isn't surrounded by parentheses, square brackets or braces (I've called them 'enclosures' as an umbrella term here).

@serhiy-storchaka
Copy link
Member

I think this is too much and too small. If we are sure that the parenthesis are balanced (as it is in a syntactically correct Python), we can just count any opening parenthesis as +1 and any closing parenthesis as -1. If we want to check the balance, we should support a stack of parenthesis instead of a simple count (or several counts), and check that any closing parenthesis matches an opening parenthesis. I think the simpler solution is enough.

Parenthesis should also be counted after class:

class C(L[1:2], F({1: 2}), metaclass=M(lambda x: x)):

@Tobotimus
Copy link
Contributor Author

@serhiy-storchaka I'm not sure I understand; this PR does the simpler solution as you say, assuming that the parentheses are balanced, counting +1 when opening and -1 when closing. The only difference in this solution is that it also counts square brackets and braces. Are you suggesting I remove those two extra counts?

Thank you for pointing out the class situation, I will fix that up.

@serhiy-storchaka
Copy link
Member

I suggest to use a single integer counter for all three kinds of parenthesis: round parenthesis, brackets and braces.

@Tobotimus
Copy link
Contributor Author

Of course, now I understand. I don't know why I thought creating three different counters was a good idea in the first place. Thank you for pointing this out

@Tobotimus
Copy link
Contributor Author

The requested change has been made; this PR's changes are actually much simpler now.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, all LGTM now.

And the last: add please your name in Misc/ACKS and append "Patch by ." to the news entry.

@@ -1,2 +1,3 @@
Fix pygettext not extracting docstrings for functions with type annotated
arguments.
Patch by Tobotimus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use your real name. And don't forgot a period at the end.

@Tobotimus
Copy link
Contributor Author

This is ready to merge

@miss-islington
Copy link
Contributor

Thanks @Tobotimus for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 26, 2018
…ints (pythonGH-4745)

(cherry picked from commit eee72d4)

Co-authored-by: Tobotimus <Tobotimus@users.noreply.github.com>
@bedevere-bot
Copy link

GH-5915 is a backport of this pull request to the 3.7 branch.

@bedevere-bot
Copy link

GH-5916 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 26, 2018
…ints (pythonGH-4745)

(cherry picked from commit eee72d4)

Co-authored-by: Tobotimus <Tobotimus@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Feb 26, 2018
…ints (GH-4745)

(cherry picked from commit eee72d4)

Co-authored-by: Tobotimus <Tobotimus@users.noreply.github.com>
@Tobotimus Tobotimus deleted the 32222-fix-pygettext-annotations branch February 26, 2018 23:25
miss-islington added a commit that referenced this pull request Feb 27, 2018
…ints (GH-4745)

(cherry picked from commit eee72d4)

Co-authored-by: Tobotimus <Tobotimus@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants