Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-36390: simplify classifyws() and add unit tests #14500

Merged

Conversation

taleinat
Copy link
Contributor

@taleinat taleinat commented Jul 1, 2019

Re-implement classifyws(s, tabwidth) using a simple regexp and str.expandtabs().

Tests based on work by Cheryl Sabella.

https://bugs.python.org/issue36390

Tests based on work by Cheryl Sabella.
@taleinat taleinat added performance Performance or resource usage tests Tests in the Lib/test dir labels Jul 1, 2019
@taleinat taleinat requested review from terryjreedy and csabella July 1, 2019 09:29
Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of these changes, as it significantly improves the readability of classifyws(). However, on line 1594 of editor.py, the space character should probably be added to the regular expression by changing it from r'[\t]*' to r'[\t ]*' since it was included as a conditional in the original if statement (unless I am misunderstanding something). Also, the s flag looks like it is not necessary, as there is no . being used in the expression. So, in total, I would suggest for it to be changed from re.match(r'[\t]*', s) to re.match(r'[\t ]*).

@taleinat
Copy link
Contributor Author

taleinat commented Jul 1, 2019

@aeros167, I think you missed something; with this change, line 1594 that you mentioned already includes a space:

    m = re.match(r'[ \t]*', s)

@taleinat
Copy link
Contributor Author

taleinat commented Jul 1, 2019

Also, the s flag looks like it is not necessary, as there is no . being used in the expression.

s here is the string in which to search, not a flag (it is not re.S).

@aeros
Copy link
Contributor

aeros commented Jul 1, 2019

@taleinat Oh my apologies, the space before it was a bit hard to see. Is there a way to include the space with something similar to \s? \s also includes tabs, spaces, and newlines, but in this case I don't think you want newlines to be included in the search. Also on the second one, I'm a bit used to the syntax of other regex compilers that include the flag after the expression, should've double checked the python docs for the arguments. Would it make that a bit more clear if the s argument was changed to string or something similar? This might be personal preference, and the original argument was s, but in general I think it looks a bit more clear to avoid the use of one letter variables when possible (except in certain cases, such as i, j, k). I'm not sure what the official standard is in this case, but using descriptive variable names generally improves the readability.

@taleinat
Copy link
Contributor Author

taleinat commented Jul 1, 2019

@aeros167, renaming the "s" function argument is a good idea!

I've just updated this PR, having renamed it "line".

Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taleinat Good idea with regards to specifically usingline as the variable name, that's a significant improvement over s and it serves a good specific descriptor for its purpose in this method. Approved.

Copy link
Member

@terryjreedy terryjreedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tal, did the new tests pass with the original version of the function?

See comments on issue RE docstring, name change, tabwidth, and merge order.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@terryjreedy
Copy link
Member

Kyle, thank you for the useful suggestion. Most of idlelib is essentially private implementation, so API changes are allowed, along with backports. See PEP 434 if curious.

@taleinat
Copy link
Contributor Author

taleinat commented Jul 2, 2019

@terryjreedy, I have made the requested changes; please review again.

Tal, did the new tests pass with the original version of the function?

Indeed, the new tests also pass with the original implementation.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@terryjreedy: please review the changes made to this pull request.

@aeros
Copy link
Contributor

aeros commented Jul 2, 2019

@terryjreedy No problem! So far, I've found code review to be one of my favorite methods of contribution. It helps me to improve my ability to read the code of others (not my strongest area) while also giving back to the Python community. I enjoy engaging in meaningful discussions with other developers and providing constructive feedback.

Out of curiosity, how does the general process of improving readability differ in the standard libraries? I'd imagine the proposals are more scrutinized due to the changes affecting a larger audience. Readability means a lot to me personally, and it seems to be one of the largest advantages that Python has over other programming languages. Is it generally encouraged, or is it done on a more limited basis?

Also, is there a specific intermediate role in the community for code reviewers, similar to the "Python triager" role on the bug tracker? I'm not concerned with the role having different abilities or privileges, it would mostly just be a goal for me to work towards. I have a long term aspiration of eventually becoming a core developer, but that will likely be far off in the distant future. There's a lot for me to improve and work on in the meantime.

Copy link
Contributor

@csabella csabella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The change is much easier to read than the current code and produces the same results. The only thing that I might suggest is to compile the pattern since it's used in a loop.

@aeros
Copy link
Contributor

aeros commented Jul 3, 2019

@csabella For optimal performance, should the regular expression be compiled outside of the bounds of the loop within each of the functions or compiled once and reused as a global variable across editor.py?

@taleinat
Copy link
Contributor Author

taleinat commented Jul 11, 2019

The only thing that I might suggest is to compile the pattern since it's used in a loop.

Done.

FYI, re internally caches the recently used patterns (up to a certain limit), so in a loop like this the patterns aren't actually re-compiled in every iteration.

@miss-islington
Copy link
Contributor

Thanks @taleinat for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @taleinat for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-14707 is a backport of this pull request to the 3.8 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 11, 2019
…nGH-14500)

(cherry picked from commit 9b5ce62)

Co-authored-by: Tal Einat <taleinat@gmail.com>
@taleinat taleinat deleted the bpo-36390/simplify_and_test_classifyws branch July 11, 2019 14:20
@bedevere-bot
Copy link

GH-14708 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 11, 2019
…nGH-14500)

(cherry picked from commit 9b5ce62)

Co-authored-by: Tal Einat <taleinat@gmail.com>
taleinat added a commit that referenced this pull request Jul 11, 2019
)

(cherry picked from commit 9b5ce62)

Co-authored-by: Tal Einat <taleinat@gmail.com>
taleinat added a commit that referenced this pull request Jul 11, 2019
)

(cherry picked from commit 9b5ce62)

Co-authored-by: Tal Einat <taleinat@gmail.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
@terryjreedy
Copy link
Member

The changes I requested were made, but I cannot change the +- review after the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage skip news tests Tests in the Lib/test dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants