Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Unify code_checks whitespace checking #30755

Merged
merged 108 commits into from
Mar 23, 2020

Conversation

ShaharNaveh
Copy link
Member

@ShaharNaveh ShaharNaveh commented Jan 6, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Unify test cases of #30467 #30708 #30737

@alimcmaster1
Copy link
Member

For #30737. Is it possible to do this via regex? similar to the current code check for context manager:
https://github.com/pandas-dev/pandas/blob/master/ci/code_checks.sh#L157

@ShaharNaveh
Copy link
Member Author

ShaharNaveh commented Jan 6, 2020

For #30737. Is it possible to do this via regex? similar to the current code check for context manager:
https://github.com/pandas-dev/pandas/blob/master/ci/code_checks.sh#L157

Yes, for simple cases like

with pytest.raises(ValueError)

or (For false positives)

with pytest.raises(ValueError, match="foo")

But I want a "global" test case.

So even if there's an edge case, for example:

with pytest.raises(
                       ValueError,
                       # Some comment
)

or (For the case of false positives)

with pytest.raises(
                       ValueError,
                       # Some comment
                       match="foo"
)

Regex is very fast, and for a reason, regex is not looking in the ast. But it's hard for me to predict the future edge cases, and write regex to handle them.

So I wrote a global test case that looks in the ast.

When I detect the occurrence of pytests.raises, I brute-force all the tokens in the logical line until the end of the logical line, and if I find the occurrence of match as a argument (token.NAME) I consider this logical line as good.

My point is, regex is nice for "simple things" but when you have so many edge cases, I'd like to stick to checking it via the ast, because it's hard to check with regex what is a logical line.

P.S
If you're not convinced yet @alimcmaster1 I have some more edge cases that I couldn't found matching regex for them, If you succeed doing so I'd be more than happy to see the solution:)

@alimcmaster1 alimcmaster1 added the CI Continuous Integration label Jan 6, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you can make the edits in place first without changing files? Typically easier to review that way

If you are moving and modifying things easier to do separately

@datapythonista
Copy link
Member

Agree with Will, do you mind leaving the name unchanged in this PR, even if it's not accurate, so we can review the new changes only. Then it's easy to just rename in a follow up, and get that merged quickly.

Just one early comment, I'm -1 on creating abbreviations like STD. I think it's worth having proper names that can be directly understood, even if they are longer.

@ShaharNaveh
Copy link
Member Author

Any chance you can make the edits in place first without changing files?

Not sure what you mean by that @WillAyd , can you please be more specific?

@datapythonista
Copy link
Member

Check my last comment

@ShaharNaveh
Copy link
Member Author

Check my last comment

So just change the function names, back to the very original?

@datapythonista
Copy link
Member

No, sorry. The file name, so the diff will show what you changed in that file.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice stuff. Some comments, but the general idea looks good.

scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
scripts/validate_string_concatenation.py Outdated Show resolved Hide resolved
@ShaharNaveh
Copy link
Member Author

This is... a lot. It'd be great if one of the upstream tools (flake8/black/...) did this, but I'm not sure it belongs here. i.e. if I had to maintain this myself, I would just learn to be OK with the downsides of regexps.

@jbrockmendel I can see flake8 maybe adopting the unconcatnated strings, but for the rest of the tests cases I can't see why a linter would adopt those, as it's a very specific pandas code style/nitpick, maybe a very opinionated linters such as pylint would use those.

(And do you know by any chance someone from flake8 to ask if they would use it?
And I would love to see how they do it, I tried doing this code checks using the ast module, couldn't figure it out :/)

And about the regex, I completely understand why this PR seems like overkill when a simple regex can do the trick, for the most part. but isn't that the fact why we are using tools like black an isort? so there will no questions if the code style right or not, there is always one correct answer and it is what the linter says, when we disagree with the decision of the linter for example in psf/black#1051 we just create our own check (If it's not upstreamed, we do it here).

@datapythonista
Copy link
Member

I agree having these checks in the pandas code base is not ideal. But I'd move forward with this PR, get the validation working better (we already have the validation, this is just improving and refactoring what we've got). And later on, we can consider moving this out of pandas, as we did with the docstrings.

@MomIsBestFriend can you merge master and see if the CI is green please?

@jbrockmendel
Copy link
Member

I'm completely happy deferring to @datapythonista on this. Thanks @MomIsBestFriend for taking this on.

@ShaharNaveh
Copy link
Member Author

Restarting azure

@ShaharNaveh ShaharNaveh reopened this Feb 22, 2020
@ShaharNaveh ShaharNaveh requested a review from WillAyd February 27, 2020 17:01
@ShaharNaveh ShaharNaveh mentioned this pull request Feb 28, 2020
5 tasks
@ShaharNaveh
Copy link
Member Author

ping @datapythonista

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MomIsBestFriend

@WillAyd I think your comments were addressed, can you have a look and merge if you're happy.

@WillAyd
Copy link
Member

WillAyd commented Mar 7, 2020

I will relook in a few days

@datapythonista
Copy link
Member

@MomIsBestFriend sorry we've been slow with this. Do you mind merging master once more please? I'll merge this once CI is green.

@ShaharNaveh
Copy link
Member Author

@datapythonista I just merged master, I'll wait for another commit to be merged, before I can merge master again.

@datapythonista
Copy link
Member

Oh, sorry, I thought the CI was stalled, never mind then. Ping me once it's green if I don't merge it before. Thanks!

@ShaharNaveh
Copy link
Member Author

ping @datapythonista :)

@datapythonista datapythonista merged commit c455606 into pandas-dev:master Mar 23, 2020
@datapythonista
Copy link
Member

Thanks @MomIsBestFriend, great job.

Can you follow up renaming the file please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants