-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document how to use yapf #2160
Document how to use yapf #2160
Conversation
Test FAILed. |
doc/source/development.rst
Outdated
5. **Autoformatting code**. We use ``yapf`` | ||
https://github.com/google/yapf for linting, and the config file is | ||
located at ``.style.yapf``. We recommend adding ``.travis/yapf.sh`` | ||
to git's ``pre-push`` hook, so that you never have to worry about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this is so slow, we only have ~90k lines of Python (even including rllib / dataframes), so it seems weird that it takes almost a minute to format that. I don't think we can recommend this unless we get formatting time down to some thing more reasonable, e.g. 5-10 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Takes about 20 seconds on my MacBook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even a minute isn't that bad for pre push since you push far less than you commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must depend on your hardware then, on my laptop it takes twice a long (4x i7 cores). Still, 20s is an eternity when pushing typically takes just a couple seconds.
Have we considered any faster formatters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only one that's faster is black
, since it checks if the files on disk have been changed.
I would be perfectly happy to switch to it, since it offers no configuration at all, which makes debates about style moot, except I doubt the rest of the team would, given the lack of said configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I've always been doing git status
and then passing all changed files into
yapf --style=.style.yapf -i [all changed files]
And then I run flake8
after that in order to catch remaining errors (like unused imports).
doc/source/development.rst
Outdated
to git's ``pre-push`` hook, so that you never have to worry about | ||
your code formatting ever again. Note that RLlib is not currently | ||
autoformatted, though running ``yapf`` over the individual files you | ||
edit with ``yapf -i -p -r <file1> <file2> <...>`` is a good idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this instead?
yapf -i -p `git diff --name-only upstream/master | grep .py`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it's actually a good idea to run yapf
on the files you edit because that will introduce a lot of changes in places you didn't mean to change and will make it hard to see what your PR is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I see, perhaps the --exclude flags would work here? Though then the cmd line becomes a bit long so you probably want to check in the script.
It would be great to have a ./yapf script that formats all relevant files and also is fast enough to run all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This problem is actually why I've been pushing so annoyingly hard for autoformat everywhere, all the time. If we bit the bullet, formatted all the python code and read the ugly PR, and then put a pre-commit hook (we could copy this script from go), no change in the codebase would ever introduce line noise, because it would always be formatted the same. Since we'd only ever format changed files, it'd be fast enough to be unnoticeable too. It could later open up the possibility of mechanical source transformation.
A pre-push hook is a halfway measure, but is still far better than what's currently done. It also saves having to write overly complex scripts for what's really a problem with standards.
30k lines of ray were autoformatted without any major issues. I think the rest of the python code could be done too. I'd be happy to review, since code formatting hardly requires deep knowledge of the actual code being run.
Any PRs against master
could run yapf in their final commit and then only the diff would appear in the github PR review tool, so it wouldn't be a pain to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still add a pre-commit hook, just with --excludes for now, no?
FWIW I think the full linting will happen, it's just that we need to find some quiescent time for the sub-projects and then someone can do a quick 3am push to reformat everything.
.travis/yapf.sh
Outdated
|
||
# This flag lints individual files. --files *must* be the first command line | ||
# arg to use this option. | ||
if [[ "$1" == '--files' ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? --files isn't used below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a poor man's argparse since I hate messing with bash's getopts. If --files
is the first argument, then only the files passed are formatted.
The new default is for the script to only updated changed files to encourage using it as a pre-push hook. Travis still checks all since it's not that big an increase to runtime.
@ericl Can you see if the latest commit sped it up at all? Try |
Test PASSed. |
It's still 25 seconds. Why not combine the changed file detection and --all though? That makes it instantaneous for most PRs. Btw you should add a symlink from ~/.travis/yapf.sh to ~/yapf.sh, since the .travis files are hidden for most developers. |
I guess it makes sense to format just the files that differ from `master`,
since we almost always end up merging into there anyway. That's few enough
files to be instant, so it does double duty as a pre-push and pre-commit
hook.
|
Hidden directories may get glossed over otherwise.
They are symlinks to the same thing, but `scripts` is more dev-friendly, while `.travis` is really only for Travis CI.
Most devs will only need `format_changed`, and this is run by default. `format_changed` should be fast enough in most cases to work as a pre-commit hook.
@ericl The latest script should do what you want, can you test it out? |
1. Mention how yapf can be used a pre-commit hook 2. rm `bash`, script is executable
.travis/yapf.sh
Outdated
git diff --name-only HEAD~1 HEAD | grep '\.py$' | xargs -P 5 \ | ||
yapf \ | ||
--style "$ROOT/.style.yapf" \ | ||
--in-place --recursive --parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing --excludes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, didn't push the actual latest version, that's changed. I'd really rather avoid adding an --excludes flag. If someone wants to lint specific files/dirs, they can run yapf.sh --files <files or dirs>
.
.travis/yapf.sh
Outdated
lint_all() { | ||
format_changed() { | ||
# Formats python files that changed in last commit | ||
git diff --name-only HEAD~1 HEAD | grep '\.py$' | xargs -P 5 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right. Usually you want to diff from the last master
commit, not the one previous.
doc/source/development.rst
Outdated
your code formatting ever again. Note that RLlib is not currently | ||
autoformatted, though running ``yapf`` over the individual files you | ||
edit with ``.travis/yapf.sh --files <file1> <file2> <...>`` is a good idea. | ||
5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for
linting, and the config file is located at ``.style.yapf``. We recommend
running ``.travis/yapf.sh`` prior to pushing to format changed files.
Note that some projects such as dataframes and rllib are currently excluded.
Can you make the above workflow work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should with the latest change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some suggested changes. Let me know if I'm missing anything.
.travis/yapf.sh
Outdated
--exclude 'python/ray/core/src/ray/gcs' \ | ||
--exclude 'python/ray/common/thirdparty'" | ||
|
||
UPSTREAM_MASTER=${RAY_UPSTREAM_BRANCH:-upstream/master} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need an env variable, and upstream/master
isn't always defined (my remotes are alok
and origin
). Setting this to master
or origin/master
is a better idea because master
is meant to track the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to origin/master
. Note that you can override the target here by setting RAY_UPSTREAM_BRANCH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there's a better way to do this... WIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok... now we always fetch the latest upstream, and grab the merge base, so the diff should always be correct.
.travis/yapf.sh
Outdated
# Format files that differ from main branch | ||
format_changed() { | ||
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'` | ||
yapf $YAPF_FLAGS $YAPF_EXCLUDES -- $FILES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not exclude anything, since if you want that sort of fine-grained control over what you format, you can look at the output from git status
manually and just pass those files to format
. This is meant to be a quick way to format all files that differ from the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to do that, you can run with --files. Consider the common use case. Would a user be surprised if the auto-formatter formatted a file that is excluded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. However, the if statement guard is necessary since otherwise -- $FILES
will expand to --
, which often causes problems. It only works because yapf's -i
flag errors out with empty filenames.
I don't think adding variables like that is a good idea. |
An typical developer trying to use the yapf script isn't going to consider these things; they just want to run it and have it do the right thing, unconditionally. Having modes that don't obey excludes is just complicating the situation. Note that --files will ignore the excludes since in that case the user is making their intent clear. |
Test FAILed. |
Test PASSed. |
Test PASSed. |
Playing fast and loose with whitespace in bash is a terrible idea.
.travis/yapf.sh
Outdated
|
||
# Format files that differ from main branch | ||
format_changed() { | ||
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl By the way, the backtick syntax is discouraged in all shells these days, $()
is the correct form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
.travis/yapf.sh
Outdated
# Format files that differ from main branch | ||
format_changed() { | ||
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'` | ||
yapf $YAPF_FLAGS $YAPF_EXCLUDES -- $FILES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. However, the if statement guard is necessary since otherwise -- $FILES
will expand to --
, which often causes problems. It only works because yapf's -i
flag errors out with empty filenames.
.travis/yapf.sh
Outdated
--exclude 'python/ray/common/thirdparty'" | ||
|
||
# Add the upstream branch if it doesn't exist | ||
if [ ! -e $ROOT/.git/refs/remotes/upstream ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl FYI, you should use [[
in bash over [
, since [[
can warn of syntax errors because it's part of the syntax rather than an external command.
Normally, the remote is called `origin`, but naming it explicit
Speaking of shell script best practices, @mehrdadn showed me https://github.com/koalaman/shellcheck recently. It integrates with various editors which is pretty nice. |
@@ -0,0 +1 @@ | |||
.travis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was probably added by accident, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is on purpose. scripts is (currently) a symlink for .travis that can later be pulled out (without much difficulty) into a proper repo for one-off dev/user scripts. All of those scripts are in .travis right now, so I just symlinked it.
@robertnishihara Yeah I use shellcheck for every line of shell I write, plus shellharden. |
Test PASSed. |
.travis/yapf.sh
Outdated
format_changed() { | ||
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then | ||
git diff --name-only 'upstream/master' 'HEAD' -- '*.py' | xargs -P 5 \ | ||
yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need the xargs here when the yapf flag includes --parallel?
.travis/yapf.sh
Outdated
# for autoformat yet. | ||
format_changed() { | ||
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then | ||
git diff --name-only 'upstream/master' 'HEAD' -- '*.py' | xargs -P 5 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be diffing against the mergebase? Have you checked what happens if master gets ahead of the current branch?
.travis/yapf.sh
Outdated
# Format files that differ from main branch. Ignores dirs that are not slated | ||
# for autoformat yet. | ||
format_changed() { | ||
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid the double diff here. Just pipe the output directly into yapf.
.travis/yapf.sh
Outdated
} | ||
|
||
# Formats *all* files that differ from main branch. | ||
format_all_changed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused. Please remove.
elif [[ "$1" == '--all' ]]; then | ||
format_all | ||
else | ||
# Format only the files that changed in last commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed from the upstream mergebase
Remove unused function and move around code to make clearer and adding lines give cleaner diffs.
Test PASSed. |
Each arg has to be in its own set of quotes.
Test FAILed. |
bump @ericl |
.travis/yapf.sh
Outdated
# `diff-filter=ACM` is to ensure we only format files that exist on both | ||
# branches. | ||
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' -- '*.py' &>dev/null; then | ||
git diff --name-only --diff-filter=ACM 'upstream/master' -- '*.py' | yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I ran this, yapf complained with
yapf: error: cannot use --in-place or --diff flags when reading from stdin
This seemed to be fixed if you do
yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}" `git diff --name-only --diff-filter=ACM 'upstream/master' -- '*.py'`
instead
.travis/yapf.sh
Outdated
# | ||
# `diff-filter=ACM` is to ensure we only format files that exist on both | ||
# branches. | ||
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' -- '*.py' &>dev/null; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/dev/null
, not dev/null
. The latter doesn't run on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that's a typo.
.travis/yapf.sh
Outdated
# could cause yapf to receive 0 positional arguments, making it hang | ||
# waiting for STDIN. | ||
# | ||
# `diff-filter=ACM` is to ensure we only format files that exist on both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACM is not sufficient to ensure the diff is from the correct mergebase. You can do a simple experiment to verify this.
- check out a really old Ray branch that just changes 1-2 files
- run yapf
- observe it try to format dozens of files.
If you set the mergebase correctly, it will only format those 1-2 files. Please revert this to the mergebase code.
.travis/yapf.sh
Outdated
fi | ||
|
||
# Only fetch master since that's the branch we're diffing against. | ||
git fetch 'upstream' 'master' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You don't need single quotes here around upstream or master.
|
||
# Add the upstream branch if it doesn't exist | ||
if ! [[ -e "$ROOT/.git/refs/remotes/upstream" ]]; then | ||
git remote add 'upstream' 'https://github.com/ray-project/ray.git' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you don't need single quotes around these literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the single quotes to signify that those arguments are not part of the command git remote add
, but that's a stylistic choice, so I'm happy to revert it.
TIL there's a clean syntax for doing that, but it's too clever to belong in a shell script. We use `mapfile -t` to ensure no problems down the line with weird filenames.
Test PASSed. |
Just tried this and it works now. Thanks for fixing it! |
5. **Inspecting Redis shards by hand:** To inspect the primary Redis shard by | ||
5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for | ||
linting, and the config file is located at ``.style.yapf``. We recommend | ||
running ``.travis/yapf.sh`` prior to pushing to format changed files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess we could also recommend scripts/yapf.sh, oh well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just hold off until we can look at all the scripts that belong in .travis and that belong in scripts. I made the symlink more for convenience, I don't expect it to stick around.
This is a cleaner version of #2158. @ericl Please take a look at this.