Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to use yapf #2160

Merged
merged 21 commits into from
Jun 6, 2018
Merged

Document how to use yapf #2160

merged 21 commits into from
Jun 6, 2018

Conversation

alok
Copy link
Contributor

@alok alok commented May 30, 2018

This is a cleaner version of #2158. @ericl Please take a look at this.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5703/
Test FAILed.

5. **Autoformatting code**. We use ``yapf``
https://github.com/google/yapf for linting, and the config file is
located at ``.style.yapf``. We recommend adding ``.travis/yapf.sh``
to git's ``pre-push`` hook, so that you never have to worry about
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this is so slow, we only have ~90k lines of Python (even including rllib / dataframes), so it seems weird that it takes almost a minute to format that. I don't think we can recommend this unless we get formatting time down to some thing more reasonable, e.g. 5-10 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes about 20 seconds on my MacBook

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even a minute isn't that bad for pre push since you push far less than you commit

Copy link
Contributor

@ericl ericl May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must depend on your hardware then, on my laptop it takes twice a long (4x i7 cores). Still, 20s is an eternity when pushing typically takes just a couple seconds.

Have we considered any faster formatters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only one that's faster is black, since it checks if the files on disk have been changed.

I would be perfectly happy to switch to it, since it offers no configuration at all, which makes debates about style moot, except I doubt the rest of the team would, given the lack of said configuration.

Copy link
Collaborator

@robertnishihara robertnishihara May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I've always been doing git status and then passing all changed files into

yapf --style=.style.yapf -i [all changed files]

And then I run flake8 after that in order to catch remaining errors (like unused imports).

to git's ``pre-push`` hook, so that you never have to worry about
your code formatting ever again. Note that RLlib is not currently
autoformatted, though running ``yapf`` over the individual files you
edit with ``yapf -i -p -r <file1> <file2> <...>`` is a good idea.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this instead?

yapf -i -p `git diff --name-only upstream/master | grep .py`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's actually a good idea to run yapf on the files you edit because that will introduce a lot of changes in places you didn't mean to change and will make it hard to see what your PR is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I see, perhaps the --exclude flags would work here? Though then the cmd line becomes a bit long so you probably want to check in the script.

It would be great to have a ./yapf script that formats all relevant files and also is fast enough to run all the time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem is actually why I've been pushing so annoyingly hard for autoformat everywhere, all the time. If we bit the bullet, formatted all the python code and read the ugly PR, and then put a pre-commit hook (we could copy this script from go), no change in the codebase would ever introduce line noise, because it would always be formatted the same. Since we'd only ever format changed files, it'd be fast enough to be unnoticeable too. It could later open up the possibility of mechanical source transformation.

A pre-push hook is a halfway measure, but is still far better than what's currently done. It also saves having to write overly complex scripts for what's really a problem with standards.

30k lines of ray were autoformatted without any major issues. I think the rest of the python code could be done too. I'd be happy to review, since code formatting hardly requires deep knowledge of the actual code being run.

Any PRs against master could run yapf in their final commit and then only the diff would appear in the github PR review tool, so it wouldn't be a pain to review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still add a pre-commit hook, just with --excludes for now, no?

FWIW I think the full linting will happen, it's just that we need to find some quiescent time for the sub-projects and then someone can do a quick 3am push to reformat everything.

.travis/yapf.sh Outdated

# This flag lints individual files. --files *must* be the first command line
# arg to use this option.
if [[ "$1" == '--files' ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? --files isn't used below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a poor man's argparse since I hate messing with bash's getopts. If --files is the first argument, then only the files passed are formatted.

alok added 2 commits May 29, 2018 23:26
The new default is for the script to only updated changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.
@alok
Copy link
Contributor Author

alok commented May 30, 2018

@ericl Can you see if the latest commit sped it up at all? Try .travis/yapf.sh --all.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5713/
Test PASSed.

@ericl
Copy link
Contributor

ericl commented May 30, 2018

It's still 25 seconds. Why not combine the changed file detection and --all though? That makes it instantaneous for most PRs.

Btw you should add a symlink from ~/.travis/yapf.sh to ~/yapf.sh, since the .travis files are hidden for most developers.

@alok
Copy link
Contributor Author

alok commented May 30, 2018 via email

alok added 3 commits May 30, 2018 10:50
Hidden directories may get glossed over otherwise.
They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.
Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.
@alok
Copy link
Contributor Author

alok commented May 30, 2018

@ericl The latest script should do what you want, can you test it out?

alok added 2 commits May 30, 2018 11:01
1. Mention how yapf can be used a pre-commit hook
2. rm `bash`, script is executable
.travis/yapf.sh Outdated
git diff --name-only HEAD~1 HEAD | grep '\.py$' | xargs -P 5 \
yapf \
--style "$ROOT/.style.yapf" \
--in-place --recursive --parallel
Copy link
Contributor

@ericl ericl May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing --excludes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't push the actual latest version, that's changed. I'd really rather avoid adding an --excludes flag. If someone wants to lint specific files/dirs, they can run yapf.sh --files <files or dirs>.

.travis/yapf.sh Outdated
lint_all() {
format_changed() {
# Formats python files that changed in last commit
git diff --name-only HEAD~1 HEAD | grep '\.py$' | xargs -P 5 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right. Usually you want to diff from the last master commit, not the one previous.

your code formatting ever again. Note that RLlib is not currently
autoformatted, though running ``yapf`` over the individual files you
edit with ``.travis/yapf.sh --files <file1> <file2> <...>`` is a good idea.
5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for
   linting, and the config file is located at ``.style.yapf``. We recommend
   running ``.travis/yapf.sh`` prior to pushing to format changed files.
   Note that some projects such as dataframes and rllib are currently excluded.

Can you make the above workflow work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should with the latest change.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some suggested changes. Let me know if I'm missing anything.

.travis/yapf.sh Outdated
--exclude 'python/ray/core/src/ray/gcs' \
--exclude 'python/ray/common/thirdparty'"

UPSTREAM_MASTER=${RAY_UPSTREAM_BRANCH:-upstream/master}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need an env variable, and upstream/master isn't always defined (my remotes are alok and origin). Setting this to master or origin/master is a better idea because master is meant to track the main branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to origin/master. Note that you can override the target here by setting RAY_UPSTREAM_BRANCH.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there's a better way to do this... WIP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... now we always fetch the latest upstream, and grab the merge base, so the diff should always be correct.

.travis/yapf.sh Outdated
# Format files that differ from main branch
format_changed() {
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'`
yapf $YAPF_FLAGS $YAPF_EXCLUDES -- $FILES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should not exclude anything, since if you want that sort of fine-grained control over what you format, you can look at the output from git status manually and just pass those files to format. This is meant to be a quick way to format all files that differ from the main branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to do that, you can run with --files. Consider the common use case. Would a user be surprised if the auto-formatter formatted a file that is excluded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. However, the if statement guard is necessary since otherwise -- $FILES will expand to --, which often causes problems. It only works because yapf's -i flag errors out with empty filenames.

@alok
Copy link
Contributor Author

alok commented May 30, 2018

I don't think adding variables like that is a good idea. format is just meant to format the files you pass it with the minimum number of flags set to do it in parallel and to follow the style file. It should not exclude anything passed to it, that should be up to the user. format_all should use those flags, but it's kind of pointless if it's the only function using the exclude.

@ericl
Copy link
Contributor

ericl commented May 30, 2018

An typical developer trying to use the yapf script isn't going to consider these things; they just want to run it and have it do the right thing, unconditionally. Having modes that don't obey excludes is just complicating the situation.

Note that --files will ignore the excludes since in that case the user is making their intent clear.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5722/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5720/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5725/
Test PASSed.

alok added 2 commits May 30, 2018 14:24
Playing fast and loose with whitespace in bash is a terrible idea.
.travis/yapf.sh Outdated

# Format files that differ from main branch
format_changed() {
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl By the way, the backtick syntax is discouraged in all shells these days, $() is the correct form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

.travis/yapf.sh Outdated
# Format files that differ from main branch
format_changed() {
FILES=`git diff --name-only $UPSTREAM_MASTER -- '*.py'`
yapf $YAPF_FLAGS $YAPF_EXCLUDES -- $FILES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. However, the if statement guard is necessary since otherwise -- $FILES will expand to --, which often causes problems. It only works because yapf's -i flag errors out with empty filenames.

.travis/yapf.sh Outdated
--exclude 'python/ray/common/thirdparty'"

# Add the upstream branch if it doesn't exist
if [ ! -e $ROOT/.git/refs/remotes/upstream ]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl FYI, you should use [[ in bash over [, since [[ can warn of syntax errors because it's part of the syntax rather than an external command.

Normally, the remote is called `origin`, but naming it explicit
@robertnishihara
Copy link
Collaborator

Speaking of shell script best practices, @mehrdadn showed me https://github.com/koalaman/shellcheck recently. It integrates with various editors which is pretty nice.

@@ -0,0 +1 @@
.travis
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was probably added by accident, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is on purpose. scripts is (currently) a symlink for .travis that can later be pulled out (without much difficulty) into a proper repo for one-off dev/user scripts. All of those scripts are in .travis right now, so I just symlinked it.

@alok
Copy link
Contributor Author

alok commented May 30, 2018

@robertnishihara Yeah I use shellcheck for every line of shell I write, plus shellharden.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5734/
Test PASSed.

.travis/yapf.sh Outdated
format_changed() {
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then
git diff --name-only 'upstream/master' 'HEAD' -- '*.py' | xargs -P 5 \
yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the xargs here when the yapf flag includes --parallel?

.travis/yapf.sh Outdated
# for autoformat yet.
format_changed() {
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then
git diff --name-only 'upstream/master' 'HEAD' -- '*.py' | xargs -P 5 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be diffing against the mergebase? Have you checked what happens if master gets ahead of the current branch?

.travis/yapf.sh Outdated
# Format files that differ from main branch. Ignores dirs that are not slated
# for autoformat yet.
format_changed() {
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' 'HEAD' -- '*.py' &>dev/null; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid the double diff here. Just pipe the output directly into yapf.

.travis/yapf.sh Outdated
}

# Formats *all* files that differ from main branch.
format_all_changed() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused. Please remove.

elif [[ "$1" == '--all' ]]; then
format_all
else
# Format only the files that changed in last commit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed from the upstream mergebase

alok added 3 commits May 31, 2018 14:16
Remove unused function and move around code to make clearer and adding lines
give cleaner diffs.
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5784/
Test PASSed.

Each arg has to be in its own set of quotes.
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5787/
Test FAILed.

@alok
Copy link
Contributor Author

alok commented Jun 3, 2018

bump @ericl

.travis/yapf.sh Outdated
# `diff-filter=ACM` is to ensure we only format files that exist on both
# branches.
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' -- '*.py' &>dev/null; then
git diff --name-only --diff-filter=ACM 'upstream/master' -- '*.py' | yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran this, yapf complained with

yapf: error: cannot use --in-place or --diff flags when reading from stdin

This seemed to be fixed if you do

yapf "${YAPF_EXCLUDES[@]}" "${YAPF_FLAGS[@]}" `git diff --name-only --diff-filter=ACM 'upstream/master' -- '*.py'`

instead

.travis/yapf.sh Outdated
#
# `diff-filter=ACM` is to ensure we only format files that exist on both
# branches.
if ! git diff --diff-filter=ACM --quiet --exit-code 'upstream/master' -- '*.py' &>dev/null; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/dev/null, not dev/null. The latter doesn't run on my machine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that's a typo.

.travis/yapf.sh Outdated
# could cause yapf to receive 0 positional arguments, making it hang
# waiting for STDIN.
#
# `diff-filter=ACM` is to ensure we only format files that exist on both
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACM is not sufficient to ensure the diff is from the correct mergebase. You can do a simple experiment to verify this.

  1. check out a really old Ray branch that just changes 1-2 files
  2. run yapf
  3. observe it try to format dozens of files.

If you set the mergebase correctly, it will only format those 1-2 files. Please revert this to the mergebase code.

.travis/yapf.sh Outdated
fi

# Only fetch master since that's the branch we're diffing against.
git fetch 'upstream' 'master'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You don't need single quotes here around upstream or master.


# Add the upstream branch if it doesn't exist
if ! [[ -e "$ROOT/.git/refs/remotes/upstream" ]]; then
git remote add 'upstream' 'https://github.com/ray-project/ray.git'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you don't need single quotes around these literals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the single quotes to signify that those arguments are not part of the command git remote add, but that's a stylistic choice, so I'm happy to revert it.

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5843/
Test PASSed.

@ericl
Copy link
Contributor

ericl commented Jun 6, 2018

Just tried this and it works now. Thanks for fixing it!

@ericl ericl merged commit 42a9233 into ray-project:master Jun 6, 2018
5. **Inspecting Redis shards by hand:** To inspect the primary Redis shard by
5. **Autoformatting code**. We use ``yapf`` https://github.com/google/yapf for
linting, and the config file is located at ``.style.yapf``. We recommend
running ``.travis/yapf.sh`` prior to pushing to format changed files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess we could also recommend scripts/yapf.sh, oh well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just hold off until we can look at all the scripts that belong in .travis and that belong in scripts. I made the symlink more for convenience, I don't expect it to stick around.

@alok alok deleted the doc-yapf branch June 11, 2018 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants