
Add linting to CI via pre-commit #987

Merged 6 commits into python:main on Jul 31, 2023

Conversation

@hugovk (Member) commented Nov 19, 2022

I was going to add this to the forthcoming release cycle (draft) PR, but it's better to split it out into its own PR.

This adds formatters and linters to the CI via pre-commit; we can adjust these as needed.

@ezio-melotti (Member) left a comment

Can you elaborate (here and/or in comments on the files) on what these two new YAML files accomplish?

  • lint.yml seems pretty straightforward, but I also noticed that pre-commit/action is apparently deprecated in favor of pre-commit.ci: https://github.com/pre-commit/action
  • pre-commit-config.yaml is used by pre-commit.ci, and possibly by pre-commit/action too. However it's not obvious why it lists seemingly unrelated repos.

Are we using pre-commit.ci or just the action? Is there something else that contributors need to do to enable the pre-commit hooks? Should this be documented somewhere?

@hugovk (Member, Author) commented Nov 20, 2022

Can you elaborate (here and/or in comments on the files) on what these two new YAML files accomplish?

Sure!

lint.yaml

lint.yaml is a GitHub Actions workflow to run the "pre-commit" tool on the CI. The tool's name can be a bit misleading, as it works very well as a tool for pinning and updating linters and so on, even if you don't use it as a local Git pre-commit hook.

The workflow checks out this repo, sets up a recent Python, and runs the tool via its GitHub Action.
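For reference, here's a minimal sketch of roughly what that workflow looks like (the action versions shown are illustrative, not necessarily the exact pins in this PR):

name: Lint

on: [push, pull_request, workflow_dispatch]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      # Check out the repo and set up a recent Python
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      # Run every hook defined in .pre-commit-config.yaml
      - uses: pre-commit/action@v3.0.0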


.pre-commit-config.yaml

.pre-commit-config.yaml is the config file for the pre-commit tool.

This lists a number of "hooks", each a separate check, like a linter or formatter:

(screenshot of the configured hook list omitted)

They're listed under their repo. Repos generally have a single check (e.g. isort), although some have multiple (e.g. check-json, check-yaml). We can define optional arguments to pass (such as --target-version=py37 for Black) or additional dependencies (like flake8-2020 for Flake8).

The tool uses repos to fetch hooks rather than installing from PyPI. This is because the tool is cross-platform and can run things written in 20 languages, not just Python.

Importantly, each repo is pinned via a rev version number. This means we don't get any surprises when a tool is updated to add new checks and our CI suddenly fails; we only see those when explicitly updating the pins.
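As a condensed sketch of that structure (the Black pin and the flake8 extra dependency are the ones from this PR; the flake8 repo URL and rev here are just placeholders for illustration):

repos:
  # Each entry points at a hook repo, pinned to an exact rev
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        args: [--target-version=py37]
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4  # placeholder rev
    hooks:
      - id: flake8
        additional_dependencies: [flake8-2020]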

We can periodically update the pins easily using the tool:

python -m pip install pre-commit
pre-commit autoupdate

https://github.com/pre-commit/action says:

DEPRECATED this action is in maintenance-only mode and will not be accepting new features.

Please switch to using pre-commit.ci which is faster and has more features.

However, there have been two maintenance releases since the note was added, and the action only does what we need anyway: install the tool, run it, and do some caching. If it stops working, it's straightforward to replace with our own commands.


Are we using pre-commit.ci or just the action?

Just the action.

The difference:

  • the action runs the tool on GitHub Actions and reports pass/fail.

  • pre-commit.ci runs the tool on its own CI, reports pass/fail, but can also send a fix commit back to PRs for formatting changes. The ci: autoupdate_schedule: quarterly config also defines the period for it to send PRs to update the pins.

In my projects I'm using both:

  • Benefit of the action on GHA CI is that it's more likely to run for forks, so contributors get fast feedback.

  • Benefit of pre-commit.ci is fixing PRs and periodic autoupdates.

So we could also enable pre-commit.ci for this repo.


  • pre-commit-config.yaml is used by pre-commit.ci, and possibly by pre-commit/action too. However it's not obvious why it lists seemingly unrelated repos.

Yep, both use it. As mentioned above, the repos define where the tool installs each of the formatters/checkers from.


Is there something else that contributors need to do to enable the pre-commit hooks? Should this be documented somewhere?

So for this PR we're using pre-commit as a tool to pin and run linters, not as an actual Git pre-commit tool. But that option is open to developers (and I'll be using it).

I never used to be a fan of using Git pre-commit hooks locally until I found the https://pre-commit.com/ tool, which makes it much, much easier to manage: no need to manually edit files under the .git dir. Some people really don't like it, even with this tool, which is totally fine! So I'm a bit hesitant about whether we want to recommend it to contributors.

Anyway, if you do want to use it locally:

# Install the tool
python -m pip install pre-commit

# Run on all files
pre-commit run --all-files

# Update the pins
pre-commit autoupdate

# Activate it for this repo, so it will run on just the staged files when you 'git commit'
pre-commit install

# No longer want it to run for 'git commit'?
pre-commit uninstall

@hugovk (Member, Author) left a comment

Here's some notes on the hooks I added here.

They're some I commonly use for my projects. We can add or remove as we see fit.

- repo: https://github.com/asottile/pyupgrade
  rev: v3.2.2
  hooks:
    - id: pyupgrade

pyupgrade is a formatter that upgrades to newer Python syntax: https://github.com/asottile/pyupgrade

- repo: https://github.com/psf/black
  rev: 22.10.0
  hooks:
    - id: black

The Black code formatter: https://github.com/psf/black

- repo: https://github.com/PyCQA/isort
  rev: 5.10.1
  hooks:
    - id: isort

isort to sort imports: https://github.com/PyCQA/isort

Comment on lines +22 to +24
    - id: flake8
      additional_dependencies: [flake8-2020]

Flake8 finds style errors using pycodestyle and pyflakes (both built-in plugins: https://flake8.pycqa.org/en/latest/) and flake8-2020 (which finds Python version comparison problems: https://github.com/asottile/flake8-2020).

A reviewer (Member) replied:

Maybe worth replacing flake8-2020 with something much more useful, flake8-bugbear. It's more or less the official set of first-party optional checks by the PyCQA that don't meet the strict standards of pyflakes but are still very likely to be mistakes of some sort; I'm not sure I've ever seen a false positive with it. It seems a lot more worth adding than flake8-2020, which is a single check for a single specific issue that is not used here and very unlikely ever to be, at this point.
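For illustration, the swap would just change the flake8 hook's extra dependency in .pre-commit-config.yaml, something like this (repo URL and rev shown are placeholders; keep whatever is already pinned):

- repo: https://github.com/PyCQA/flake8
  rev: 5.0.4  # whatever rev is currently pinned
  hooks:
    - id: flake8
      additional_dependencies: [flake8-bugbear]  # instead of flake8-2020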

- repo: https://github.com/pre-commit/pygrep-hooks
  rev: v1.9.0
  hooks:
    - id: python-check-blanket-noqa

This repo has a bunch of simple grep checks. This hook makes sure any # noqa annotations include a code like # noqa: F401:

https://github.com/pre-commit/pygrep-hooks

- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.3.0
  hooks:
    - id: check-json

This repo has lots of different checkers. This one checks the syntax of JSON files (which will be added in the release chart PR).

  rev: v4.3.0
  hooks:
    - id: check-json
    - id: check-merge-conflict

Checks for accidentally committed merge conflict markers:

<<<<<<< HEAD
=======
>>>>>>> new_branch_to_merge_later

  hooks:
    - id: check-json
    - id: check-merge-conflict
    - id: check-yaml

Checks syntax of YAML files. Can help find mistakes before the CI fails to run.

Comment on lines +36 to +38
    - id: end-of-file-fixer
    - id: trailing-whitespace

Some whitespace fixers.

Comment on lines +39 to +41
ci:
  autoupdate_schedule: quarterly

Defines a schedule for https://pre-commit.ci/ to send autoupdate PRs. I find the weekly default too noisy; monthly is also an option, and the pins can still be updated manually at any time with the tool, as needed.

@ezio-melotti (Member):

Thanks for the detailed and very informative reply!
I now have a better understanding of this PR, and I'm mostly fine with it.

The only issue I have is that I'm not a big fan of black, which I assume is responsible for the changes to the two .py files. Given that we don't have much Python code in this repo and since the code looked fine (or even better IMHO) before the changes, perhaps we can skip that? blue might be an alternative if we really want something similar, but I'm not sure what output it will generate.

@terryjreedy (Member):

Personally, I don't like black's .py changes. They are also a violation of at least the spirit of PEP8.

@CAM-Gerlach (Member):

To avoid holding up this PR over Black, you could consider dropping it from here and deferring it to a future one. Or, to at least hugely reduce the diff noise to almost nothing (unless I'm missing something), you could disable the quote normalization, which only leaves one single change (using a single-line hanging indent instead of an ugly two-line split in one spot).

@hugovk (Member, Author) commented Nov 22, 2022

For comparison I've updated Black to skip string normalisation. How does that look?

And here's how blue looks:

But happy to drop it if you prefer.

@CAM-Gerlach (Member):

As far as I'm aware, blue @ 88 chars is identical to the current black settings (the two only differ on a few specific points, of course, and the two settings you changed are equally configurable on both black and blue), so I figure if we're going to do one of them, we may as well just use black as it is now—that would be consistent with what we use on the PEPs repo for the infrastructure code.

I don't always 100% agree with every choice Black makes, but the way I see it, as much as I sometimes love to haggle over such minutia (and don't we all?), there are way more important things to worry and debate about than code style, since everyone has different preferences when it comes to that. The advantage of Black is you can just write your code however you want, and it will automatically take care of conforming it to a standard style while avoiding bikeshedding over what color to paint it. I've learned the long-term value of this the hard way (and continue to)...

@ezio-melotti (Member):

Of the three I like the third the most. Dropping both (black and blue) and leaving the code as-is is OK too.

Click for bikeshedding aside
  • I prefer single quotes over double quotes (but I'm -0 on a mass quote rewrite and I don't worry too much about (repo-level) consistency since they both have valid uses). They are easier to type and (IMHO) to read.
  • I prefer using
    res = some_func(that, has, many, arguments,
                    on, just, two, lines)
    instead of
    res = some_func(
       that, has, many, arguments, on, three, or, even,
       four, lines
    )
    The former is more compact and easier to read (it's clearer that they are all arguments and not separate expressions in an indented block).
  • I personally use 80 chars, but I can live with 88.

@hugovk (Member, Author) commented Nov 23, 2022

As far as I'm aware, blue @ 88 chars is identical to the current black settings (the two only differ on a few specific points, of course, and the two settings you changed are equally configurable on both black and blue), so I figure if we're going to do one of them, we may as well just use black as it is now—that would be consistent with what we use on the PEPs repo for the infrastructure code.

Ah yes! Pretty much, plus some other minor things we're not worried about, so it's better to track upstream Black.

Shall we go for what's currently in this PR (Black+88+skip quote normalisation) or drop it altogether?

Or if people aren't keen on a formatter, I'm fine with leaving it out.

Click for bikeshedding aside
  • I'm the opposite on quotes, I find double easier to type and read :)

  • Formatting:

instead of

res = some_func(
   that, has, many, arguments, on, three, or, even,
   four, lines
)

Nitpick: Black won't format like that. Here's an example based on real code. If there's room, it keeps them on one line:

def _carry(value1: float, value2: float, ratio: float) -> tuple[float, float]:
    ...

If too long, it does this:

def _carry(
    value1: float, value2: float, ratio: float, unit: Unit, min_unit: Unit
) -> tuple[float, float]:
    ...

And if still too long, it does this:

def _carry(
    value1: float,
    value2: float,
    ratio: float,
    unit: Unit,
    min_unit: Unit,
    suppress: typing.Iterable[Unit],
) -> tuple[float, float]:
    ...

A benefit is that diffs become clearer: they show exactly the one new addition. Readability is important, and a lot of time is spent reading code in diffs, often side-by-side in narrower columns.

There's also some stuff in Black I'm not so keen on, but like CAM said, I find there's a massive benefit in just letting it take care of all that stuff. Before Black there was yapf, which had so many options it could be hard to settle on which to use :)

@CAM-Gerlach (Member):

Shall we go for what's currently in this PR (Black+88+skip quote normalisation) or drop it altogether?

FWIW I prefer the former if others don't have strong objections, for the reasons mentioned (it gives the same results as Blue for this repo, in particular the "third option" above that Ezio and I both said we prefer the most).

Before Black there was yapf, which had so many options it could be hard to settle on which to use :)

That, and yapf used a complex dynamic weighting algorithm that gave quite non-deterministic, variable results depending on the code's context, which resulted in a lot of fluctuations and diff noise. I also seem to recall some other ancillary issues with it. And before yapf, there was autopep8, which wasn't as comprehensive an autoformatter (one where two identical blocks of CST would always be formatted the same) and just strictly enforced PEP 8's guidelines and no more. Surely there were other formatters before that too (yapf stood for "Yet Another Python Formatter", after all), but it seems their names have been mostly lost to time.

@ezio-melotti (Member):

I recently contributed to another repo that uses pre-commit and ran into two problems while trying to fix some CI failures:

  1. It is not clear what command/arguments were used. I got errors from both black and pylint [1], and I wanted to run those two checks locally, but the CI output didn't include the exact commands.
  2. After installing pylint locally and getting different errors, I noticed that in the .pre-commit-config.yaml an old version of pylint was specified. This raises the question: will we need to manually (or perhaps automatically, e.g. through @dependabot) update all these versions?

Footnotes

  1. Anecdotally, one of the corrections made by black was wrong (https://github.com/psf/black/issues/1327 -- the others were just quote swaps), and the errors reported by pylint were false positives, resulting in wasted time and some extra cruft added to the code solely to make the tools happy.

@hugovk (Member, Author) commented Dec 9, 2022

I recently contributed to another repo that uses pre-commit and ran into two problems while trying to fix some CI failures:

  1. It is not clear what command/arguments were used. I got errors from both black and pylint [1], and I wanted to run those two checks locally, but the CI output didn't include the exact commands.

Did you run the tools directly? e.g. black . and pylint?

With pre-commit, the magic command is pre-commit run --all-files. This installs the pinned versions and runs with the pinned arguments. No "works on my machine/CI" - we all use the same tools, versions and arguments.

  1. After installing pylint locally and getting different errors, I noticed that in the .pre-commit-config.yaml an old version of pylint was specified. This raises the question: will we need to manually (or perhaps automatically, e.g. through @dependabot) update all these versions?

Yes.

Pinning means we don't run into sudden failures, for example, when a linter adds a new check, and means we can deal with those separately instead of in "normal" PRs.

Yes, we need to update those from time to time. Either manually (pre-commit autoupdate) or automatically. Dependabot doesn't update .pre-commit-config.yaml, but https://pre-commit.ci/ does.

A second benefit of https://pre-commit.ci: if I submit a PR and don't apply Black formatting, the CI will do it and update the PR automatically. It saves a lot of manual work for reviewer and contributor: you don't need to tell me what to do (manual formatting or how to install and run extra tools), and I don't need to do it.


Footnotes

  1. Anecdotally, one of the corrections made by black was wrong (psf/black/issues/1327 -- the others were just quote swaps), and the errors reported by pylint were false positives, resulting in wasted time and some extra cruft added to the code solely to make the tools happy.

Yeah, pylint is pretty noisy; I very occasionally run it locally and don't use it in any CI.

@ezio-melotti (Member):

Did you run the tools directly? e.g. black . and pylint?

Yes. The repo had a bunch of other checks, and I thought it would be easier to just install those two.

A second benefit of https://pre-commit.ci/: if I submit a PR and don't apply Black formatting, the CI will do it and update the PR automatically.

In my specific case, black would have "broken" the code, requiring an additional commit to fix it again. Maybe overall it saves time, even taking into account these issues.

@CAM-Gerlach (Member):

Yes. The repo had a bunch of other checks, and I thought it would be easier to just install those two.

If you use pre-commit run --all check-name, it will only automatically install the check(s) you specify, and will cache them between runs and between on-demand pre-commit run and automatic pre-commit hooks, if you've enabled them with pre-commit install.

Yes, we need to update those from time to time. Either manually (pre-commit autoupdate) or automatically. Dependabot doesn't update .pre-commit-config.yaml, but pre-commit.ci does.

It's also very straightforward to have a monthly, quarterly, etc. GitHub Action that just runs pre-commit autoupdate and opens a PR with the changes, without any need to rely on a third-party cloud service and its limitations. See example here.
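The linked example isn't reproduced here, but a rough sketch of such a scheduled workflow might look like this (the cron schedule, action choices, and versions are all assumptions for illustration, not the referenced example):

name: Autoupdate pre-commit pins

on:
  schedule:
    - cron: "0 0 1 */3 *"  # roughly quarterly; adjust to taste
  workflow_dispatch:

jobs:
  autoupdate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      - run: python -m pip install pre-commit
      - run: pre-commit autoupdate
      # Open a PR with the updated pins (one possible third-party action for this)
      - uses: peter-evans/create-pull-request@v4
        with:
          branch: pre-commit-autoupdate
          title: "Update pre-commit hooks"
          commit-message: "Update pre-commit hooks"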

I typically use such automation on repos with a handful of relatively straightforward checks. With more tools in play, I find that tool updates often require manual fixes, introduce new checks/settings, or have other non-trivial changes, so on most of my repos I run pre-commit autoupdate myself locally each quarter, check the changelogs of the updated checks, and make any other fixes, changes or adjustments to the checks.

A second benefit of pre-commit.ci: if I submit a PR and don't apply Black formatting, the CI will do it and update the PR automatically. It saves a lot of manual work for reviewer and contributor: you don't need to tell me what to do (manual formatting or how to install and run extra tools), and I don't need to do it.

That can also be done via a GitHub Action without the need for a third-party service and the significant limitations of pre-commit.ci; see this example. There's also the brand new pre-commit.ci lite, which allows us to get the same auto-fix features as pre-commit.ci by just adding one extra step to the GitHub Actions workflow added here. Contrary to the name, it is this "lite" version that actually offers the full power of pre-commit, as pre-commit.ci can only execute a limited set of pre-installed checks. It does require enabling the pre-commit.ci lite GitHub application for some reason, which we may want to avoid by just implementing it ourselves like in the example above.
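If we went the pre-commit.ci lite route, the extra step would be roughly this, appended to the job in lint.yml (the version tag is an assumption; check the lite-action docs for the current one):

      # After the pre-commit run, push any auto-fixes back to the PR branch
      - uses: pre-commit-ci/lite-action@v1.0.1
        if: always()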

Yeah, pylint is pretty noisy; I very occasionally run it locally and don't use it in any CI.

I do run Pylint in CI via pre-commit and find it useful, but it does take some customization to turn down or disable some of the checks, especially the refactoring ones. Others often find a more shotgun approach useful: disabling all but the warnings and errors, as well as some categories of those with a higher false-positive ratio.

In my specific case, black would have "broken" the code, requiring an additional commit to fix it again. Maybe overall it saves time, even taking into account these issues.

Personally, I always just pre-commit install locally, so I never commit anything that fails the checks, but it is understandable especially in repos like this that others don't. In this particular case, we could enable --experimental-string-processing, or upgrade to a version where it is the default (if it has been released yet), which would have avoided this.

@pradyunsg (Member) commented Jan 31, 2023

Given that this repository does not have a large corpus of code, we've configured black to avoid rewriting quotes to reduce churn, and ~all of the code already formats as black would, does any of us think that adopting black here is a blocking concern for this PR?

If no one thinks it's a blocking concern, then let's move forward on this. Otherwise, let's remove black from the list of linters, merge this PR, and argue about black separately.

Either way, we should do an update of the pins before landing this. :)

@CAM-Gerlach (Member):

FWIW, my take is that, given that with quote normalization disabled there's only a single existing line changed by Black, we should just rebase on main, ensure the checks continue to pass, and then merge the PR. If it becomes a problem, we can always disable it later.

@CAM-Gerlach (Member) left a comment

LGTM, FWIW

@CAM-Gerlach (Member) left a comment

One other thing here: If we're going to add pre-commit, we should move the Sphinx-Lint check under pre-commit as opposed to being hardcoded into the build workflow.
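sphinx-lint ships its own pre-commit hook, so the move would roughly amount to adding an entry like this to .pre-commit-config.yaml (the rev shown is an assumption; pin whatever release matches what the build workflow currently uses):

- repo: https://github.com/sphinx-contrib/sphinx-lint
  rev: v0.6.7  # example pin, use the current release
  hooks:
    - id: sphinx-lint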

@hugovk (Member, Author) commented Feb 6, 2023

Moved!

@hugovk merged commit 1cb3c18 into python:main on Jul 31, 2023
2 checks passed
@hugovk deleted the linting branch on July 31, 2023 08:06