
@potiuk
Member

@potiuk potiuk commented Feb 17, 2025

The check-default-configuration pre-commit added in #46622 is slow, and it does not need to run every time, only when the configuration definition changes. It also does not take any of the files changed in the PR as input, so it should be configured not to receive file names and not to run parallel instances: when run with --all-files, pre-commit starts as many parallel copies of the check as you have processors, and they all essentially run the same check.

This pre-commit also requires the breeze image to be present, so it should be added at the end of the pre-commit configuration file, so that it is not run when the breeze image is not built.

All these behaviours have been fixed in this PR. After this change:

  • only one copy of the check is run when this pre-commit runs
  • locally, the pre-commit only runs if config.yml changes
  • in canary runs it always runs (with --all-files)
  • it is marked as a "breeze image" check by placing it at the end of the pre-commit configuration file
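
To make the "many parallel copies" problem concrete, here is a rough Python model of how many hook processes get started. It is a simplified sketch, not pre-commit's actual code: the function name `planned_invocations` and the even-split assumption are made up for illustration, and real pre-commit also takes command-line length limits into account.

```python
import math
import os


def planned_invocations(matched_files, pass_filenames=True, require_serial=False, cpu_count=None):
    """Rough model of how many hook processes are started (illustrative only)."""
    if not matched_files:
        return 0  # no file matches the hook, so the hook is skipped
    if not pass_filenames:
        return 1  # the hook runs once and receives no file arguments
    if require_serial:
        return 1  # simplification: ignores command-line length limits
    cpus = cpu_count or os.cpu_count() or 1
    # Assumption: files are spread evenly over the available processors.
    chunk_size = math.ceil(len(matched_files) / cpus)
    return math.ceil(len(matched_files) / chunk_size)


all_files = [f"file_{i}.py" for i in range(1000)]
print(planned_invocations(all_files, cpu_count=8))
print(planned_invocations(all_files, pass_filenames=False, require_serial=True, cpu_count=8))
```

Under these assumptions, an `--all-files` run on an 8-processor machine starts 8 copies of a check that ignores its file arguments anyway, while the reconfigured hook starts one.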


@potiuk potiuk requested review from Lee-W, ashb and jedcunningham and removed request for Lee-W February 17, 2025 12:12
@potiuk
Member Author

potiuk commented Feb 17, 2025

cc @jason810496 also @Lee-W @jedcunningham -> we should be careful when adding new pre-commits. This small addition to pre-commit-config had almost all the performance problems I could think of with pre-commits :D

@potiuk potiuk merged commit 0cd8547 into apache:main Feb 17, 2025
18 checks passed
@potiuk potiuk deleted the speed-up-check-default-configuration-pre-commit branch February 17, 2025 12:18
@potiuk
Member Author

potiuk commented Feb 17, 2025

cc: @jason810496 --> You likely did not know that :) but:

  • pre-commit by default runs checks in parallel when you run it on a group of files - it will split them into chunks and run as many processes as you have processors (this can be disabled with require_serial)
  • also by default pre-commit will pass the list of files to the hook - those files that were locally changed in the PR and match the "files" specification if it is there. We really don't want to run this slow check on every local change - so setting files: ^airflow/config_templates/config.yml will only run the check locally when config.yml changes
  • in all PRs we run --all-files in CI just to make sure we have no false positives introduced by some of the non-modified files
  • in some PRs that are very small (for example a README update) we do not build the CI image at all, because we do not need it there. In those PRs we automatically skip all pre-commits that are placed at the end of the .pre-commit-config.yaml file, after a certain comment (we parse the file and extract the list of "CI image"-bound pre-commits). So if we want to add a pre-commit that uses the CI image, it should be added at the end
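
The "files" filter and the "hooks after a certain comment" convention can be sketched in Python as below. This is only an illustration under stated assumptions: the marker text, the `fast-check` hook, the `entry` paths and both helper functions are hypothetical, and the real Airflow tooling may parse the file differently. The `files` pattern is treated as a regular expression searched against each path.

```python
import re

# Hypothetical config fragment; only the hook id and the files pattern come from this PR.
SAMPLE_CONFIG = """\
repos:
  - repo: local
    hooks:
      - id: fast-check
        entry: ./scripts/fast_check.sh
      # HYPOTHETICAL MARKER: hooks below need the CI image
      - id: check-default-configuration
        entry: ./scripts/check_default_configuration.sh
        files: ^airflow/config_templates/config.yml
        pass_filenames: false
        require_serial: true
"""

MARKER = "HYPOTHETICAL MARKER"  # stands in for the real comment, which is not quoted here


def hooks_after_marker(config_text, marker=MARKER):
    """Collect the ids of hooks defined after the marker comment."""
    seen_marker = False
    hook_ids = []
    for line in config_text.splitlines():
        if marker in line:
            seen_marker = True
        match = re.match(r"\s*- id:\s*(\S+)", line)
        if seen_marker and match:
            hook_ids.append(match.group(1))
    return hook_ids


def hook_is_triggered(files_pattern, changed_files):
    """True if at least one changed path matches the hook's files pattern."""
    return any(re.search(files_pattern, name) for name in changed_files)


print(hooks_after_marker(SAMPLE_CONFIG))
print(hook_is_triggered("^airflow/config_templates/config.yml", ["README.md"]))
```

With `--all-files`, every file in the repository is a candidate, so config.yml is always among them and the check always runs; locally it is only triggered when config.yml itself is in the changed set.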

Pre-commit relevant docs: https://pre-commit.com/#creating-new-hooks
Our contributing guide docs explaining all that: https://github.com/apache/airflow/blob/main/contributing-docs/08_static_code_checks.rst

@jason810496
Member

Thanks, @potiuk, for the explanation! Indeed, I'm not very familiar with pre-commit and overlooked these details. I'll take a closer look at the pre-commit documentation and the current configuration file.

@potiuk
Member Author

potiuk commented Feb 17, 2025

Yeah. We have a VERY complex and VERY comprehensive test harness, and it has a few "you need to read a few pages a few times and get burned once or twice to start to understand how things work" parts.

@Lee-W
Member

Lee-W commented Feb 17, 2025

@potiuk Thanks for reminding us! Yep, I should have thought it through. 🤦‍♂️

dantonbertuol pushed a commit to dantonbertuol/airflow that referenced this pull request Feb 17, 2025
@potiuk
Member Author

potiuk commented Feb 17, 2025

> @potiuk Thanks for reminding us! Yep, I should have thought it through. 🤦‍♂️

I realise that only a handful of people including me live and breathe the CI and dev env of Airflow, so I will keep on reminding and explaining - no worries :)

ntr pushed a commit to ntr/airflow that referenced this pull request Feb 20, 2025