-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Run mypy checks for full packages in CI #36638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Example failure in main https://github.com/apache/airflow/actions/runs/7433525283/job/20226858854 |
|
I am about to add a permanent check to run such complete mypy check in canary build soon :) - but merging it now should fix the problem in main. |
7b42a80 to
87fd291
Compare
|
Added a separate mypy check that runs a complete mypy check - this way we can prevent similar problems in the future. |
87fd291 to
45a3c4a
Compare
|
I do not really get the understanding for "mypy all" vs. "mypy everything". What is going to be checked pre-commit and what on PR and what on main. If the partly/differential mypy is "flaky"/not fully deterministic, should we not take the safe route and always run mypy over all files? As long as static checks are not the longest running actions and most of the local development can use a mypy cache, what would be the drawback of testing "more" to be on the safe side? |
The difference is how pre-commit works and automatically splits list of files to check between multiple processes. When you run In CI - in every PR we run pre-commit with And due to POSIX specifications and limitations, it cannot run as a single command - because we have too many files in Airflow. The list contains currently 2700 files. There are certain limitations of POSIX command line and if you have 2700 arguments passed to a command, each of them being a path, it's not possible to run it as a single command. So what happens - pre-commit will automatically split it into several commands. Effectively what happens is that the regular mypy-pre-commits when And they way how The new check does something different. It runs ONE check with |
That would make every single commit to run 5-10 minutes to check all the files after you install pre-commit in your local filesystem/git - even with |
Yep. This is what I just recalled to add :) |
|
BTW. We also might want to add some extra instructions on how to run the tests if to locally reproduce it. |
f2c7bec to
02bcb64
Compare
|
Added the instructions. And also added one more thing that was wrong - when we update constraints we need to also make a dependency on the |
02bcb64 to
d703e2d
Compare
|
And added |
d703e2d to
9f50008
Compare
|
Final touches: Added |
MyPy as used in our static checks has slightly different heuristics when running on on individual files and whole packages. This sometimes causes semi-random failures when different set of files is produced when pre-commits split the files between parallel processes. The regular `mypy-*` pre-commits work by passing filenames to mypy checks, and when `--all-files` flag is passed to mypy, this means that 2700 files are passed. In this case pre-commit will split such long list of files to several sequential muypy executions. This is not very good because depending on the list of files passed, mypy can split the list diferently and results will be different when just list of files changes - so mypy might start detecting problems that were not present before. This PR introduces new `mypy` check that runs mypy for packages rather than individual files. We cannot run them for local pre-commit runs, because in many cases, such package based mypy check will run for minutes when a single file changes, due to cache invalidation rules - and we do not want to penalise commits that are changing common airflow code (because such PRs would invalidate a lot of mypy cache every time such common file changes). So we still want to run file-based mypy for local commits. But we do not want to pass 2700 files in CI, rather than that on CI we want to run mypy checks "per package". This PR introduces a new "manual" stage mypy pre-commit check that will run "package" based mypy checks and adds selective check rules that will decide properly when to run such tests and separate, matrix-based CI job that will run such mypy checks - separately for each of the packages: "airflow", "providers", "docs", "dev". Also this job will skip providers checks in non-main branch and will run all tests when "full tests needed" are requested. This PR ignores some errors resulted from 3rd-party libraries used that are randomply appearing when some files are modified (and fixes the current main failures)
9f50008 to
60679f6
Compare
|
A little too much verbosity :) |
MyPy as used in our static checks has slightly different heuristics when running on on individual files and whole packages. This sometimes causes semi-random failures when different set of files is produced when pre-commits split the files between parallel processes. The regular `mypy-*` pre-commits work by passing filenames to mypy checks, and when `--all-files` flag is passed to mypy, this means that 2700 files are passed. In this case pre-commit will split such long list of files to several sequential muypy executions. This is not very good because depending on the list of files passed, mypy can split the list diferently and results will be different when just list of files changes - so mypy might start detecting problems that were not present before. This PR introduces new `mypy` check that runs mypy for packages rather than individual files. We cannot run them for local pre-commit runs, because in many cases, such package based mypy check will run for minutes when a single file changes, due to cache invalidation rules - and we do not want to penalise commits that are changing common airflow code (because such PRs would invalidate a lot of mypy cache every time such common file changes). So we still want to run file-based mypy for local commits. But we do not want to pass 2700 files in CI, rather than that on CI we want to run mypy checks "per package". This PR introduces a new "manual" stage mypy pre-commit check that will run "package" based mypy checks and adds selective check rules that will decide properly when to run such tests and separate, matrix-based CI job that will run such mypy checks - separately for each of the packages: "airflow", "providers", "docs", "dev". Also this job will skip providers checks in non-main branch and will run all tests when "full tests needed" are requested. This PR ignores some errors resulted from 3rd-party libraries used that are randomply appearing when some files are modified (and fixes the current main failures) (cherry picked from commit f7b663d)
The apache#36638 change introduced "full package" checks - where in case of CI we run mypy checks separately from regular static checks, for the whole folders. However it's been a little convoluted on how the checks were run, with a separate env variable. Instead we can actually have multiple mypy-* checks (same as we have for local pre-commit runs) as mypy allows to have multiple checks with the same name in various stages. This change simplifies the setup a bit: * we name the checks "folder" checks because this is what they are * we name the check names consistent ("airflow", "providers", "docs", "dev") with mypy-folders output * we have separate small script to run the folder checks * we map "providers" into "airflow/providers" in the pre-commit
The #36638 change introduced "full package" checks - where in case of CI we run mypy checks separately from regular static checks, for the whole folders. However it's been a little convoluted on how the checks were run, with a separate env variable. Instead we can actually have multiple mypy-* checks (same as we have for local pre-commit runs) as mypy allows to have multiple checks with the same name in various stages. This change simplifies the setup a bit: * we name the checks "folder" checks because this is what they are * we name the check names consistent ("airflow", "providers", "docs", "dev") with mypy-folders output * we have separate small script to run the folder checks * we map "providers" into "airflow/providers" in the pre-commit
The #36638 change introduced "full package" checks - where in case of CI we run mypy checks separately from regular static checks, for the whole folders. However it's been a little convoluted on how the checks were run, with a separate env variable. Instead we can actually have multiple mypy-* checks (same as we have for local pre-commit runs) as mypy allows to have multiple checks with the same name in various stages. This change simplifies the setup a bit: * we name the checks "folder" checks because this is what they are * we name the check names consistent ("airflow", "providers", "docs", "dev") with mypy-folders output * we have separate small script to run the folder checks * we map "providers" into "airflow/providers" in the pre-commit (cherry picked from commit a912948)
MyPy as used in our static checks has slightly different heuristics when running on on individual files and whole packages. This sometimes causes semi-random failures when different set of files is produced when pre-commits split the files between parallel processes. The regular `mypy-*` pre-commits work by passing filenames to mypy checks, and when `--all-files` flag is passed to mypy, this means that 2700 files are passed. In this case pre-commit will split such long list of files to several sequential muypy executions. This is not very good because depending on the list of files passed, mypy can split the list diferently and results will be different when just list of files changes - so mypy might start detecting problems that were not present before. This PR introduces new `mypy` check that runs mypy for packages rather than individual files. We cannot run them for local pre-commit runs, because in many cases, such package based mypy check will run for minutes when a single file changes, due to cache invalidation rules - and we do not want to penalise commits that are changing common airflow code (because such PRs would invalidate a lot of mypy cache every time such common file changes). So we still want to run file-based mypy for local commits. But we do not want to pass 2700 files in CI, rather than that on CI we want to run mypy checks "per package". This PR introduces a new "manual" stage mypy pre-commit check that will run "package" based mypy checks and adds selective check rules that will decide properly when to run such tests and separate, matrix-based CI job that will run such mypy checks - separately for each of the packages: "airflow", "providers", "docs", "dev". Also this job will skip providers checks in non-main branch and will run all tests when "full tests needed" are requested. This PR ignores some errors resulted from 3rd-party libraries used that are randomply appearing when some files are modified (and fixes the current main failures) (cherry picked from commit f7b663d)
The #36638 change introduced "full package" checks - where in case of CI we run mypy checks separately from regular static checks, for the whole folders. However it's been a little convoluted on how the checks were run, with a separate env variable. Instead we can actually have multiple mypy-* checks (same as we have for local pre-commit runs) as mypy allows to have multiple checks with the same name in various stages. This change simplifies the setup a bit: * we name the checks "folder" checks because this is what they are * we name the check names consistent ("airflow", "providers", "docs", "dev") with mypy-folders output * we have separate small script to run the folder checks * we map "providers" into "airflow/providers" in the pre-commit (cherry picked from commit a912948)

MyPy as used in our static checks has slightly different heuristics
when running on on individual files and whole packages. This sometimes
causes semi-random failures when different set of files is produced
when pre-commits split the files between parallel processes.
The regular
mypy-*pre-commits work by passing filenames to mypychecks, and when
--all-filesflag is passed to mypy, this meansthat 2700 files are passed. In this case pre-commit will split such
long list of files to several sequential muypy executions. This
is not very good because depending on the list of files passed,
mypy can split the list diferently and results will be different
when just list of files changes - so mypy might start detecting
problems that were not present before.
This PR introduces new
mypycheck that runs mypy for packagesrather than individual files. We cannot run them for local
pre-commit runs, because in many cases, such package based
mypy check will run for minutes when a single file changes,
due to cache invalidation rules - and we do not want to penalise
commits that are changing common airflow code (because such PRs
would invalidate a lot of mypy cache every time such common file
changes). So we still want to run file-based mypy for local
commits. But we do not want to pass 2700 files in CI, rather than
that on CI we want to run mypy checks "per package".
This PR introduces a new "manual" stage mypy pre-commit check that
will run "package" based mypy checks and adds selective check rules
that will decide properly when to run such tests and separate,
matrix-based CI job that will run such mypy checks - separately
for each of the packages: "airflow", "providers", "docs", "dev".
Also this job will skip providers checks in non-main branch and
will run all tests when "full tests needed" are requested.
This PR ignores some errors resulted from 3rd-party libraries used
that are randomply appearing when some files are modified (and fixes
the current main failures)
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.