Sync v2-8-stable with v2-8-test to release 2.8.1 #36788
Merged
Conversation
Now that Airflow 2.8.0 is released, we can remove common.io from chicken-egg providers. (cherry picked from commit 34d5001)
This was made available [as part of v0.1.8 of the Ruff Formatter](https://astral.sh/blog/ruff-v0.1.8#formatting-code-snippets-in-docstrings). Adding this config option to the `ruff-format` pre-commit hook. (cherry picked from commit e9ba37b)
Once Airflow is released to PyPI, we should remove chicken-egg providers for that release and cherry-pick the change to v2-*-test in order to prepare container images, in case the image contains the providers as default extras. (cherry picked from commit 663dfd0)
…36283) When generated dependencies are not properly updated, we had a special step where the dependencies were regenerated "just in case" before the CI image was built, because otherwise building the CI image could fail with a strange "conflicting dependencies" error without a clue about the root cause. However, the pre-commit did not return an error exit code - for pre-commit, it is enough that a file is modified during the run to fail the hook in general. That had a nasty side effect: the built CI image already contained properly generated dependencies (produced by this step), so we did not properly detect cases where the committed files had been edited manually rather than generated with pre-commit. This PR fixes it - instead of generating the files and building the image anyway, CI will now fail the image-building step with clear instructions on what to do. The CI job step now uses a regular breeze command rather than running the script manually, and the script returns an error code in case the generated dependencies have been updated (a sketch of this behaviour follows below). (cherry picked from commit 33a2fbe)
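A hypothetical sketch (script path and file names are assumptions) of the exit-code behaviour described above: regenerate, then fail loudly if the committed files were stale.

```python
import subprocess
import sys

def main() -> int:
    # Regenerate the dependency files (placeholder for the real generator).
    subprocess.run(
        [sys.executable, "scripts/ci/pre_commit/update_providers_dependencies.py"],
        check=True,
    )
    # If git reports modifications, the committed files were out of date.
    diff = subprocess.run(
        ["git", "diff", "--exit-code", "--", "generated/provider_dependencies.json"]
    )
    if diff.returncode != 0:
        print("Generated dependencies were stale - commit the regenerated files.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```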
When we are installing a released version of Airflow in Breeze, we can pass additional extras to install (for example, we need to pass the celery extra in order to start airflow with the celery executor). The extras can be specified as:

```
breeze start-airflow --use-airflow-version 2.8.0rc4 \
  --executor CeleryExecutor --airflow-extras "celery"
```

However, recent refactors caused a problem where the extras were appended after the version specifier (which is rejected by newer versions of `pip`). This PR fixes it and also moves the place where using CeleryExecutor triggers adding the celery extra when `--use-airflow-version` is used. The warning about this is more visible after moving to Shell Params. (cherry picked from commit 3297806)
When we check whether provider.yaml files change dependencies, we print verbose information about which provider.yaml files changed, but this is not necessary. This change makes the output less detailed - just the number of changed files rather than the full list; the full list is only printed when the `--verbose` flag is used. (cherry picked from commit 7212301)
2.8.0 was later released on 18th (cherry picked from commit c09a64c)
Since we are getting more diagrams generated in Airflow using the "diagram as code" approach, this PR improves the pre-commit to better support generating more of the images coming from different sources, placed in different directories and generated independently, so that the whole process is more distributed and it is easy for whoever creates diagrams to add their own. The changes implemented in this PR:

* The code to generate a diagram now sits next to the diagram it generates. It has the same name as the diagram, but with the .py extension. This way it is immediately visible where the source of each diagram is (right next to the diagram).
* Each of the .py diagram files is runnable on its own. You can easily regenerate a diagram by running the corresponding Python file, or even automate it with a "save" action so the diagram is regenerated every time the file is saved. That makes a very nice workflow for iterating on each diagram independently from the others.
* The pre-commit script is given a set of folders to scan, and it finds and runs the diagrams on pre-commit. It also creates and verifies an md5 hash of the source Python file separately for each diagram, and only runs diagram generation when the source file changed vs. the last time the hash was saved and committed. The hash is stored next to the image and sources with a .md5sum extension, as the sketch below illustrates.

Also updated the documentation in CONTRIBUTING.rst explaining how to generate the diagrams and what the mechanism of that generation is. (cherry picked from commit b35b08e)
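A hypothetical sketch of the per-diagram hash mechanism: the diagram source is re-run only when its stored .md5sum no longer matches. The docs folder and diagram_*.py naming are assumptions for illustration.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

def regenerate_if_changed(source: Path) -> None:
    hash_file = source.with_suffix(".md5sum")
    new_hash = hashlib.md5(source.read_bytes()).hexdigest()
    old_hash = hash_file.read_text().strip() if hash_file.exists() else None
    if new_hash != old_hash:
        # Each diagram file is runnable on its own and writes its image
        # next to itself, so regeneration is just executing the source.
        subprocess.run([sys.executable, str(source)], check=True)
        hash_file.write_text(new_hash + "\n")

for diagram_source in Path("docs").rglob("diagram_*.py"):
    regenerate_if_changed(diagram_source)
```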
Currently docs building happens inside the container image and the code doing it sits in the `docs` folder, while publishing has already been moved to `breeze` code (and is executed in the Breeze venv, not in the container). Both building and publishing code were present in both places (copy&pasted), and the parts not relevant to the other function were unused. While eventually we will move docs building to `breeze` as well, the first step is to remove the redundancy and clean up unused code, so that we can make the transition cleaner. (cherry picked from commit bf90992)
…#36372) When the DB/Non-DB tests were introduced (#35160), new test types were added (separating various Python test types from the generic Operators test type). However, we did not add matching of the python operator and test files to the right selective unit test type. As a result, when only `operators/python.py` and `tests/test_python` were changed, the `Operators` test type was run but the specific Python test types were not. This PR fixes it for the current test types (including the separated Serialization test type) and for the future - instead of matching selected test types, we match all of them except the few that are "special" ("Always", "Core", "Other", "PlainAsserts"). (cherry picked from commit b0db1f9)
When `--use-airflow-version` is a numeric or rc version, the constraints used to install airflow should be specific to that version. For example, when we install 2.7.3rc1, `constraints-2.7.3rc1` should be used. This was lost when fixing versions in CI. This PR introduces these fixes:

* The default value for airflow constraints is DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH.
* When --use-airflow-version is a numeric version and the default value (DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH) is used for constraints, it is replaced with `constraints-VERSION` (see the sketch below).
* When we print out the constraints used, we state which constraints are used for Airflow and which for providers.

(cherry picked from commit 5ddd67a)
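A minimal sketch (names and the default value are assumptions) of the constraints-resolution rule described above: a numeric/rc --use-airflow-version combined with the default constraints reference yields version-specific constraints.

```python
import re

DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH = "constraints-main"  # assumed default

def resolve_constraints_ref(use_airflow_version: str, constraints_ref: str) -> str:
    # Match plain or rc versions, e.g. "2.7.3" or "2.7.3rc1".
    is_released_version = re.fullmatch(r"\d+\.\d+\.\d+(rc\d+)?", use_airflow_version)
    if is_released_version and constraints_ref == DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH:
        return f"constraints-{use_airflow_version}"
    return constraints_ref

# resolve_constraints_ref("2.7.3rc1", "constraints-main") -> "constraints-2.7.3rc1"
```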
…6385) This PR updates the release process for providers to enable releasing providers in more regular batches. Sometimes, when we exclude a provider from a previous vote, we want to release an RCN (2, 3, etc.) candidate. However, especially when the time between the previous RC and the new one is long (for example because fixing took a long time), we might want to release the RCN release for the cancelled providers and RC1 for all the providers that have changed in the meantime. This change makes it possible (and easy):

1) Release RC1 for all providers (the RCN provider is skipped, because the tag for this provider already exists).
2) Release the RCN providers with `--version-suffix-for-pypi rcN`.

The release process and tools were updated to account for that - the rc candidate number is retrieved from the packages prepared in `dist`. Also fixed a few small missing things in the process. (cherry picked from commit 4deed64)
(cherry picked from commit 8e70d56)
This PR adds the possibility of marking a provider as "not-ready" in provider.yaml (by setting the optional "not-ready" field to `true`). Setting a provider as "not-ready" removes it by default from all the release management commands - preparing documentation files, preparing provider packages, publishing docs. You can include such providers via the `--include-not-ready-providers` flag (or by setting the INCLUDE_NOT_READY_PROVIDERS environment variable to true). This flag is set to true in our CI, so that we can make sure in-progress providers are also tested and verified, but when the release manager prepares packages, those providers are not prepared. That helps in the early stage of a provider's lifecycle, when we already want to iterate on and test it continuously but - for example - the API of the provider is not yet stable, or we are in the process of moving functionality for the provider out of core. This PR also marks the `fab` provider as "not-ready", as it is still early days and we want to exclude it for now from any kind of release process. (cherry picked from commit 341d5b7)
This change automatically generates the **right** rcN package when preparing the packages for PyPI. It allows a pretty much continuous release process for voting on provider packages: when an rcN candidate is not released, it is automatically included in the next wave of packages with an rcN+1 version (as sketched below) - unless, during provider package generation, the version is bumped to MAJOR or MINOR due to new changes. This allows for a workflow where in every new wave we always generate all provider packages ready for release. (cherry picked from commit 6f5a50e)
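A hypothetical sketch of deriving the next rc suffix from existing git tags; the providers-&lt;id&gt;/&lt;version&gt; tag naming is an assumption made for illustration only.

```python
import re
import subprocess

def next_rc_suffix(provider_id: str, version: str) -> str:
    tags = subprocess.run(
        ["git", "tag", "--list", f"providers-{provider_id}/{version}rc*"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    rc_numbers = [int(m.group(1)) for t in tags if (m := re.search(r"rc(\d+)$", t))]
    # No candidate yet -> rc1; otherwise bump the highest existing candidate.
    return f"rc{max(rc_numbers) + 1 if rc_numbers else 1}"
```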
Updating the breeze docs with a different pytest example, as the function mentioned in the example was removed from test_core.py. (cherry picked from commit 71c726d)
(cherry picked from commit 8fea49f)
(cherry picked from commit c16b421)
The last auto-upgrade of RC versions, implemented in #36441, had a bug - it bumped the rc even for providers that had already been released. This change fixes it - it skips packages that already have a "final" tag present in the repo. It also explicitly calls out "Apply template update" as an optional step - only needed in case we modify templates and want to update the automatically generated documentation with them. (cherry picked from commit a3e5a97)
(cherry picked from commit e3fb20d)
This PR improves/simplifies the process of issue generation when provider package rc candidates are prepared for voting. It improves the command to generate the issue and makes it simpler (less copy&paste) to create the issue; the issue also no longer uses the "Meta" template and gets the right labels assigned automatically. Recent changes that automatically derive the suffix from the PyPI packages prepared removed the need to pass `--suffix` as a parameter: in all cases the right rc* suffix is automatically added during issue generation based on the version of the package being prepared. The process has been updated and the command simplified by removing the `--suffix` flag. Previously, when the issue was prepared, we displayed it in the terminal and asked the release manager to create it by copy&pasting the content and title into a new issue, which required a few copy&pastes and opening a new issue via the "Meta" task type. This PR simplifies that by not only displaying the content but also generating a URL that can be copy&pasted into the browser URL field, or just Cmd+clicked if your terminal allows it (a sketch of the idea follows below). An issue created this way does not have the "Body" field header and has the labels properly assigned, including a dedicated "testing status" label that is used to gather stats for past "status" issues. The advice for the release manager has been improved (the generated comment was missing the end of a sentence), and it should now be clearer how to iterate during issue generation if you want to remove some PRs from the generated issue content. (cherry picked from commit 5d88f6f)
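A sketch of the clickable-URL idea: GitHub's /issues/new endpoint accepts title, body and labels as query parameters. The title, body and label values below are illustrative assumptions, not the exact ones used by the command.

```python
from urllib.parse import urlencode

def new_issue_url(title: str, body: str, labels: list[str]) -> str:
    query = urlencode({"title": title, "body": body, "labels": ",".join(labels)})
    return f"https://github.com/apache/airflow/issues/new?{query}"

print(new_issue_url(
    title="Status of testing Providers that were prepared recently",
    body="<generated issue content>",
    labels=["testing status"],
))
```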
* Check executable permission for entrypoints at breeze start

Sometimes our contributors check out the Airflow repository on filesystems that are not POSIX compliant and do not support executable bits (for example when you check out the repository in Windows and attempt to map it into a Linux VM). Breeze and building CI images will not work in this case, but the error that you see might be misleading. This PR performs an additional environment check and tells you not to do that if executable bits are missing from the entrypoints (see the sketch below).

* Update dev/breeze/src/airflow_breeze/utils/docker_command_utils.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> (cherry picked from commit 5551e14)
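A minimal sketch, assuming illustrative entrypoint paths, of the executable-bit environment check described above:

```python
import os
import sys
from pathlib import Path

ENTRYPOINTS = [
    Path("scripts/docker/entrypoint_ci.sh"),    # assumed path
    Path("scripts/docker/entrypoint_prod.sh"),  # assumed path
]

not_executable = [p for p in ENTRYPOINTS if p.exists() and not os.access(p, os.X_OK)]
if not_executable:
    print(
        "ERROR: entrypoints lost their executable bits - this usually means the "
        f"repository was checked out on a non-POSIX filesystem: {not_executable}"
    )
    sys.exit(1)
```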
It seems that when the mysql repository is used to install the mysql client, it triggers libxml compilation for Python 3.8 and 3.9, and this library requires the devel version of zlib, which is missing from the image. This PR adds the devel version as a dev apt dependency. (cherry picked from commit 2bc34ff)
(cherry picked from commit 13e4905)
Some recent changes adding the removed and suspended states to breeze caused a significant slow-down of autocompletion - as it turned out, because we loaded and parsed all provider.yaml files during auto-completion in order to determine the list of providers available for some commands. We had already planned to replace the several state fields (suspended, not-ready, removed) with a single state field - by doing that and adding the field to the pre-commit-generated "provider_dependencies.json", we could switch to parsing the single provider_dependencies.json file and retrieve the provider list from there, following the state stored in that json file (see the sketch below). This also simplifies state management per the recently added state diagram, by following the same state lifecycle: "not-ready" -> "ready" -> "suspended" -> "removed"
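A hypothetical sketch of the faster lookup: read the single generated JSON file and filter providers by "state" instead of parsing every provider.yaml. The exact field names are assumptions.

```python
import json
from pathlib import Path

def available_providers(include_not_ready: bool = False) -> list[str]:
    dependencies = json.loads(Path("generated/provider_dependencies.json").read_text())
    allowed_states = {"ready"} | ({"not-ready"} if include_not_ready else set())
    return sorted(
        provider_id
        for provider_id, info in dependencies.items()
        if info.get("state") in allowed_states
    )
```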
This is the last one in the long-backtracking series. Telegram 20.2 was released in March 2023, and for all practical purposes using a recent version is a good idea when interacting with such services. Bumping it cuts down the number of backtracking loops `pip` has to do. (cherry picked from commit 1f6d764)
This command installs airflow in the k8s venv, and in case the version of Python is not yet supported by Airflow, it might fail. We do not check the lower bound, because breeze supports the same minimum Python version as Airflow itself. In such a case, the command prints instructions on how to reinstall breeze with a different Python version. (cherry picked from commit 9264a4b)
The #35650 change introduced a hotfix for the Pyarrow CVE-2023-47248. So far we had been blocked from removing it by Apache Beam, which prevented Airflow from bumping pyarrow to a non-vulnerable version. This is now possible since Apache Beam released version 2.53.0 on the 4th of January 2024, which allows using a non-vulnerable pyarrow. We are bumping both the Pyarrow and Beam minimum versions to reflect that, and removing the pyarrow hotfix. (cherry picked from commit d105c71)
(cherry picked from commit c59f8de)
…nd (#36537) This PR changes Airflow installation and build backend to use the new standard Python ways of building Python applications. We had been trying to do it for quite a while. Airflow traditionally used a complex and convoluted build process based on setuptools and an (extremely) custom setup.py file. It survived the migration to Airflow 2.0 and the split of the Airflow monorepo into Airflow and Providers, adding pre-installed providers and switching providers to use flit (and follow build standards). So far, tooling in the Python ecosystem had not been able to fulfill our needs and we resorted to our own tooling, but finally, with the appearance of Hatch (managed by the Python Packaging Authority) and a few recent advancements there, we are able to switch to standard Python ways of managing project dependency configuration and project build setup (with a few customizations). This PR makes the airflow build process follow these standard PEPs:

* Airflow has all build configuration stored in pyproject.toml, following PEP 518, which allows any frontend (`pip`, `poetry`, `hatch`, `flit`, or whatever else is used) to install the required build dependencies, install Airflow locally, and build distribution packages (sdist/wheel).
* The Hatchling backend follows PEP 517 for the standard source tree and build backend implementation, which allows executing the build in a frontend-independent way.
* We store all project metadata in pyproject.toml - following PEP 621, where all necessary project metadata components are defined.
* We plug into Hatchling "editable build" hooks following PEP 660. Hatchling internally builds an editable wheel used as an ephemeral step and communication channel between backend and frontend (this ephemeral wheel is used to make an editable installation of the project - suitable for fast iteration on the code without reinstalling the package).

With Airflow having many provider packages in a single source tree, where we want to be able to install and develop airflow and providers together, it is no small feat to implement the case where an editable installation has to behave quite differently - when it comes to packaging and dependencies - from the installable package (where you want separate Airflow and provider packages). Fortunately, the standardisation efforts in the Python Packaging community and the tooling implementing them have finally made it possible. Some of the important ways this has been achieved:

* We continue using provider.yaml in providers as the single source of truth for per-provider dependencies. We added the possibility of specifying "devel-dependencies" in provider.yaml, so that all per-provider dependencies in `generated/provider_dependencies.json` and `pyproject.toml` are generated from them via the update-providers-dependencies pre-commit.
* pyproject.toml is generally managed manually, but the part where provider dependencies and bundle dependencies are used is automatically updated by a pre-commit whenever provider dependencies change. Those generated provider dependencies contain just the dependencies of the providers - not the provider packages - but in the final "standard" wheel file they are replaced with "apache-airflow-providers-PROVIDER" dependencies, so that the wheel package will only install the provider and use the dependencies of the provider version it installs.
* We utilise custom hatchling build hooks (per the PEP 660 standard) that modify the 'standard' wheel package on the fly while it is being prepared: adding pre-installed package dependencies (which are not needed in an editable build) and removing all devel extras (which are not needed in the PyPI-distributed wheel). This solves the conundrum of having different "editable" and "standard" behaviour while keeping a single project specification in pyproject.toml (a sketch of such a hook follows below).
* We added a description of how `Hatch` can be employed as a build frontend to manage a local virtualenv and install Airflow in editable mode easily - while keeping all properties of the installed application (including a working airflow cli and package metadata discovery) - as well as how to use PEP-standard ways of building wheel and sdist packages.
* We have a custom step (following PEP standards) to inject airflow-specific build steps - compiling www assets and generating the git commit hash version to display in the UI.
* We also show how all of this makes it easy to manage local virtualenvs and editable installations for Airflow contributors - without vendor lock-in of the build tools: by following standard PEPs, Airflow can be locally and editably installed by anyone using any build frontend following the standards - whether you use `pip`, `poetry`, `Hatch`, `flit` or any other frontend tool, local installation and package building work the same way, with both "editable" and "standard" package preparation managed by the `hatchling` backend.
* Previously our extras contained a ".", which is not a normalized name for extras - `pip` and other tools replaced it automatically with `_`. This change updates the extra names to contain '-' rather than '.', following PEP 685. This should be fully backwards compatible: users will still be able to use ".", but it will be normalized to "-" in Airflow packages. It is also future proof, as all package managers and tools are expected to eventually apply PEP 685 to extras, even if currently some of them (pip + setuptools) might generate warnings.
* Additionally, this change organizes the documentation around extras and dependencies, explaining the reasoning behind all the different extras we have.
* As a bonus (and this is what we used to test it all), we document how to use the Hatch frontend to manage multiple Python installations, manage multiple Python virtualenv environments, and build Airflow packages for release management.

(cherry picked from commit c439ab8)
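A sketch, under stated assumptions, of a hatchling build hook distinguishing "standard" from "editable" (PEP 660) wheel builds; this is not Airflow's actual hook, and the injected dependency is illustrative only.

```python
from hatchling.builders.hooks.plugin.interface import BuildHookInterface

class CustomBuildHook(BuildHookInterface):
    def initialize(self, version: str, build_data: dict) -> None:
        # hatchling calls this with version == "standard" for normal wheels
        # and version == "editable" for PEP 660 editable installs.
        if self.target_name == "wheel" and version == "standard":
            # Inject pre-installed provider distributions only into the
            # distributable wheel, not into editable installs.
            build_data.setdefault("dependencies", []).append(
                "apache-airflow-providers-common-sql"  # illustrative
            )
```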
…36726) Hatch has built-in support for reproducible builds; however, it uses a hard-coded 2020 date to generate the reproducible binaries, which produces whl and tar.gz files whose contents carry file dates that are pretty old. This might be confusing for anyone looking at the file contents and timestamps inside. This PR adds support (similar to the provider approach) for storing the current reproducible date in the repository, so that it can be committed and tagged together with the Airflow sources (a sketch below shows the idea). It is updated fully automatically by pre-commit whenever the release notes change, which basically means that whenever the release notes are updated just before a release, the reproducible date is updated to the current date. For now we only check that the packages produced by the hatchling build are reproducible. (cherry picked from commit a2d6c38)
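A minimal sketch, assuming a simple "source-date-epoch: &lt;seconds&gt;" entry in an assumed file location, of wiring a stored reproducible date into a build: SOURCE_DATE_EPOCH is the reproducible-builds convention that hatchling honors for file timestamps.

```python
import os
import subprocess
from pathlib import Path

content = Path("airflow/reproducible_build.yaml").read_text()  # assumed location
epoch = next(
    line.split(":", 1)[1].strip()
    for line in content.splitlines()
    if line.startswith("source-date-epoch")
)

# Export the stored date so the build backend stamps files with it.
env = {**os.environ, "SOURCE_DATE_EPOCH": epoch}
subprocess.run(["hatch", "build"], env=env, check=True)
```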
* Add support for Pendulum 3
* Add backcompat to pendulum 2
* Update airflow/serialization/serialized_objects.py
* Add newsfragments

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> (cherry picked from commit 2ffa6e4)
The source tarball is the main artifact produced by the release process - the one that is the "official" release and named as such by the Apache Software Foundation. This PR makes source tarball generation reproducible - following the reproducibility of the `.whl` and sdist packages. This change:

* vendors in a reproducible.py script that repacks the .tar.gz package in a reproducible way, using source-date-epoch for timestamps (a sketch of the idea follows below)
* adds the breeze release-management prepare-airflow-tarball command
* adds verification of the tarballs to the PMC verification process
* adds --use-local-hatch to the package building command to allow faster, non-docker builds of packages for PMC verification
* improves diagnostic output of the release and build commands

(cherry picked from commit 72a571d)
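A hypothetical sketch of reproducible tar.gz repacking (not the vendored script itself): fixed mtime taken from source-date-epoch, sorted member order, normalized ownership, and a zeroed gzip header timestamp, so identical sources yield byte-identical archives.

```python
import gzip
import tarfile
from pathlib import Path

def repack_reproducibly(src_dir: Path, out_path: Path, source_date_epoch: int) -> None:
    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = source_date_epoch
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    # mtime=0 keeps the gzip header timestamp constant across runs.
    with gzip.GzipFile(str(out_path), mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for path in sorted(src_dir.rglob("*")):
                tar.add(
                    path,
                    arcname=str(path.relative_to(src_dir.parent)),
                    filter=normalize,
                    recursive=False,
                )
```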
The index is only helpful for a user's custom query - not for airflow in general (see comment #30762 (comment)). Noticed that this index had zero scans over a period of months. I also observed that it takes up as much space as the table itself. Since it's not generally useful, it doesn't belong in airflow OSS. Reverts #30762 (cherry picked from commit e20b400)
(cherry picked from commit 70cefeb)
When we build the cache from scratch, cache preparation can take longer than 50 minutes (right now it's about an hour). Increasing the timeout to 120 minutes should solve the problem where PROD cache building gets cancelled in such a case and you need to re-run it to succeed. (cherry picked from commit c7ade01)
The #36638 change introduced "full package" checks - in CI we run mypy checks separately from regular static checks, for whole folders. However, it was a little convoluted how the checks were run, with a separate env variable. Instead, we can have multiple mypy-* checks (the same as for local pre-commit runs), since pre-commit allows multiple checks with the same name in various stages. This change simplifies the setup a bit:

* we name the checks "folder" checks, because this is what they are
* we name the checks consistently ("airflow", "providers", "docs", "dev") with the mypy-folders output
* we have a separate small script to run the folder checks
* we map "providers" to "airflow/providers" in the pre-commit

(cherry picked from commit a912948)
* metrics tagging documentation (cherry picked from commit 667b842)
* Add log lookup exception for empty op subtypes
* Use exception catching approach instead to preserve tests

(cherry picked from commit ddcaef4)
Client source code and package generation used to be done from code generated and committed to `airflow-client-python`, and while the repository with such code is useful to have, it is just a convenience repo: all sources are (and should be) generated from the API specification present in the Airflow repository. The old approach also made reproducible builds and package generation impossible in practice, because we never knew whether the source in the `airflow-client-python` repository had been generated and not tampered with. While implementing this, it turned out that some issues in the past had made our client generation somewhat broken:

* In the 2.7.0 python client, we added the same code twice (see apache/airflow-client-python#93): on top of the "airflow_client.client" package, we also added a copy of the API client generated in "airflow_client.airflow_client". That was likely due to bad bash scripts and tools used to generate the clients, and errors during generation.
* We used to generate the code for the "client" package and then move it to the "airflow_client.client" package while manually modifying imports with `sed` (!?). That was likely due to limitations in some old version of the client generator; the generator we use now is capable of generating code directly in the "airflow_client.client" package.
* We also manually (via pre-commit) added the Apache licence to the generated files, which was completely unnecessary: ASF rules do not require licence headers in code automatically generated from code that already carries the ASF licence.
* We also generated source tarball packages from the generated code, which was completely unnecessary, because sdist packages already fulfill all the requirements of such source packages: the code in them is enough to build the package from sources, it contains no binary code, and moreover the code is generated from the API specification, which means anyone can take it and generate the packaged software from just the sources in the sdist. As with provider packages, we do not need to produce separate -source.tar.gz files.

This PR fixes all of it. First of all, the source that lands in the `airflow-client-python` repository and the sdist/wheel packages are generated directly from the openapi specification. They are generated using a breeze release_management command from Airflow sources tagged with a specific tag in the Airflow repo (including the source of the reproducible build date, which is updated together with the airflow release notes). This means that any PMC member can regenerate the packages (binary identical) straight from the Airflow repository - without going through the "airflow-client-python" repository. No source tarball is generated - it is not needed; the sdist is enough. The `test_python_client.py` test has also been moved to the Airflow repo, updated to handle the case when expose_config is not enabled, and is used to automatically test the API client after it has been generated. (cherry picked from commit 9787440)
When a sqlite URL uses a relative path, the error printed is quite cryptic:

```
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
```

This can easily happen, for example, when you are in a hurry and put a relative value in your AIRFLOW_HOME. This PR checks whether the sqlite URL path is relative and throws a more appropriate and explicit message about what is wrong (see the sketch below). (cherry picked from commit 082055e)
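A minimal sketch of the validation described above, using SQLAlchemy's URL parser; the exact error wording is illustrative.

```python
from sqlalchemy.engine.url import make_url

def validate_sqlite_url(sql_alchemy_conn: str) -> None:
    url = make_url(sql_alchemy_conn)
    # sqlite:///relative/path parses to a relative database path,
    # sqlite:////absolute/path to an absolute one.
    if url.get_backend_name() == "sqlite" and url.database and not url.database.startswith("/"):
        raise ValueError(
            f"Relative path {url.database!r} in sqlite URL - use an absolute "
            "path, e.g. sqlite:////absolute/path/to/airflow.db"
        )

validate_sqlite_url("sqlite:////tmp/airflow.db")        # ok: absolute path
# validate_sqlite_url("sqlite:///relative/airflow.db")  # raises ValueError
```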
…) In #36003 we **thought** we changed the default "version" image to point to the "newest" Python version, not to the "oldest" supported one - as agreed in https://lists.apache.org/thread/0oxnvct24xlqsj76z42w2ttw2d043oy3. However, as observed and tracked in #36740, the change was not effective. We only changed the moment at which the latest image starts pointing to 2.8.0, but not whether 2.8.0 points to `python-3.8` or `python-3.11`. This means that we should only make that change for Python 3.9 and revert the changelog (and cherry-pick it to 2.8.1). (cherry picked from commit 270b112)
Signed-off-by: BobDu <i@bobdu.cc> (cherry picked from commit a87953e)
…36792) In some circumstances, when breeze is installed in CI (when we update to a newer breeze version in the "build-info" workflow in old branches), breeze is not able to auto-detect the sources it was installed from. This PR changes that by passing the sources via an environment variable. (cherry picked from commit e8080b8)
* Fix airflow-scheduler exiting with code 0 on exceptions
* Fix static check

(cherry picked from commit 1d5d502)
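Not the actual scheduler code - just a sketch of the bug class fixed here: if the top-level loop swallows exceptions, the process ends with exit code 0 and supervisors never notice the crash. Returning non-zero makes the failure visible.

```python
import sys

def run_scheduler_loop() -> None:
    raise RuntimeError("simulated scheduler crash")

def main() -> int:
    try:
        run_scheduler_loop()
    except Exception as exc:
        print(f"Scheduler failed: {exc}", file=sys.stderr)
        return 1  # previously the error path fell through, implying exit 0
    return 0

if __name__ == "__main__":
    sys.exit(main())
```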
Time for 2.8.1rc1!