Conversation

@ephraimbuddy
Contributor

Time for 2.8.1rc1!

potiuk and others added 30 commits December 18, 2023 19:29
Now that Airflow 2.8.0 is released, we can remove common.io from
chicken-egg providers.

(cherry picked from commit 34d5001)
This was made available [as part of v0.1.8 of the Ruff Formatter](https://astral.sh/blog/ruff-v0.1.8#formatting-code-snippets-in-docstrings). Adding this config option to the `ruff-format` pre-commit hook.
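
A minimal sketch of what enabling it looks like (the pyproject.toml placement is an assumption for illustration; Airflow's actual change wires the option into the `ruff-format` pre-commit hook, which picks up the same setting):

```
# Sketch only - enable docstring code formatting for the Ruff formatter
# (setting name per the Ruff 0.1.8 release notes linked above):
cat >> pyproject.toml <<'EOF'
[tool.ruff.format]
docstring-code-format = true
EOF
ruff format .
```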

(cherry picked from commit e9ba37b)
Once Airflow is released to PyPI, we should remove chicken-egg
providers for that release and cherry-pick the change to v2-*-test in
order to prepare container images in case the image contains the
providers as default extras.

(cherry picked from commit 663dfd0)
…36283)

When generated dependencies were not properly updated, we had a special
step where the dependencies were generated "just in case" before the CI
image was built, because otherwise building the CI image could fail
with a cryptic "conflicting dependencies" error that gave no clue about
the root cause.

However, the pre-commit script did not return an error exit code -
for pre-commit itself, a file being modified during the run is enough
to mark the hook as failed, so no explicit error code was needed there.

That had a nasty side effect: the built CI image already contained
properly generated dependencies (produced by this step), so the build did
not detect cases where the files in the repository had been edited
manually rather than generated with pre-commit.

This PR fixes it - instead of generating the dependencies and building
such an image in CI, the CI image building step now fails with clear
instructions on what to do.

The CI job step now uses a regular breeze command rather than running
the script manually, and the script returns an error code when the
generated dependencies have been updated.
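
The error-code behaviour amounts to something like this (a sketch with a hypothetical script name, not the actual breeze code):

```
# Regenerate, then fail loudly if anything changed
# (script name is hypothetical - illustration of the idea only):
python3 scripts/update_generated_dependencies.py
if ! git diff --quiet -- generated/; then
    echo "ERROR: generated dependencies are out of date." >&2
    echo "Regenerate them with pre-commit and commit the result." >&2
    exit 1
fi
```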

(cherry picked from commit 33a2fbe)
When we are installing a released version of Airflow in Breeze, we can
pass additional extras to install (for example, we need to pass the
celery extra in order to start airflow with the celery executor).

The extras could be specified as:

```
breeze start-airflow --use-airflow-version 2.8.0rc4  \
  --executor CeleryExecutor --airflow-extras "celery"

```
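
For context, the underlying `pip` requirement has to put extras before the version specifier - roughly (illustrative commands, not the exact ones breeze builds):

```
# Illustration of why the ordering matters for pip:
pip install "apache-airflow[celery]==2.8.0rc4"   # extras before the version - OK
pip install "apache-airflow==2.8.0rc4[celery]"   # extras after the version - rejected by newer pip
```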

However, recent refactors introduced a problem: the extras were appended
after the version specifier, which newer versions of `pip` reject.

This PR fixes it and also moves the place where CeleryExecutor usage
triggers adding the celery extra when `--use-airflow-version` is used.

The warning about this is more visible after moving it to ShellParams.

(cherry picked from commit 3297806)
…36288)

When chicken-egg providers are released, we also have to make some
manual adjustments to constraints and the version of Airflow in
the v2-8-test branch.

(cherry picked from commit 35117aa)
When we check whether provider.yaml files change any dependencies, we
printed verbose information on which provider.yaml files changed, but
this is not necessary. This change makes the output less verbose by
default - just the number of changed files rather than the full list;
the full list is only printed when the `--verbose` flag is used.

(cherry picked from commit 7212301)
* Update RELEASE_NOTES.rst

(cherry picked from commit 26990e2)

* Update RELEASE_NOTES.rst

(cherry picked from commit d0c1c45)

* Update RELEASE_NOTES.rst

(cherry picked from commit db2b75c)

* Airflow 2.8.0 has been released

* fixup! Airflow 2.8.0 has been released

(cherry picked from commit 51d3114)
2.8.0 was later released on the 18th

(cherry picked from commit c09a64c)
Since we are getting more diagrams generated in Airflow using the
"diagram as code" approach, this PR improves the pre-commit to better
support generating more of the images coming from different sources,
placed in different directories and generated independently, so that the
whole process is more distributed and it is easy for whoever creates
diagrams to add their own.

The changes implemented in this PR:

* the code to generate each diagram now sits next to the diagram it
  generates. It has the same name as the diagram, but with a .py
  extension. This way it is immediately visible where the source
  of each diagram is (right next to the diagram)

* each of the .py diagram files is runnable on its own. This
  way you can easily regenerate a diagram by running the corresponding
  Python file, or even automate it with an on-save action that runs
  the Python code every time the file is saved. That makes a very nice
  workflow for iterating on each diagram, independently of the others

* the pre-commit script is given a set of folders which should be
  scanned; it finds and runs the diagrams on pre-commit. It also
  creates and verifies an md5sum hash of the source Python file
  separately for each diagram, and only runs diagram generation when
  the source file changed since the hash was last saved and
  committed. The hash is stored next to the image and sources
  with a .md5sum extension (see the sketch below)
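
A sketch of that hash-guard logic (the path is illustrative):

```
# Only regenerate a diagram when its source .py changed
# (illustrative path; the real script scans a configured set of folders):
src="docs/apache-airflow/img/diagram_example.py"
hash_file="${src%.py}.md5sum"
new_hash=$(md5sum "$src" | cut -d' ' -f1)
if [ ! -f "$hash_file" ] || [ "$new_hash" != "$(cat "$hash_file")" ]; then
    python3 "$src"                    # regenerates the image next to the source
    echo "$new_hash" > "$hash_file"   # stored and committed next to the image
fi
```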

Also updated the documentation in CONTRIBUTING.rst explaining how
to generate the diagrams and how the generation mechanism works.

(cherry picked from commit b35b08e)
Currently docs building happens inside the container image and the code
doing it sits in the `docs` folder, while publishing has already been
moved to the `breeze` code (and is executed in the Breeze venv, not in the
container). Building and publishing code was present in both places
(copy&pasted), and the parts of each copy not relevant to the other
function were unused.

While we will eventually move docs building to `breeze` as well, the
first step is to remove the redundancy and clean up the unused code, so
that we can make the transition cleaner.

(cherry picked from commit bf90992)
…#36372)

When the DB/NonDB tests were introduced (#35160), new test types were
added (separating various Python test types from the generic
Operator test type). However, we did not add matching of the python
operator and test files to the right selective unit test type. As a
result, when only `operators/python.py` and `tests/test_python` were
changed, the `Operators` test type was run but the specific Python*
test types were not.

This PR fixes it for the current test types (including the separate
Serialization test type) and for the future - instead of matching
selected test types we match all of them except the few that we
know are "special" ("Always", "Core", "Other", "PlainAsserts").

(cherry picked from commit b0db1f9)
When `--use-airflow-version` is a numeric or rc version, the constraints
used when installing airflow should be specific to that version. For
example, when we install 2.7.3rc1, `constraints-2.7.3rc1` should be used.

This behaviour was lost when fixing versions in CI.

This PR introduces these fixes:

* the default value for airflow constraints is DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH

* when --use-airflow-version is a numeric version and the default value is
  used for constraints (DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH), it
  is replaced with `constraints-VERSION` (as sketched below)

* when we print out the constraints used, we print which constraints are
  used by Airflow and which by providers.
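
In shell terms, the fallback behaves roughly like this (a sketch with illustrative variable names; the real logic lives in breeze's Python code):

```
# Sketch of the constraints fallback:
AIRFLOW_CONSTRAINTS_REFERENCE="${AIRFLOW_CONSTRAINTS_REFERENCE:-$DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH}"
if [[ "$USE_AIRFLOW_VERSION" =~ ^[0-9] && \
      "$AIRFLOW_CONSTRAINTS_REFERENCE" == "$DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH" ]]; then
    AIRFLOW_CONSTRAINTS_REFERENCE="constraints-${USE_AIRFLOW_VERSION}"
fi
# e.g. USE_AIRFLOW_VERSION=2.7.3rc1 -> constraints-2.7.3rc1
```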

(cherry picked from commit 5ddd67a)
…6385)

This PR updates the release process for providers to enable releasing
providers in more regular batches. Sometimes, when we exclude a
provider from a previous vote, we want to release an RCN (rc2, rc3, etc.)
candidate.

However, especially when the time between the previous RC and the new one
is long (for example because fixing took a long time), we might
want to release the RCN release for those cancelled providers and
RC1 for all the providers that have changed in the meantime.

This change makes it possible (and easy):

1) release RC1 for all providers (the RCN provider should be skipped,
   because the tag for this provider already exists).

2) release the RCN providers with `--version-suffix-for-pypi rcN`.

The release process and tools were updated to account for that - the
rc candidate number is retrieved from the packages prepared in `dist`.
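
Conceptually, the suffix detection looks like this (illustrative file pattern; the real logic is in breeze's release-management code):

```
# Derive the candidate suffix from packages already prepared in dist/:
suffix=$(ls dist/apache_airflow_providers_*rc*.tar.gz 2>/dev/null \
    | grep -o 'rc[0-9]*' | sort -V | tail -1)
echo "Detected rc suffix: ${suffix:-rc1}"
```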

Fixed a few small missing things in the process.

(cherry picked from commit 4deed64)
This PR adds the possibility of marking a provider as "not ready" in its
provider.yaml (by setting the optional "not-ready" field to `true`).

Setting a provider as "not-ready" removes it by default from all the
release management commands: preparing documentation files, preparing
provider packages, and publishing docs.

You can include such providers via the `--include-not-ready-providers`
flag (or by setting the INCLUDE_NOT_READY_PROVIDERS environment variable
to true).
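
For example (the flag and variable come from the description above; the exact subcommand shape is illustrative):

```
# Include "not-ready" providers explicitly when preparing packages:
breeze release-management prepare-provider-packages --include-not-ready-providers
# or:
INCLUDE_NOT_READY_PROVIDERS=true breeze release-management prepare-provider-packages
```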

This flag is set to True in our CI, so that we can make sure that
in-progress providers are also being tested and verified, but when the
release manager prepares packages, those providers are skipped.

That will help in the early stage of a provider's lifecycle, when we
already want to iterate on and test it continuously but, for example,
its API is not yet stable, or when we are in the process of moving its
functionality out of core.

This PR also marks the `fab` provider as "not-ready", as it is still
early days and we want to exclude it for now from any kind of release
process.

(cherry picked from commit 341d5b7)
This change automatically generates the **right** rcN package
when preparing the packages for PyPI. This allows a pretty much
continuous release process for voting on the provider packages.

Simply put: when an rcN candidate is not released, it will be
automatically included in the next wave of packages with an rcN+1 version
- unless during provider package generation the version is bumped to a
new MAJOR or MINOR due to new changes.

This allows a workflow where in every new wave we always generate
all provider packages ready for release.
(cherry picked from commit 6f5a50e)
Updating the breeze docs with a different pytest example, as the function mentioned in the previous example was removed from test_core.py.

(cherry picked from commit 71c726d)
The last auto-upgrade RC change, implemented in #36441, had a bug - it
was bumping the rc even for providers that had already been released.
This change fixes it - it skips packages that already have a "final" tag
in the repo. It also explicitly calls out "Apply template update" as an
optional step - only needed when we modify templates and want to update
the automatically generated documentation accordingly.

(cherry picked from commit a3e5a97)
This PR improves/simplifies the process of issue generation when
provider package rc candidates are prepared for voting.

It improves the command to generate the issue and makes it simpler
(less copy&paste) to create such an issue; the issue also no longer uses
the "Meta" template and gets the right labels assigned automatically.

Recent changes that automatically derive the suffix from the PyPI
packages prepared removed the need to pass `--suffix` as a parameter. In
all cases the right rc* suffix will be automatically added during issue
generation based on the version of the package being prepared. The
process has been updated and the command simplified by removing the
`--suffix` flag.

Previously, when the issue was prepared, we displayed the issue in the
terminal and asked the release manager to create it by copy&pasting the
content and title into a new issue, but that required a few copy&pastes
and opening the new issue via the "Meta" task type. This PR simplifies it
a bit by not only displaying the content but also generating a URL that
can be either copy&pasted into the browser URL field or just Cmd+clicked
if your terminal allows that. An issue created this way does not have
the "Body" field header and has the labels properly assigned, including
a dedicated "testing status" label that is used to gather stats for
past "status" issues.

The advice for the release manager has been improved (the generated
comment had a missing end of sentence), and it should now be clearer how
to iterate during issue generation if you want to remove some PRs from
the generated issue content.

(cherry picked from commit 5d88f6f)
* Check executable permission for entrypoints at breeze start

Sometimes our contributors check out the Airflow repository on
filesystems that are not POSIX compliant and do not support executable
bits (for example when you check out the repository in Windows and
attempt to map it to a Linux VM). Breeze and building CI images will not
work in this case, but the error that you see might be misleading.

This PR performs an additional environment check and informs you about
the problem if executable bits are missing from the entrypoints.

* Update dev/breeze/src/airflow_breeze/utils/docker_command_utils.py

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

---------

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
(cherry picked from commit 5551e14)
It seems that when the mysql repository is used to install the mysql
client, it triggers libxml compilation for Python 3.8 and 3.9, and this
library requires the devel version of zlib, which is missing from the
image.

This PR adds the devel version as a dev apt dependency.

(cherry picked from commit 2bc34ff)
(cherry picked from commit 13e4905)
Some recent changes, adding the removed and suspended states to breeze,
caused a significant slow-down of autocompletion retrieval - as it
turned out, because we loaded and parsed all provider.yaml files
during auto-completion in order to determine the list of providers
available for some commands.

We already planned to replace the several state fields (suspended,
not-ready, removed) with a single state field - by doing that and
adding the field to the pre-commit generated "provider_dependencies.json",
we could switch to parsing the single provider_dependencies.json
file and retrieve the provider list from there, following the state
stored in that json file.

This also simplifies state management, following the recently
added state diagram and the same state lifecycle:

"not-ready" -> "ready" -> "suspended" -> "removed"
potiuk and others added 24 commits January 15, 2024 20:47
This is the last one in the long-backtracking series.

Telegram 20.2 was released in March 2023, and for all practical
purposes using a recent version is a good idea when interacting with
such services. Bumping it cuts down the number of backtracking loops
pip has to do when backtracking.

(cherry picked from commit 1f6d764)
This command installs airflow in the k8s venv, and in case the Python
version is not yet supported by Airflow, it might fail.

We do not have to check the lower bound because breeze supports the
same minimum Python version as Airflow itself.

The command prints instructions on how to reinstall breeze with a
different Python version in such a case.

(cherry picked from commit 9264a4b)
#35650 introduced a hotfix for the Pyarrow CVE-2023-47248. So far
we have been blocked from removing it by Apache Beam, which prevented
Airflow from bumping pyarrow to a version that was not vulnerable.

This is now possible since Apache Beam released version 2.53.0 on the
4th of January 2024, which allows using a non-vulnerable pyarrow.

We are now bumping both the Pyarrow and Beam minimum versions to
reflect that and removing the pyarrow hotfix.

(cherry picked from commit d105c71)
…rsion` in Hashicorp operator (#36532)

* explicitly passing raise_on_deleted_version=True to read_secret_version

* fix tests

* update hvac version

(cherry picked from commit cd5ab08)
…nd (#36537)

This PR changes the Airflow installation and build backend to use the
new standard Python ways of building Python applications.

We've been trying to do this for quite a while. Airflow traditionally
used a complex and convoluted build process based on
setuptools and an (extremely) custom setup.py file. It survived the
migration to Airflow 2.0 and the splitting of the Airflow monorepo into
Airflow and Providers, adding pre-installed providers and switching
providers to use flit (and follow build standards).

So far the tooling in the Python ecosystem had not been able to fulfill
our needs, and we refrained from developing our own tooling, but with
the appearance of Hatch (managed by the Python Packaging Authority) and
a few recent advancements there, we are finally able to switch to
Python-standard ways of managing project dependency configuration
and project build setup (with a few customizations).

This PR makes the airflow build process follow these standard PEPs:

* Airflow has all build configuration stored in pyproject.toml,
  following PEP 518, which allows any frontend (`pip`, `poetry`,
  `hatch`, `flit`, or whatever other frontend is used) to
  install the required build dependencies, install Airflow
  locally, and build distribution packages (sdist/wheel) - see
  the minimal sketch after this list

* the Hatchling backend follows PEP 517 for a standard source tree and
  build backend implementation that allows the build to be executed in a
  frontend-independent way

* we store all project metadata in pyproject.toml, following
  PEP 621, where all necessary project metadata components are
  defined

* we plug into Hatchling's "editable build" hooks following
  PEP 660. Hatchling internally builds an editable wheel that
  is used as an ephemeral step and a communication channel between
  backend and frontend (this ephemeral wheel is used to make the
  editable installation of the project - suitable for fast
  iteration on code without reinstalling the package)
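
The PEP 518 core of this is just a few lines of pyproject.toml - a minimal sketch, not Airflow's actual file:

```
# Minimal PEP 518 [build-system] table (a sketch - Airflow's real
# pyproject.toml is far richer and adds custom hatchling build hooks):
cat > pyproject.toml <<'EOF'
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
EOF
```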

With Airflow having many provider packages in a single source tree,
where we want to be able to install and develop airflow and
providers together, it is no small feat to implement the case where
the editable installation has to behave quite a bit differently from
the installable package when it comes to packaging and dependencies:
for an editable install you want to edit sources directly, while for
an installable package you want separate Airflow and provider packages.
Fortunately, the standardisation efforts in the Python packaging
community and the tooling implementing them have finally made it
possible.

Some of the important ways this has been achieved:

* We continue using provider.yaml in providers as the single source
  of truth for per-provider dependencies. We added the possibility
  to specify "devel-dependencies" in provider.yaml so that all
  per-provider dependencies in `generated/provider_dependencies.json`
  and `pyproject.toml` are generated from those dependencies via the
  update-providers-dependencies pre-commit.

* pyproject.toml is generally managed manually, but the part where
  provider dependencies and bundle dependencies are used is
  automatically updated by a pre-commit whenever provider
  dependencies change. Those generated provider dependencies contain
  just the dependencies of providers - not the provider packages - but
  in the final "standard" wheel file they are replaced with
  "apache-airflow-providers-PROVIDER" dependencies, so that the
  wheel package will only install the provider and use the
  dependencies of the version of the provider it installs.

* We are utilising custom hatchling build hooks (the PEP 660 standard)
  that allow modifying the "standard" wheel package on-the-fly when
  the wheel is being prepared, by adding preinstalled package
  dependencies (which are not needed in an editable build) and by
  removing all devel extras (which are not needed in the PyPI-
  distributed wheel package). This solves the conundrum
  of having different "editable" and "standard" behaviour while
  keeping the same project specification in pyproject.toml.

* We added a description of how `Hatch` can be employed as the build
  frontend in order to manage a local virtualenv and install Airflow
  in an editable way easily - while keeping all properties of the
  installed application (including a working airflow cli and
  package metadata discovery) - as well as how to use PEP-standard
  ways of building wheel and sdist packages.

* We have a custom step (following PEP standards) to inject
  airflow-specific build steps - compiling www assets and
  generating the git commit hash version to display it in the UI

* We also show how all this makes it easy to manage local virtualenvs
  and editable installations for Airflow contributors - without vendor
  lock-in of the build tools. By following standard PEPs, Airflow can be
  locally and editably installed by anyone using any build frontend tool
  that follows the standards - whether you use `pip`, `poetry`, `Hatch`,
  `flit`, or any other frontend build tool, Airflow local installation
  and package building will work the same way for all of them,
  with both "editable" and "standard" package preparation
  managed by the `hatchling` backend in the same way.

* Previously our extras contained a ".", which is not a normalized
  name for extras - `pip` and other tools replaced it automatically
  with `_`. This change updates the extra names to contain
  "-" rather than ".", following PEP 685. This should be
  fully backwards compatible: users will still be able to use ".", but
  it will be normalized to "-" in Airflow packages. This is also
  future-proof, as it is expected that all package managers and tools
  will eventually apply PEP 685 to extras, even if currently
  some of the tools (pip + setuptools) might generate warnings.

* Additionally, this change organizes the documentation around
  the extras and dependencies, explaining the reasoning behind
  all the different extras we have.

* As a bonus (and this is what we used to test it all) we are
  documenting how to use the Hatch frontend (example commands below) to:

  * manage multiple Python installations
  * manage multiple Python virtualenv environments
  * build Airflow packages for release management
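
For instance (standard Hatch commands, shown as an illustration):

```
# Illustrative Hatch commands (see the documentation added in this PR
# for the full workflow):
hatch env create                 # create the project virtualenv
hatch shell                      # enter it, with the project installed editably
hatch build -t wheel -t sdist    # build distribution packages
```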

(cherry picked from commit c439ab8)
…36726)

Hatch has built-in support for reproducible builds; however, it
uses a hard-coded 2020 date to generate the reproducible binaries,
which produces whl and tar.gz files whose file dates are
pretty old. This might be confusing for anyone who looks at
the file contents and timestamps inside.

This PR adds support (similar to the provider approach) for storing the
current reproducible date in the repository - so that it can be
committed and tagged together with the Airflow sources. It is updated
fully automatically by pre-commit whenever the release notes change,
which basically means that whenever the release notes are updated just
before a release, the reproducible date is updated to the current date.

For now we only check that the packages produced by the hatchling
build are reproducible.

(cherry picked from commit a2d6c38)
* Add support of Pendulum 3

* Add backcompat to pendulum 2

* Update airflow/serialization/serialized_objects.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Add newsfragments

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 2ffa6e4)
The source tarball is the main artifact produced by the release
process - the one that is the "official" release and named as such
by the Apache Software Foundation.

This PR makes the source tarball generation reproducible - following
the reproducibility of the `.whl` and sdist packages.

This change:

* vendors in a reproducible.py script that repacks the .tar.gz package
  in a reproducible way, using source-date-epoch as the timestamp
  (sketched below)
* adds a `breeze release-management prepare-airflow-tarball` command
* adds verification of the tarballs to the PMC verification process
* adds `--use-local-hatch` to the package building command to allow for
  a faster, non-docker build of packages for PMC verification
* improves the diagnostic output of the release and build commands
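
The core repacking trick, as a GNU tar sketch of what the vendored script does in Python (the epoch value is illustrative):

```
# Deterministic .tar.gz:
SOURCE_DATE_EPOCH=1705000000   # stored in the repo, updated by pre-commit
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime="@${SOURCE_DATE_EPOCH}" -cf - apache-airflow-source/ \
    | gzip -n > apache-airflow-source.tar.gz   # -n omits the gzip timestamp
```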

(cherry picked from commit 72a571d)
The index is only helpful for a user's custom query - not for airflow in general (see comment #30762 (comment)). I noticed that this index had zero scans over a period of months, and that it takes up as much space as the table itself. Since it's not generally useful, it doesn't belong in airflow OSS.

Reverts #30762

(cherry picked from commit e20b400)
When we build the cache from scratch, cache preparation can take
longer than 50 minutes (right now it's about an hour). The timeout
increase to 120 minutes should solve the problem of the PROD cache
build getting cancelled in such cases and needing to be re-run to
succeed.

(cherry picked from commit c7ade01)
The #36638 change introduced "full package" checks - where in
CI we run mypy checks separately from the regular static checks,
for whole folders.

However, the way the checks were run was a little convoluted,
with a separate env variable. Instead we can actually have multiple
mypy-* checks (the same as we have for local pre-commit runs), since we
can have multiple checks with the same name in various stages.

This change simplifies the setup a bit:

* we name the checks "folder" checks because this is what they are
* we make the check names consistent ("airflow", "providers", "docs",
  "dev") with the mypy-folders output
* we have a separate small script to run the folder checks
* we map "providers" to "airflow/providers" in the pre-commit
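
Locally, the same folder checks can be run via pre-commit (check ids per the naming above):

```
# Run the folder-scoped mypy checks locally:
pre-commit run mypy-airflow --all-files
pre-commit run mypy-providers --all-files
```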

(cherry picked from commit a912948)
* metrics tagging documentation

(cherry picked from commit 667b842)
* Add log lookup exception for empty op subtypes

* Use exception catching approach instead to preserve tests

(cherry picked from commit ddcaef4)
…sed as strings instead of integers (#36756)

(cherry picked from commit e2335a0)
Client source code and package generation used to be done using the
code generated and committed to `airflow-client-python`, and while the
repository with such code is useful to have, it's just a convenience
repo, because all sources are (and should be) generated from the
API specification, which is present in the Airflow repository.

This also made reproducible builds and package generation not really
possible, because we never knew if the source in the
`airflow-client-python` repository had been generated and not tampered
with.

While implementing it, it turned out that there were some issues in
the past that made our client generation somewhat broken:

* In the 2.7.0 python client, we added the same code twice
  (see apache/airflow-client-python#93): on
  top of the "airflow_client.client" package, we also added a copy of
  the API client generated in "airflow_client.airflow_client" - likely
  due to bad bash scripts and tools that were used to generate
  it, and errors during generation of the clients.

* We used to generate the code for the "client" package and then move
  the "client" package to the "airflow_client.client" package, while
  manually modifying imports with `sed` (!?). That was likely due to
  limitations in some old version of the client generator. However, the
  client generator we use now is capable of generating code directly in
  the "airflow_client.client" package.

* We also manually (via pre-commit) added the Apache licence to the
  generated files, which was completely unnecessary, because ASF rules
  do not require licence headers to be added to code automatically
  generated from code that already has an ASF licence.

* We also generated source tarball packages from such generated code,
  which was completely unnecessary - sdist packages already
  fulfill all the requirements of such source packages: the code
  in the packages is enough to build the package from sources,
  it does not contain any binary code, and moreover the code is
  generated from the API specification, which means that anyone can
  take the code and generate the packaged software from just the
  sources in the sdist. Similarly to provider packages, we do not need
  to produce separate -source.tar.gz files.

This PR fixes all of it.

First of all, the source that lands in the
`airflow-client-python` repository and the sdist/wheel packages are
generated directly from the OpenAPI specification.

They are generated using a breeze release-management command from
Airflow sources tagged with a specific tag in the Airflow repo
(including the reproducible build date that is updated together with the
Airflow release notes). This means that any PMC member can regenerate
packages (binary identical) straight from the Airflow repository -
without going through the "airflow-client-python" repository.

No source tarball is generated - it is not needed; sdist is enough.

The `test_python_client.py` test has also been moved over to the Airflow
repo and updated to handle the case when expose_config is not enabled;
it is used to automatically test the API client after it has been
generated.

(cherry picked from commit 9787440)
When the sqlite URL uses a relative path, the error printed is quite
cryptic:

```
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
```

This might easily happen, for example, when you are in a hurry and put a
relative value in your AIRFLOW_HOME.

This PR checks if the sqlite URL path is relative and raises a more
appropriate and explicit message explaining what is wrong.
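
The distinction being checked, in shell terms (illustrative - the real check lives in Airflow's Python configuration code):

```
# Three slashes after "sqlite:" mean a relative path, four an absolute one:
conn="sqlite:///relative/airflow.db"
case "$conn" in
    sqlite:////*) echo "absolute sqlite path - OK" ;;
    sqlite:///*)  echo "ERROR: relative sqlite path - use sqlite:////absolute/path" >&2 ;;
esac
```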

(cherry picked from commit 082055e)
)

In #36003 we **thought** we changed the default "version" image to
point to the "newest" python version, not to the "oldest" supported
one - as agreed in https://lists.apache.org/thread/0oxnvct24xlqsj76z42w2ttw2d043oy3

However, as observed and tracked in #36740, the change was not effective.
We only changed the moment at which the latest image points to
2.8.0, not whether 2.8.0 points to `python-3.8` or `python-3.11`.

This means that we should only do that change for Python 3.9 and
revert the changelog (and cherry-pick it to 2.8.1).
(cherry picked from commit 270b112)
Signed-off-by: BobDu <i@bobdu.cc>
(cherry picked from commit a87953e)
…k instance list (#36693)

* Fix Callback exception when a removed task is the last one in the task instance list

* Add test_dag_handle_callback_with_removed_task

* Remove extra break line

* Merge TIs filters

* Fix static check

* Revert changes

(cherry picked from commit 8c1c09b)
…36792)

In some circumstances, when breeze is installed in CI (when we
update to a newer breeze version in the "build-info" workflow in old
branches), breeze is not able to auto-detect the sources it was
installed from.

This PR changes that by passing the sources via an environment variable.

(cherry picked from commit e8080b8)
joaopamaral and others added 3 commits January 16, 2024 07:38
* Fix airflow-scheduler exiting with code 0 on exceptions

* Fix static check

(cherry picked from commit 1d5d502)
@ephraimbuddy ephraimbuddy merged commit c0ffa9c into v2-8-stable Jan 16, 2024