Optimize PROD image caching in CI #35438

Merged: 1 commit, Nov 4, 2023

@potiuk (Member) commented Nov 4, 2023

It turns out that some of the layers in our PROD image were invalidated because the AIRFLOW_CONSTRAINTS_MODE used to build the cache for the PROD image is "constraints" by default, while image builds in the "build-images" workflow for regular PRs and canary builds use "constraints-source-providers". The former is fine as the default for the PROD image (as opposed to the CI image, the PROD image is built from released PyPI packages by default), but the latter is the "proper" mode for the CI cache, because there the image is built from local packages prepared from sources.

The AIRFLOW_CONSTRAINTS_MODE parameter also had a profound impact on caching, because it was set before the "install_packages_from_branch_tip" step and, in fact, even before the "install database clients" step. As a result, our cache only worked for the "base OS dependencies": installing database clients and installing airflow from the branch tip (which works great for the CI image) was always redone in PRs, because the cached layer holding the constraints env differed and invalidated all subsequent layers.

This had no big impact before, when testing usually took much longer. But since testing was vastly improved in #35160, PROD image building now keeps running even after the tests complete, which makes it the next frontier of optimization.

This PR optimizes PROD image building in two ways:

  • the cache is prepared with the "constraints-source-providers" constraints mode, same as in a regular build

  • AIRFLOW_CONSTRAINTS_MODE and the related arguments are moved after the step that installs database clients, so that this parameter no longer affects caching of that step.
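
The effect of both changes can be illustrated with a small Python sketch. This is a deliberately simplified model, not Docker's actual cache algorithm: it assumes each layer's cache key is a hash of its parent's key plus the instruction text with the constraints mode substituted in. The step names and their order are illustrative assumptions and are not copied from the real Dockerfile.

```python
import hashlib


def layer_keys(steps, mode):
    """Return one cache key per step; each key chains on the previous one."""
    keys = []
    parent = ""
    for step in steps:
        # Substitute the constraints mode where the step references it.
        resolved = step.format(mode=mode)
        parent = hashlib.sha256(f"{parent}\n{resolved}".encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_layers(steps, cache_mode, build_mode):
    """Count leading layers of the build that match the prepared cache."""
    count = 0
    for cached, built in zip(layer_keys(steps, cache_mode), layer_keys(steps, build_mode)):
        if cached != built:
            break  # the first mismatch invalidates every later layer
        count += 1
    return count


# Hypothetical step lists: the mode was set early before this PR, late after it.
OLD_ORDER = [
    "install base OS dependencies",
    "ENV AIRFLOW_CONSTRAINTS_MODE={mode}",
    "install database clients",
    "install packages from branch tip",
]
NEW_ORDER = [
    "install base OS dependencies",
    "install database clients",
    "ENV AIRFLOW_CONSTRAINTS_MODE={mode}",
    "install packages from branch tip",
]

CACHE_MODE = "constraints"
PR_MODE = "constraints-source-providers"

print(reused_layers(OLD_ORDER, CACHE_MODE, PR_MODE))  # 1: only base OS dependencies
print(reused_layers(NEW_ORDER, CACHE_MODE, PR_MODE))  # 2: database clients also reused
print(reused_layers(NEW_ORDER, PR_MODE, PR_MODE))     # 4: same mode, everything reused
```

In this model, reordering alone protects the database clients layer, while preparing the cache with the same mode as the PR builds is what lets the later layers be reused as well.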



@boring-cyborg boring-cyborg bot added area:dev-tools area:production-image Production image improvements and fixes labels Nov 4, 2023
@potiuk (Member, Author) commented Nov 4, 2023

This should speed up most PROD image builds in our CI: PRs that need the PROD image will likely get it in 4-5 minutes in most cases, instead of waiting 10-12 minutes.

@potiuk potiuk requested a review from eladkal November 4, 2023 15:34
@potiuk (Member, Author) commented Nov 4, 2023

Let's see what the speed improvement will be with this one (we will only see it after the cache from this change has been built in main).

@potiuk potiuk merged commit f50a34b into main Nov 4, 2023
71 checks passed
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
@potiuk potiuk deleted the optimize-prod-image-building-cache branch November 17, 2023 16:22
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023