Make Helm artifacts reproducible #36930

Merged

@potiuk potiuk commented Jan 20, 2024

Following #36726, #36744, #36763 and #36819, this PR makes the source
tarball that we publish as the official ASF release of the Helm Chart
reproducible. This means that anyone should be able to produce the same
tarball from the Airflow sources and verify that the tarball pushed to
SVN by the release manager was built from our source repositories.

We do the same with the Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it only adds a .prov file
containing a checksum and signature - so we can safely re-pack the .tar.gz
package in a reproducible way. This gives us both reproducibility and
provenance checks working together.

There are a few related changes in this PR:

  • Bumped the Helm version in our environment to the latest one and
    switched to the breeze k8s setup-env environment for running all the
    release commands - this way we can be sure the same Helm version is
    used to build the package, which makes it more reproducible.

  • The reproducible packaging utility has been refactored - it now takes
    a "source" archive as a parameter rather than a directory and simply
    repacks it in a reproducible way.

  • The tool also strips group/other permissions on its own,
    because helm package has no option to umask the generated files.

  • Subcharts are now excluded from the source tarball, as we should not
    include source files from postgres in our source package.

  • Both the tarball and the Helm package are generated in the dist folder,
    like all our other packages.

  • Documentation for releasing the packages and verifying them is updated.

  • CI jobs are updated to use the new commands, and the generated packages
    are produced as artifacts so that we can be sure the commands continue
    working and produce the right output.
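
The repacking idea in the list above can be sketched as follows. This is a minimal illustration and not the utility from this PR: the function name `repack_reproducibly`, the fixed timestamp of `0`, and reading "group/other removal" as clearing the group/other permission bits are all assumptions. It takes an existing archive, sorts its members, resets ownership, timestamps and modes, and writes a gzip stream without an embedded file name or time.

```python
import gzip
import tarfile
from pathlib import Path

# Assumption: a real release tool would more likely derive this from
# SOURCE_DATE_EPOCH or a commit timestamp than use a literal zero.
FIXED_MTIME = 0


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Reset every metadata field that differs between build machines."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = FIXED_MTIME
    info.mode &= ~0o077  # assumption: drop all group/other permission bits
    return info


def repack_reproducibly(source: Path, dest: Path) -> None:
    """Repack the archive at `source` into `dest` with normalized metadata."""
    with tarfile.open(source, "r:*") as src:
        # Sort members so the output order does not depend on the input order.
        members = sorted(src.getmembers(), key=lambda m: m.name)
        # filename="" and a fixed mtime keep the gzip header free of the
        # output file name and the build time.
        with open(dest, "wb") as raw, gzip.GzipFile(
            filename="", fileobj=raw, mode="wb", mtime=FIXED_MTIME
        ) as gz, tarfile.open(
            fileobj=gz, mode="w", format=tarfile.GNU_FORMAT
        ) as out:
            for member in members:
                data = src.extractfile(member) if member.isfile() else None
                out.addfile(_normalize(member), data)
```

Two archives with the same file contents but different member order, owners and timestamps should come out byte-identical after this step, provided the same Python and zlib versions are used on both sides.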



@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch 7 times, most recently from f7a41ba to 002000a Compare January 21, 2024 08:20

potiuk commented Jan 21, 2024

cc: @jedcunningham - I'd love for you to take a look before preparing the Helm Chart to see if it all looks good to you - I made all the variables and package locations quite a bit more consistent. Also, the source tarball contained postgres.tgz, which I think it should not - users can download it on their own and we should not release those sources together with ours IMHO.

@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch from 002000a to 30a692d Compare January 21, 2024 08:29
@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch 2 times, most recently from f8b8e74 to 04053a7 Compare January 21, 2024 15:55
@potiuk potiuk changed the title Turn Helm chart source tarball into reproducible tarball Make Helm artifacts reproducible Jan 21, 2024

potiuk commented Jan 21, 2024

I actually managed to make the helm package reproducible as well - it turned out to be as easy as repackaging the .tar.gz produced by helm-package in a reproducible way. Signing the packages with helm-gpg does not change the package itself, it only adds a .prov file, so we can re-package the produced .tar.gz and use helm gpg to generate the .prov.

This way we can nicely combine reproducible packages, signing the ASF way and signing with a .prov file. Pretty cool.
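
Why the order "repack first, then generate .prov" matters can be illustrated with a small check. This is a hedged sketch, not part of the PR: it assumes the provenance layout described in the Helm documentation, where a `files:` section maps the package file name to `sha256:<hex digest>`. The helper names are hypothetical, and the PGP signature itself is not verified here.

```python
import hashlib
import re
from pathlib import Path

# Assumed .prov entry shape: "  <package-name>.tgz: sha256:<64 hex chars>"
_PROV_ENTRY = re.compile(
    r"^[ \t]+(?P<name>\S+): sha256:(?P<digest>[0-9a-f]{64})[ \t]*$",
    re.MULTILINE,
)


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of the file at `path`."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def prov_matches_package(package: Path, prov: Path) -> bool:
    """True if `prov` records the current digest of `package`.

    This only compares checksums; it does NOT check the PGP signature.
    """
    recorded = {m["name"]: m["digest"] for m in _PROV_ENTRY.finditer(prov.read_text())}
    return recorded.get(package.name) == sha256_of(package)
```

Because the .prov file records a checksum of the package bytes, any repacking has to happen before the .prov is generated. Repacking afterwards would change the digest and make a check like this fail.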

@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch 4 times, most recently from fe47998 to 7532dc0 Compare January 21, 2024 17:20
@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch 2 times, most recently from 11697e9 to 0e4e6ec Compare January 21, 2024 20:54
@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch 2 times, most recently from 830f08e to 51b037b Compare January 22, 2024 21:42
@potiuk potiuk force-pushed the add-reproducible-tarball-generation-for-helm-chart branch from 51b037b to ed0584e Compare January 22, 2024 22:42
@potiuk potiuk merged commit 48158c9 into apache:main Jan 22, 2024
80 of 81 checks passed
@potiuk potiuk deleted the add-reproducible-tarball-generation-for-helm-chart branch January 22, 2024 23:54
potiuk added a commit that referenced this pull request Feb 7, 2024
(cherry picked from commit 48158c9)
@potiuk potiuk added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 8, 2024
@potiuk potiuk added this to the Airflow 2.8.2 milestone Feb 8, 2024
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
(cherry picked from commit 48158c9)
potiuk added a commit to potiuk/airflow that referenced this pull request Oct 22, 2024
apache#36930 added constraints to the creation of the k8s environment, but
it had a side effect - the environment could not be created if the Airflow
sources had dependencies conflicting with the constraints (which
might happen, for example, when we upgrade FAB - because the locally
pinned FAB might be different from the one in constraints).

Also, the constraints were hard-coded - the Python version,
branch and GitHub repository were all fixed.

This PR fixes both problems:

* the constraints URL is dynamically generated based on the current
  branch, repo and Python version
* if creating the venv with constraints fails,
  we attempt to install it again without constraints
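
The two fixes listed above can be sketched roughly as below. This is an illustrative sketch and not the code from the follow-up PR: the function names, the `run` parameter, and the exact URL layout (a `constraints-<branch>` branch holding `constraints-<python version>.txt`) are assumptions made for the example.

```python
import subprocess
import sys
from typing import Callable, Sequence


def constraints_url(repo: str, branch: str, python_version: str) -> str:
    """Build the constraints URL from repo, branch and Python version.

    Assumed layout: constraints live on a "constraints-<branch>" branch.
    """
    return (
        f"https://raw.githubusercontent.com/{repo}/"
        f"constraints-{branch}/constraints-{python_version}.txt"
    )


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def install_requirements(
    requirements_file: str,
    url: str,
    run: Callable[[Sequence[str]], int] = _run,
) -> str:
    """Install with constraints first and retry without them on failure."""
    base = [sys.executable, "-m", "pip", "install", "-r", requirements_file]
    if run([*base, "--constraint", url]) == 0:
        return "constrained"
    # Fallback: locally pinned dependencies may conflict with the constraints.
    if run(base) == 0:
        return "unconstrained"
    raise RuntimeError("installation failed with and without constraints")
```

Passing the command runner in as a parameter keeps the fallback logic testable without invoking pip.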
potiuk added a commit to potiuk/airflow that referenced this pull request Oct 22, 2024
potiuk added a commit to potiuk/airflow that referenced this pull request Oct 22, 2024
potiuk added a commit that referenced this pull request Oct 23, 2024
…43276)

potiuk added a commit to potiuk/airflow that referenced this pull request Oct 23, 2024
…pache#43276)


(cherry picked from commit 274b6e1)
potiuk added a commit that referenced this pull request Oct 23, 2024
#43298)

* Fix edge-case when conflicting constraints prevent k8s env creation (#43276)

(cherry picked from commit 274b6e1)

* Update k8s_requirements.txt
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
…pache#43276)

PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
…pache#43276)

utkarsharma2 pushed a commit that referenced this pull request Oct 24, 2024
#43298)

* Fix edge-case when conflicting constraints prevent k8s env creation (#43276)

(cherry picked from commit 274b6e1)

* Update k8s_requirements.txt
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
…pache#43276)

Labels
area:dev-tools area:helm-chart Airflow Helm Chart changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)