@potiuk
Member

@potiuk potiuk commented Jan 12, 2024

The source tarball is the main artifact produced by the release process: it is the "official" release, and the Apache Software Foundation refers to it as such.

This PR makes source tarball generation reproducible, following the earlier work that made the .whl and sdist packages reproducible.

This change:

  • vendors in the reproducible.py script, which repacks a .tar.gz package in a reproducible way, using source-date-epoch for timestamps
  • adds the breeze release-management prepare-airflow-tarball command
  • adds verification of the tarballs to the PMC verification process
  • adds --use-local-hatch to the package building command to allow faster, non-Docker builds of packages for PMC verification
  • improves the diagnostic output of the release and build commands
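
To illustrate the repacking idea from the first bullet, here is a minimal sketch, not the vendored reproducible.py itself. The function name `repack_reproducibly` and its `timestamp` parameter are hypothetical; in practice the timestamp would presumably come from the `SOURCE_DATE_EPOCH` environment variable. The sketch sorts members by name, pins every timestamp (including the one in the gzip header), and clears owner names so that the output bytes depend only on member names, modes, and contents.

```python
import gzip
import io
import tarfile


def repack_reproducibly(source: str, dest: str, timestamp: int) -> None:
    """Rewrite a .tar.gz so its bytes do not depend on build time or build user."""
    with tarfile.open(source, "r:gz") as src:
        # Read every member up front so the output order can be sorted by name.
        members = []
        for info in src.getmembers():
            data = src.extractfile(info).read() if info.isfile() else None
            members.append((info, data))
    members.sort(key=lambda pair: pair[0].name)

    with open(dest, "wb") as raw:
        # mtime= pins the timestamp stored in the gzip header; filename=""
        # keeps the output path out of that header.
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=timestamp) as gz:
            with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as out:
                for info, data in members:
                    # Normalise metadata that varies between machines and runs.
                    info.mtime = timestamp
                    info.uid = info.gid = 0
                    info.uname = info.gname = ""
                    out.addfile(info, io.BytesIO(data) if data is not None else None)
```

A caller could pass `int(os.environ["SOURCE_DATE_EPOCH"])` as `timestamp`. Identical output also assumes the same compression library version on both machines.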


@potiuk potiuk force-pushed the update-release-process-for-reproducibles-source-tarball branch from f446d2d to 81096f8 Compare January 12, 2024 02:42
Contributor

@ephraimbuddy ephraimbuddy left a comment

LGTM

@potiuk
Member Author

potiuk commented Jan 12, 2024

Yeah. As the next step, I want to turn that script into something that runs regularly in our CI, like our other release commands in Breeze. That should prevent typos and make it all but guaranteed to keep working.

Contributor

@amoghrajesh amoghrajesh left a comment

@potiuk 🚢 it!
Only a few nits!

@potiuk potiuk force-pushed the update-release-process-for-reproducibles-source-tarball branch from 04bb8eb to 64f6e8e Compare January 12, 2024 15:00
@potiuk potiuk merged commit 72a571d into apache:main Jan 12, 2024
@potiuk potiuk deleted the update-release-process-for-reproducibles-source-tarball branch January 12, 2024 16:39
@potiuk potiuk added this to the Airflow 2.8.1 milestone Jan 13, 2024
@potiuk potiuk added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jan 13, 2024
potiuk added a commit that referenced this pull request Jan 13, 2024
(cherry picked from commit 72a571d)
ephraimbuddy pushed a commit that referenced this pull request Jan 15, 2024
(cherry picked from commit 72a571d)
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 22, 2024
Following apache#36726, apache#36744, apache#36763, apache#36819, this PR
makes the source tarball that we publish as the official ASF release
of the Helm Chart reproducible. This means that anyone should be able
to produce the same tarball from the Airflow sources and verify that
the tarball pushed to SVN by the release manager was built from our
source repositories.
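
The verification described above can be sketched as a byte-for-byte comparison via digests. This is only an illustration, assuming a locally rebuilt tarball and a downloaded copy of the released one; the helper names `file_digest` and `same_artifact` are hypothetical and are not part of Breeze.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(local_path: str, released_path: str) -> bool:
    """True when the locally rebuilt file matches the released file exactly."""
    return file_digest(local_path) == file_digest(released_path)
```

Because the build is reproducible, a mismatch here points to a difference in sources or tooling rather than to incidental timestamps.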

We also do the same with the Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds a .prov file
containing the checksum and signature, so we can safely re-pack the
.tar.gz package in a reproducible way. This way, reproducibility and
the provenance check work together.

There are a few related changes in this PR:

* Bumped the Helm version in our environment to the latest one, and we
  now use the `breeze k8s setup-env` environment to run all the release
  commands - this way we can be sure the same Helm version is used to
  build the package, which makes it more reproducible.

* The reproducible packaging utility has been refactored - it now takes
  a "source" archive as a parameter rather than a directory and simply
  repacks it in a reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also exclude subcharts from the source tarball
  package, as we should not include source files from postgres in
  our source package.

* Both the tarball and the Helm package are generated in the `dist`
  folder, like all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands, and the generated packages
  are produced as artifacts so that we can be sure the commands continue
  working and produce the right output.
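
As a rough illustration of the permission handling mentioned in the list above, the sketch below clears the group and other write bits on an archive member, which is what a umask of 022 would do at file creation time. Reading "group/other ownership removal" as clearing exactly these bits is an assumption, and the helper name `strip_group_other_write` is hypothetical.

```python
import stat
import tarfile

# Bits that a umask of 0o022 would clear (assumption about the intended mask).
GROUP_OTHER_WRITE = stat.S_IWGRP | stat.S_IWOTH


def strip_group_other_write(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Clear group/other write permission on a tar member, in place."""
    info.mode &= ~GROUP_OTHER_WRITE
    return info
```

A function of this shape can be applied to each `TarInfo` while repacking an archive, so the result does not depend on the umask of whoever built the original package.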
potiuk added a commit that referenced this pull request Jan 22, 2024
potiuk added a commit that referenced this pull request Feb 7, 2024
(cherry picked from commit 48158c9)
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
(cherry picked from commit 48158c9)
Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
