@potiuk potiuk commented Jan 10, 2024

Hatch has built-in support for reproducible builds, but it uses a hard-coded 2020 date to generate the reproducible binaries, so the resulting .whl and .tar.gz files contain file dates that are several years old. This might be confusing for anyone looking at the file contents and the timestamps inside.

This PR adds support (similar to the provider approach) for storing the current reproducible date in the repository, so that it can be committed and tagged together with the Airflow sources. It is updated fully automatically by pre-commit whenever the release notes change, which basically means that whenever the release notes are updated just before a release, the reproducible date is updated to the current date.
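
A minimal sketch of the update rule described above, assuming the stored metadata holds a hash of the release notes next to the date. The function name and the `release-notes-hash` / `source-date-epoch` keys are assumptions for illustration, not taken from this PR:

```python
import hashlib
import time
from typing import Optional


def refresh_reproducible_date(
    release_notes: bytes, stored: dict, now: Optional[int] = None
) -> dict:
    """Return new metadata if the release notes changed, else the stored one.

    The key names used here are hypothetical.
    """
    digest = hashlib.sha256(release_notes).hexdigest()
    if stored.get("release-notes-hash") == digest:
        # Release notes untouched: keep the previously committed date.
        return stored
    return {
        "release-notes-hash": digest,
        # Seconds since the Unix epoch; `now` is injectable for testing.
        "source-date-epoch": int(time.time()) if now is None else now,
    }
```

A pre-commit hook built along these lines would rewrite the stored file only when the returned value differs from what was read, so unrelated commits leave the date alone.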

For now we only check if the packages produced by hatchling build are reproducible.
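
One way such a check could look is to build the packages twice into separate directories and compare digests. This is only a sketch of the idea, not the check used in CI; the helper names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_artifacts(first_dir: Path, second_dir: Path) -> list:
    """Return names of files that are missing on one side or differ in content."""
    names = sorted(
        {p.name for p in first_dir.iterdir()} | {p.name for p in second_dir.iterdir()}
    )
    different = []
    for name in names:
        a, b = first_dir / name, second_dir / name
        if not (a.is_file() and b.is_file()) or sha256_of(a) != sha256_of(b):
            different.append(name)
    return different
```

An empty result means the two builds are byte-for-byte identical.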


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk commented Jan 10, 2024

Quick follow-up after #36537 - adding nice, reproducible build support for Airflow packages.

@potiuk potiuk force-pushed the reproducible-airflow-builds branch from 724d459 to 6d22583 Compare January 11, 2024 14:54
potiuk commented Jan 11, 2024

Had to move the location of reproducible_build.yaml - it is now in the "airflow" root, which is better because it also allows anyone who just gets the airflow sources to run a reproducible build.

I will likely have to contribute a small thing later (or maybe a plugin will be enough) to make it possible for hatchling to use that information without setting the environment variable first. Might be a nice contribution to hatchling :)
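
For context, the environment variable in question is presumably `SOURCE_DATE_EPOCH`, which hatchling honours for reproducible builds. A rough sketch of setting it before the build, assuming the `hatch` CLI is installed and the epoch value has already been read from the stored file (both helper names are made up for illustration):

```python
import os
import subprocess
from typing import Mapping, Optional


def env_with_source_date_epoch(
    epoch: int, base: Optional[Mapping[str, str]] = None
) -> dict:
    """Copy the environment and pin SOURCE_DATE_EPOCH to the given value."""
    env = dict(os.environ if base is None else base)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def build_packages(epoch: int) -> None:
    # Assumption: `hatch` is on PATH and the current directory is the project root.
    subprocess.run(
        ["hatch", "build"], check=True, env=env_with_source_date_epoch(epoch)
    )
```

Building the environment in a separate function keeps the caller's own environment untouched and makes the part that matters easy to test without running a build.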

@potiuk potiuk force-pushed the reproducible-airflow-builds branch 2 times, most recently from e448125 to 12bcae8 Compare January 11, 2024 17:28
potiuk commented Jan 11, 2024

Would love to merge that one - then I could cherry-pick it to 2.8.1 and already get a reproducible 2.8.1 build :)

@hussein-awala hussein-awala left a comment

Looks good, just a small nit, LGTM

@potiuk potiuk force-pushed the reproducible-airflow-builds branch from 12bcae8 to f85da7b Compare January 11, 2024 23:37
potiuk commented Jan 11, 2024

Resolved both :)

@potiuk potiuk merged commit a2d6c38 into apache:main Jan 12, 2024
@potiuk potiuk deleted the reproducible-airflow-builds branch January 12, 2024 00:29
@potiuk potiuk added this to the Airflow 2.8.1 milestone Jan 13, 2024
potiuk added a commit that referenced this pull request Jan 13, 2024
…36726)

(cherry picked from commit a2d6c38)
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jan 15, 2024
ephraimbuddy pushed a commit that referenced this pull request Jan 15, 2024
…36726)

(cherry picked from commit a2d6c38)
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 22, 2024
Following apache#36726, apache#36744, apache#36763, apache#36819 this PR makes the
source tarball that we release as an official ASF release of the
Helm Chart reproducible. This means that anyone should be able to
produce such a tarball from the airflow sources and verify that
the tarball pushed to SVN by the release manager is built from our
source repositories.

We also do the same with the Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds a .prov file
containing the checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way. This way we have both reproducibility and
provenance checks working nicely together.

There are a few related changes in this PR:

* Bumped the Helm version in our environment to the latest one and
  switched to using the `breeze k8s setup-env` environment to run all the
  release commands - this way we can be sure the same helm version is used
  to build the package, making it even more reproducible.

* The reproducible packaging utility has been refactored - it now takes
  a "source" archive as a parameter rather than a directory and simply
  repacks it in a reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also exclude subcharts from the source tarball
  package, as we should not include source files from postgres in
  our source package.

* Both the tarball and the helm package are generated in the `dist` folder,
  like all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands, and the generated packages are
  produced as artifacts so that we can be sure the commands continue
  working and produce the right output.
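
To illustrate the "take an archive and repack it reproducibly" idea from the list above, here is a rough standard-library sketch. It is not the utility from this PR: the function name is hypothetical, and "group/other ownership removal" is interpreted here as clearing the group/other write bits, which is an assumption.

```python
import gzip
import io
import tarfile


def repack_reproducibly(source: bytes, epoch: int) -> bytes:
    """Repack a tar archive (optionally compressed) into a deterministic .tar.gz."""
    out = io.BytesIO()
    # The gzip header stores an mtime and a file name; pin one, omit the other.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=epoch) as gz:
        with tarfile.open(fileobj=io.BytesIO(source), mode="r:*") as src, \
                tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as dst:
            # Sort members so the input order does not leak into the output.
            for member in sorted(src.getmembers(), key=lambda m: m.name):
                member.mtime = epoch
                member.uid = member.gid = 0
                member.uname = member.gname = ""
                member.mode &= ~0o022  # assumed meaning: drop group/other write
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
    return out.getvalue()
```

Two inputs with the same file contents but different timestamps or member order should then repack to identical bytes, as long as the same Python and zlib versions are used, which is the same reason the list above pins the Helm version.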
potiuk added a commit that referenced this pull request Jan 22, 2024
potiuk added a commit that referenced this pull request Feb 7, 2024

(cherry picked from commit 48158c9)
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024

(cherry picked from commit 48158c9)
Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
