Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Python client in reproducible way #36763

Merged
merged 1 commit into from
Jan 14, 2024

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jan 13, 2024

Client source code and package generation was done using the code generated and committed to airflow-client-python and while the repository with such code is useful to have, it's just a convenience repo, because all sources are (and should be) generated from the API specification which is present in the Airflow repository.

This also made the reproducible builds and package generation not really possible, because we never knew if the source generated in the airflow-client-python repository has been generated and not tampered with.

While implementing it, it turned out that there were some issues in the past that nade our client generation somewhat broken..

  • In 2.7.0 python client, we added the same code twice (See Add Client Version 2.7.0 airflow-client-python#93) on top of "airflow_client.client" package, we also added copy of the API client generated in "airflow_client.airflow_client" - that was likely due to bad bash scripts and tools that were used to generate it and errors during generation the clients.

  • We used to generate the code for "client" package and then moved the "client" package to "airflow_client.client" package, while manually modifying imports with sed (!?). That was likely due to limitations in some old version of the client generator. However the client generator we use now is capable of generating code directly in the "airflow_client.client" package.

  • We also manually (via pre-commit) added Apache Licence to the generated files. Whieh was completely unnecessary, because ASF rules do not require licence headers to be added to code automatically generated from a code that already has ASF licence.

  • We also generated source tarball packages from such generated code, which was completely unnecessary - because sdist packages are already fulfilling all the reqirements of such source pacakges - the code in the packages is enough to build the package from the sources and it does not contain any binary code, moreover the code is generated out of the API specificiation, which means that anyone can take the code and genearate the pacakged software from just sources in sdist. Similarly as in case of provider packages, we do not need to produce separate -source.tar.gz files.

This PR fixes all of it.

First of all the source that lands in the source repository airflow-client-python and sdist/wheel packages are generated directly from the openapi specification.

They are generated using breeze release_management command from airflow source tagged with specific tag in the Airflow repo (including the source of reproducible build date that is updated together with airflow release notes. This means that any PMC member can regenerate packages (binary identical) straight from the Airflow repository - without going through "airflow-client-python" repository.

No source tarball is generated - it is not needed, sdist is enough.

The test_python_client.py has been also moved over to Airflow repo
and updated with handling case when expose_config is not enabled and
it is used to automatically test the API client after it has been
generated.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:dev-tools labels Jan 13, 2024
@potiuk potiuk force-pushed the reproducible-python-client-build branch 7 times, most recently from 8d2cf2b to bb673a5 Compare January 13, 2024 22:40
potiuk added a commit to apache/airflow-client-python that referenced this pull request Jan 14, 2024
Accompanying apache/airflow#36763 where we
fix and modernize the way how Python client gets generated, this
on is result of applying the modernisation:

* converting project configuration to pyproject.toml and removing
  all setup.* and requirements files
* converting to modern packaging backend (hatchling) to build
  wheel and sdist packages (with reproducibility)
* using hatch test environment definition and coverage to run tests
* we deleted release instructions and dev tools (the ones in Airflow
  will be used to generate and sign the package)
* similarly .pre-comit-config.yml is not needed as the client
  gets generated in the Apache Airflow project.

All those are going to be maintained in Apache Airflow project
as part of apache/airflow#36763 as a single
source of truth - when new Python client gets released the project
files will be overwritten from those prepare from Airflow project,
this is just initial PR to seed the new
potiuk added a commit to apache/airflow-client-python that referenced this pull request Jan 14, 2024
Accompanying apache/airflow#36763 where we
fix and modernize the way how Python client gets generated, this
on is result of applying the modernisation:

* converting project configuration to pyproject.toml and removing
  all setup.* and requirements files
* converting to modern packaging backend (hatchling) to build
  wheel and sdist packages (with reproducibility)
* using hatch test environment definition and coverage to run tests
* we deleted release instructions and dev tools (the ones in Airflow
  will be used to generate and sign the package)
* similarly .pre-comit-config.yml is not needed as the client
  gets generated in the Apache Airflow project.

All those are going to be maintained in Apache Airflow project
as part of apache/airflow#36763 as a single
source of truth - when new Python client gets released the project
files will be overwritten from those prepare from Airflow project,
this is just initial PR to seed the new
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 22, 2024
Following apache#36726, apache#36744, apache#36763, apache#36819 this PR adds the feature of
making source tarball that we release as an official release of
the ASF for Helm Chart into reproducible tarball. This means that
anyone should be able to produce such tarball using the sources
of airflow and verify that he tarball pushed to SVN by the
release manager is built from our source repositories.

We also do the same with Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds .prov file
containing checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way, this way we have both reproduciblity and
provenance check nicely working together.

There are few changes in this PR that are related:

* Bumped Helm version in our environment to use the latest one and
  using the `breeze k8s setup-env` environment to run all the release
  commands - this way we can be sure same helm version is used to build
  the package, further making it more reproducible.

* The reproducible packaging utility we have has been refeactored now -
  we take "source" archive as parameter rather than directory and simply
  repack it in reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also ignore subcharts from being exported to the source
  tarball package as we shoudl not include source files from postgres in
  our source package..

* Both - the tarball and helm package are generated in `dist` folder similarly as
  all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands and generated packages are
  produced as artifacts so that we can be sure the commands continue
  working and produce the right output.
potiuk added a commit that referenced this pull request Jan 22, 2024
Following #36726, #36744, #36763, #36819 this PR adds the feature of
making source tarball that we release as an official release of
the ASF for Helm Chart into reproducible tarball. This means that
anyone should be able to produce such tarball using the sources
of airflow and verify that he tarball pushed to SVN by the
release manager is built from our source repositories.

We also do the same with Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds .prov file
containing checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way, this way we have both reproduciblity and
provenance check nicely working together.

There are few changes in this PR that are related:

* Bumped Helm version in our environment to use the latest one and
  using the `breeze k8s setup-env` environment to run all the release
  commands - this way we can be sure same helm version is used to build
  the package, further making it more reproducible.

* The reproducible packaging utility we have has been refeactored now -
  we take "source" archive as parameter rather than directory and simply
  repack it in reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also ignore subcharts from being exported to the source
  tarball package as we shoudl not include source files from postgres in
  our source package..

* Both - the tarball and helm package are generated in `dist` folder similarly as
  all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands and generated packages are
  produced as artifacts so that we can be sure the commands continue
  working and produce the right output.
potiuk added a commit that referenced this pull request Feb 7, 2024
Following #36726, #36744, #36763, #36819 this PR adds the feature of
making source tarball that we release as an official release of
the ASF for Helm Chart into reproducible tarball. This means that
anyone should be able to produce such tarball using the sources
of airflow and verify that he tarball pushed to SVN by the
release manager is built from our source repositories.

We also do the same with Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds .prov file
containing checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way, this way we have both reproduciblity and
provenance check nicely working together.

There are few changes in this PR that are related:

* Bumped Helm version in our environment to use the latest one and
  using the `breeze k8s setup-env` environment to run all the release
  commands - this way we can be sure same helm version is used to build
  the package, further making it more reproducible.

* The reproducible packaging utility we have has been refeactored now -
  we take "source" archive as parameter rather than directory and simply
  repack it in reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also ignore subcharts from being exported to the source
  tarball package as we shoudl not include source files from postgres in
  our source package..

* Both - the tarball and helm package are generated in `dist` folder similarly as
  all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands and generated packages are
  produced as artifacts so that we can be sure the commands continue
  working and produce the right output.

(cherry picked from commit 48158c9)
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Feb 19, 2024
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
Following #36726, #36744, #36763, #36819 this PR adds the feature of
making source tarball that we release as an official release of
the ASF for Helm Chart into reproducible tarball. This means that
anyone should be able to produce such tarball using the sources
of airflow and verify that he tarball pushed to SVN by the
release manager is built from our source repositories.

We also do the same with Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds .prov file
containing checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way, this way we have both reproduciblity and
provenance check nicely working together.

There are few changes in this PR that are related:

* Bumped Helm version in our environment to use the latest one and
  using the `breeze k8s setup-env` environment to run all the release
  commands - this way we can be sure same helm version is used to build
  the package, further making it more reproducible.

* The reproducible packaging utility we have has been refeactored now -
  we take "source" archive as parameter rather than directory and simply
  repack it in reproducible way.

* The tool also applies group/other ownership removal on its own,
  because helm package has no option to umask the generated files.

* In this change we also ignore subcharts from being exported to the source
  tarball package as we shoudl not include source files from postgres in
  our source package..

* Both - the tarball and helm package are generated in `dist` folder similarly as
  all our other packages.

* Documentation for releasing the packages and verifying them is updated.

* CI jobs are updated to use the new commands and generated packages are
  produced as artifacts so that we can be sure the commands continue
  working and produce the right output.

(cherry picked from commit 48158c9)
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) and removed type:misc/internal Changelog: Misc changes that should appear in change log labels Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:API Airflow's REST/HTTP API area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants