Generate Python client in reproducible way #36763
Merged
potiuk
requested review from
kaxil,
pierrejeambrun,
ashb and
jedcunningham
as code owners
January 13, 2024 21:51
potiuk
requested review from
amoghrajesh,
hussein-awala,
pankajkoti,
ephraimbuddy,
eladkal,
bbovenzi,
pierrejeambrun,
aritra24,
RNHTTR and
pankajastro
and removed request for
ashb,
kaxil,
pierrejeambrun and
jedcunningham
January 13, 2024 21:51
potiuk
force-pushed
the
reproducible-python-client-build
branch
7 times, most recently
from
January 13, 2024 22:40
8d2cf2b
to
bb673a5
Compare
potiuk
added a commit
to apache/airflow-client-python
that referenced
this pull request
Jan 14, 2024
Accompanying apache/airflow#36763, where we fix and modernize the way the Python client gets generated, this one is the result of applying the modernisation:
* converting project configuration to pyproject.toml and removing all setup.* and requirements files
* converting to a modern packaging backend (hatchling) to build wheel and sdist packages (with reproducibility)
* using hatch test environment definition and coverage to run tests
* we deleted release instructions and dev tools (the ones in Airflow will be used to generate and sign the package)
* similarly, .pre-commit-config.yml is not needed as the client gets generated in the Apache Airflow project.
All those are going to be maintained in the Apache Airflow project as part of apache/airflow#36763 as a single source of truth. When a new Python client gets released, the project files will be overwritten with those prepared from the Airflow project; this is just an initial PR to seed the new
potiuk
added a commit
to potiuk/airflow
that referenced
this pull request
Jan 22, 2024
Following apache#36726, apache#36744, apache#36763, apache#36819, this PR makes the source tarball that we release as an official ASF release for the Helm Chart reproducible. This means that anyone should be able to produce such a tarball from the Airflow sources and verify that the tarball pushed to SVN by the release manager is built from our source repositories. We also do the same with the Helm package. It turns out that gpg signing of the package does not modify the .tgz file; it just adds a .prov file containing checksum and signature, so we can safely re-pack the .tar.gz package in a reproducible way. This way we have both reproducibility and provenance check nicely working together.
There are a few related changes in this PR:
* Bumped the Helm version in our environment to the latest one and switched to the `breeze k8s setup-env` environment to run all the release commands. This way we can be sure the same Helm version is used to build the package, further making it more reproducible.
* The reproducible packaging utility has been refactored: we now take a "source" archive as a parameter rather than a directory and simply repack it in a reproducible way.
* The tool also applies group/other ownership removal on its own, because helm package has no option to umask the generated files.
* In this change we also exclude subcharts from the source tarball package, as we should not include source files from postgres in our source package.
* Both the tarball and the Helm package are generated in the `dist` folder, like all our other packages.
* Documentation for releasing the packages and verifying them is updated.
* CI jobs are updated to use the new commands, and generated packages are produced as artifacts so that we can be sure the commands continue working and produce the right output.
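The repacking idea in the commit message above (take a source archive and rewrite it so that the output no longer depends on when or by whom it was built) can be sketched roughly as follows. This is an illustration only, not the actual breeze utility: the function name `repack_reproducibly`, passing archives as bytes, the GNU tar format, and the `0o022` mask used to drop group/other write bits are all assumptions made for the example.

```python
import gzip
import io
import tarfile


def repack_reproducibly(source: bytes, timestamp: int) -> bytes:
    """Repack a .tar.gz archive (given as bytes) so the result is deterministic.

    Hypothetical helper: names, format and permission mask are assumptions.
    """
    with tarfile.open(fileobj=io.BytesIO(source), mode="r:gz") as src:
        # Read every member up front so they can be written in sorted order.
        entries = []
        for member in src.getmembers():
            data = src.extractfile(member).read() if member.isfile() else None
            entries.append((member, data))

    out = io.BytesIO()
    # An empty filename and a fixed mtime keep varying data out of the gzip header.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=timestamp) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as dst:
            for member, data in sorted(entries, key=lambda entry: entry[0].name):
                member.mtime = timestamp          # same timestamp for every entry
                member.uid = member.gid = 0       # drop builder-specific ownership
                member.uname = member.gname = ""
                member.mode &= ~0o022             # assumed mask: no group/other write
                dst.addfile(member, io.BytesIO(data) if data is not None else None)
    return out.getvalue()
```

With a sketch like this, two archives that hold the same files but were created at different times, by different users, or with entries in a different order should repack to identical bytes, as long as the same `timestamp` is passed in.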
potiuk
added a commit
that referenced
this pull request
Jan 22, 2024
potiuk
added a commit
that referenced
this pull request
Feb 7, 2024
(cherry picked from commit 48158c9)
ephraimbuddy
added
the
type:misc/internal
Changelog: Misc changes that should appear in change log
label
Feb 19, 2024
ephraimbuddy
pushed a commit
that referenced
this pull request
Feb 22, 2024
(cherry picked from commit 48158c9)
ephraimbuddy
added
changelog:skip
Changes that should be skipped from the changelog (CI, tests, etc..)
and removed
type:misc/internal
Changelog: Misc changes that should appear in change log
labels
Mar 20, 2024
Labels
area:API
Airflow's REST/HTTP API
area:dev-tools
changelog:skip
Changes that should be skipped from the changelog (CI, tests, etc..)
Client source code and packages used to be generated from the code generated and committed to `airflow-client-python`. While that repository is useful to have, it is just a convenience repo, because all sources are (and should be) generated from the API specification present in the Airflow repository.
This also made reproducible builds and package generation practically impossible, because we never knew whether the source in the `airflow-client-python` repository had really been generated and not tampered with.
While implementing this change, it turned out that some past issues had left our client generation somewhat broken:
* In the 2.7.0 Python client we added the same code twice (see Add Client Version 2.7.0 airflow-client-python#93): on top of the "airflow_client.client" package, we also added a copy of the generated API client in "airflow_client.airflow_client". That was likely due to bad bash scripts and tools used to generate it, and to errors while generating the clients.
* We used to generate the code for the "client" package and then move it to the "airflow_client.client" package, manually modifying imports with `sed` (!?). That was likely due to limitations in some old version of the client generator. However, the client generator we use now is capable of generating code directly in the "airflow_client.client" package.
* We also manually (via pre-commit) added the Apache Licence to the generated files, which was completely unnecessary, because ASF rules do not require licence headers on code automatically generated from code that already has the ASF licence.
* We also generated source tarball packages from such generated code, which was completely unnecessary, because sdist packages already fulfil all the requirements of such source packages. The code in the sdist is enough to build the package from sources and it contains no binary code; moreover, the code is generated from the API specification, which means that anyone can take the code and generate the packaged software from just the sources in the sdist. As with provider packages, we do not need to produce separate -source.tar.gz files.
This PR fixes all of it.
* First of all, the source that lands in the `airflow-client-python` source repository and the sdist/wheel packages are generated directly from the OpenAPI specification. They are generated using the breeze release_management command from Airflow source tagged with a specific tag in the Airflow repo (including the source of the reproducible build date, which is updated together with the Airflow release notes). This means that any PMC member can regenerate binary-identical packages straight from the Airflow repository, without going through the "airflow-client-python" repository.
* No source tarball is generated. It is not needed; sdist is enough.
* `test_python_client.py` has also been moved over to the Airflow repo and updated to handle the case when expose_config is not enabled. It is used to automatically test the API client after it has been generated.
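Since the goal is that anyone can regenerate binary-identical packages, one way to check a rebuild against the released files is to compare digests. The sketch below is only an illustration of that check: the helper names `sha512_of` and `mismatched_packages`, and the layout of two directories holding the released and the locally rebuilt packages, are assumptions and not part of breeze.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_packages(released_dir: Path, rebuilt_dir: Path) -> list[str]:
    """List released files that are missing or differ in the rebuilt directory.

    Hypothetical helper: both directory arguments are assumptions of this sketch.
    """
    mismatches = []
    for released in sorted(released_dir.iterdir()):
        if not released.is_file():
            continue
        rebuilt = rebuilt_dir / released.name
        if not rebuilt.is_file() or sha512_of(released) != sha512_of(rebuilt):
            mismatches.append(released.name)
    return mismatches
```

An empty list from `mismatched_packages` would mean every released sdist and wheel was reproduced byte for byte; any name in the list points at a package that needs a closer look.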