-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47582: [CI][Packaging] Move linux-packaging tasks to apache/arrow repository #47600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kou I've started working on this and I think I have a working base that could work. I wanted to discuss it with you before moving more packages to validate we are all ok with the approach.
Let me know your thoughts. I have a couple of questions to discuss too, let me know what you think on those too.
.github/workflows/cpp_extra.yml
Outdated
uses: ./.github/workflows/check_labels.yml | ||
secrets: inherit | ||
with: | ||
label: "CI: Extra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to move these labels to either:
CI Extra: cpp
CI Extra: CPP
CI: cpp
CI: CPP
So we can have the others the same like:
CI Extra: linux-packaging
CI Extra: Linux Packaging
CI: linux-packaging
CI: Linux Packaging
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like CI: Extra: C++
/CI: Extra: Package: Linux
or CI: cpp
/CI: package: linux
/CI: package: linux: apt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
.github/workflows/cpp_extra.yml
Outdated
uses: ./.github/workflows/check_labels.yml | ||
secrets: inherit | ||
with: | ||
label: "CI: Extra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like CI: Extra: C++
/CI: Extra: Package: Linux
or CI: cpp
/CI: package: linux
/CI: package: linux: apt
.
1121d15
to
78e0ff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone over all the comments and I think this is ready for another review.
# apache/arrow-adbc uses debian-bookworm. So the following | ||
# glob must much both of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is adbc still using this binary tasks? in that case we probably want to maintain the previous behaviour for the task too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
But I can change ADBC to simplify this script.
dev/release/binary-task.rb
Outdated
# glob must much both of them. | ||
Dir.glob("#{source_dir_prefix}*/*") do |path| | ||
base_name = File.basename(path) | ||
destination_prefix = "#{incoming_dir}/#{distribution}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou I've never done anything on ruby (I know, I should learn a little 😃 ). I've tested these tasks locally (apt:rc:copy
and yum:rc:copy
) with a small integration test calling rake directly and seems to be working as expected but I would appreciate some help on those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing Ruby!
Sure. I can help. Can I push something to this branch if it's needed?
BTW, destination_dir
is better for this variable name because this is a directory name not a prefix of path. (In this code, "prefix" is used for abc
for abcde
directory and "dir" is used for abcde
directory.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can push to this branch if needed! that's completely fine with me. I am trying to simplify the folder structure as you suggested so the binary tasks are even simpler. Thanks, it's good that I am learning a little bit more about our linux packages so I can help you maintain those in the future.
# apache/arrow-adbc uses debian-bookworm. So the following | ||
# glob must much both of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
But I can change ADBC to simplify this script.
.github/workflows/package_linux.yml
Outdated
run: | | ||
set -ex | ||
pushd dev/tasks/linux-packages | ||
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's simplify paths more to simplify binary-task.rb
:
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories | |
mkdir -p artifacts | |
cp -a */${TASK_NAMESPACE}/repositories/* artifacts/ | |
tar cvzf ${{ matrix.id }}.tar.gz -C artifacts . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to simplify the paths slightly more but I am finding quite difficult to understand how all tangles together. How do we generate the repositories from those rpms, how we merge these files with other existing files from the repo, how do we create the testing repositories, etcetera so I am reverting my latest changes. I am happy to leave it like this (current generated structure) and try to simplify on a future PR or if you want to guide me on how to do it or if you want to push to this branch.
dev/release/04-binary-download.sh
Outdated
|
||
# Wait for the GitHub Workflow that creates the Linux packages | ||
# to finish before downloading the artifacts. | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need .
here because all information is passed via arguments:
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" | |
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" |
dev/release/binary-task.rb
Outdated
# glob must much both of them. | ||
Dir.glob("#{source_dir_prefix}*/*") do |path| | ||
base_name = File.basename(path) | ||
destination_prefix = "#{incoming_dir}/#{distribution}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing Ruby!
Sure. I can help. Can I push something to this branch if it's needed?
BTW, destination_dir
is better for this variable name because this is a directory name not a prefix of path. (In this code, "prefix" is used for abc
for abcde
directory and "dir" is used for abcde
directory.)
# Wait for the GitHub Workflow that creates the GitHub Release | ||
# to finish before updating the release notes. | ||
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}" | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need .
here.
download_artifacts() { | ||
RUN_ID=$(get_run_id) | ||
echo "Downloading artitfacts for workflow with ID: ${RUN_ID}" | ||
gh run download \ | ||
${RUN_ID} \ | ||
--repo "${REPOSITORY}" \ | ||
--dir "$1" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Defining this function in this file isn't good because this file name is watch-gh-workflow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could rename the file to utils-gh-workflow.sh
but if we use release for the artifacts this won't be necessary.
dev/release/04-binary-download.sh
Outdated
# to finish before downloading the artifacts. | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" | ||
|
||
RUN_ID=$(get_run_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you want to use a function defined in utils-watch-gh-workflow.sh
.
dev/release/04-binary-download.sh
Outdated
|
||
RUN_ID=$(get_run_id) | ||
# Download the artifacts created by the package_linux.yml workflow | ||
download_artifacts "${SOURCE_DIR}/../../packages/${CROSSBOW_JOB_ID}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using GitHub Release for RC instead of GitHub Actions artifacts? If we use GitHub Release, we don't need RUN_ID
: gh release download apache-arrow-${VERSION}-rc${RC} ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you mean, for scheduled or PR/push use artifacts but if workflow_dispatch
was used, which is trigger of RC, upload to release? I don't want to create a release every time the job is executed but that could work. In that case we might use artifacts to test and upload to release if tests are successful and job type was workflow_dispatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to upload artifacts to GitHub Release only when apache-arrow-${VERSION}-rc${RC}
tag is pushed. Can we use on.push.tags
instead of workflow_dispatch
for RC? (I can provide changes for the approach.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed changes for the suggested workflow.
I'll fix CI failures and then test release workflow by pushing a RC tag to fork repository.
def detect_repo | ||
detect_env("REPO") | ||
def github_repository | ||
ENV["GITHUB_REPOSITORY"] || "apache/arrow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub Actions provides GITHUB_REPOSITORY
environment variable by default. So this uses fork's GitHub Container registry on fork automatically.
repository="${repositories}/${distribution}/${distribution_version}" | ||
rpm_dir="${repository}/${architecture}/Packages" | ||
srpm_dir="${repository}/source/SRPMS" | ||
srpm_dir="${repository}/Source/Packages" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AlmaLinux use Source/Packages
: https://vault.almalinux.org/10.0/BaseOS/Source/Packages/
["almalinux", "8"], | ||
["amazon-linux", "2023"], | ||
["centos", "9-stream"], | ||
["centos", "8-stream"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already dropped support for CentOS Stream 8: #46953
Dir.glob("#{source_dir_prefix}*.tar.gz") do |tar_gz| | ||
sh("tar", "xf", tar_gz, "-C", incoming_dir) | ||
progress_reporter.advance | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use this for ADBC too but we need some changes in ADBC: apache/arrow-adbc#3510
# Trigger workflow when a tag whose name matches the pattern | ||
# "apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}" is pushed. | ||
- "apache-arrow-[0-9]+.[0-9]+.[0-9]+-rc[0-9]+" | ||
- "apache-arrow-*-rc*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify and unify tag patterns.
# We use latest .deb/.rpm of | ||
# apache-arrow-apt-source/apache-arrow-release built for | ||
# amd64 because they are architecture independent. | ||
if [ "${ARCHITECTURE}" = "amd64" ]; then | ||
if [ "${APT_TASK_NAMESPACE}" = "apt" ]; then | ||
# Example: debian-bookworm-amd64 -> debian-bookworm | ||
code_name="${ID%-*}" | ||
# Example: debian-bookworm -> bookworm | ||
code_name="${code_name#*-}" | ||
# Create | ||
# https://packages.apache.org/artifactory/arrow/${DISTRIBUTION}/apache-arrow-apt-source-latest-${code_name}.deb | ||
# for easy to install. | ||
cp -a \ | ||
"${DISTRIBUTION}/pool/${code_name}/a/apache-arrow-apt-source/*.deb \ | ||
"${DISTRIBUTION}/apache-arrow-apt-source-latest-${code_name}.deb" | ||
else | ||
# Example: amazon-linux-2023-amd64 -> amazon-linux-2023 | ||
version="${ID%-*}" | ||
# Example: amazon-linux-2023 -> 2023 | ||
version="${version##*-}" | ||
# Create | ||
# https://packages.apache.org/artifactory/arrow/${DISTRIBUTION}/${version}/apache-arrow-release-latest.rpm | ||
# for easy to install. | ||
cp -a \ | ||
"${DISTRIBUTION}/${version}/x86_64/Packages/apache-arrow-release-*.rpm \ | ||
"${DISTRIBUTION}/${version}/apache-arrow-release-latest.rpm" | ||
fi | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved from binary-task.rb
.
We'll create latest .deb
/.rpm
here to simplify binary-task.rb
.
if [ "${GITHUB_REF_TYPE}" = "tag" ]; then | ||
# Example: apache-arrow-21.0.0-rc0 -> 21.0.0-rc0 | ||
version="${GITHUB_REF_NAME#apache-arrow-}" | ||
echo "ARROW_VERSION=${version}" >> "${GITHUB_ENV}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use 21.0.0-rc0
not 21.0.0
here. The -rc0
part will be removed by
arrow/dev/tasks/linux-packages/package-task.rb
Lines 37 to 45 in 16ceade
when /-((dev|rc)\d+)\z/ | |
base_version = $PREMATCH | |
sub_version = $1 | |
type = $2 | |
if type == "rc" and options[:rc_build_type] == :release | |
@deb_upstream_version = base_version | |
@deb_archive_base_name_version = base_version | |
@rpm_version = base_version | |
@rpm_release = "1" |
TBD
Rationale for this change
What changes are included in this PR?
check_labels
andreport_ci
jobs to independent reusable workflows.cpp_extra
to use those.linux_packaging.yml
workflow replicating work that was done on crossbow. Integrate that workflow withcheck_labels
andreport_ci
Are these changes tested?
Some via CI on fork and some manual testing.
Are there any user-facing changes?
No