Migrate Arm image building and testing to this repo #6453

Open
antoninbas opened this issue Jun 17, 2024 · 2 comments
Labels
area/build-release Issues or PRs related to building and releasing priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

antoninbas commented Jun 17, 2024

Antrea has had support for the arm64 and arm/v7 platforms for a while now. antrea/antrea-agent-ubuntu and antrea/antrea-controller-ubuntu are multi-platform image manifests.

The way the build is currently structured is as follows:

  1. Whenever the main branch is updated, a GitHub workflow runs and invokes ./hack/build-antrea-linux-all.sh --pull --push-base-images. The workflow then tags and pushes antrea/antrea-agent-ubuntu-amd64 and antrea/antrea-controller-ubuntu-amd64. At this point, the multi-platform manifests have not been updated.
  2. As a final step, the above workflow triggers a separate workflow hosted in a different repository (vmware-tanzu/antrea-build-infra). This repository supports a handful of self-hosted Arm64 runners. The repository is private, because a public repository with non-ephemeral self-hosted runners would not be secure (https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#self-hosted-runner-security). The runners are graciously provided by OSUOSL, which supports multiple open-source projects.
  3. That second workflow builds the Agent and Controller Docker images for arm64 (antrea/antrea-agent-ubuntu-arm64, antrea/antrea-controller-ubuntu-arm64:latest) and arm/v7 (antrea/antrea-agent-ubuntu-arm, antrea/antrea-controller-ubuntu-arm:latest). Note that an Arm64 machine can build 32-bit Arm artifacts without emulation. Once these images are built, the multi-platform manifest is created and pushed to the registry. This completes the process of updating antrea/antrea-agent-ubuntu:latest and antrea/antrea-controller-ubuntu:latest.
  4. Finally, a new GitHub workflow (also in vmware-tanzu/antrea-build-infra) is triggered to test the Arm images, on the same set of self-hosted Arm64 runners.

The same process is used for Antrea tagged releases.
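
To make the last part of step 3 concrete, here is a minimal sketch of how the per-architecture images could be stitched into a multi-platform manifest. It only assembles the command lines and does not run them. The image names and suffixes come from the description above; the use of `docker manifest create` / `docker manifest push` and the helper name `manifest_commands` are assumptions for illustration, and the actual workflow may use different tooling.

```python
import shlex
from typing import List

# Per-architecture suffixes, taken from the image names quoted in this issue.
ARCH_SUFFIXES = ("amd64", "arm64", "arm")


def manifest_commands(image: str, tag: str = "latest") -> List[List[str]]:
    """Return docker CLI invocations that assemble and push a manifest list.

    Hypothetical helper: it assumes every per-architecture image
    (<image>-<arch>:<tag>) has already been pushed to the registry.
    """
    target = f"{image}:{tag}"
    per_arch = [f"{image}-{arch}:{tag}" for arch in ARCH_SUFFIXES]
    return [
        ["docker", "manifest", "create", target, *per_arch],
        ["docker", "manifest", "push", target],
    ]


if __name__ == "__main__":
    for name in ("antrea/antrea-agent-ubuntu", "antrea/antrea-controller-ubuntu"):
        for cmd in manifest_commands(name):
            # Print instead of executing, so the sketch has no side effects.
            print(shlex.join(cmd))
```

The manifest can only be created once all three per-architecture images exist, which is why the amd64 push in step 1 does not by itself update the multi-platform tags.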

The drawbacks of the current approach are:

  • a more convoluted build
  • relying on a private repository for part of the build infrastructure, which is not transparent
  • self-hosted Arm64 runners are not ephemeral, which tends to create issues over time
  • self-hosted Arm64 runners need to be maintained manually (e.g., upgraded periodically)
  • lack of symmetry between supported platforms
  • bad experience for contributors developing on an Arm machine (e.g., a recent macOS laptop): these contributors cannot simply run make after cloning the Antrea repository, as we do not automatically push base images for Arm platforms to the registry. Instead, they need to run ./hack/build-antrea-linux-all.sh, at least the first time they build Antrea (and again later, if they want up-to-date base images). There is no registry-based build cache for Arm base images either, so they have to build everything from scratch (although this does not take very long).
  • not straightforward to push images to ghcr.io from the private build repository (in case we want to support an additional registry)

The alternative considered when Arm support was introduced was to use QEMU emulation to build the multi-platform images. This would only be practical if we built OVS, and potentially the Antrea Go binaries, without emulation, using cross-compilation support from the C and Go compilers. Otherwise, building OVS for Arm using QEMU would take way too much time. Cross-compilation, however, would make the build (Dockerfiles) more complex and harder to maintain. Even then, the build would be slow, as other steps such as installing system packages / dependencies would still run under emulation and could take a while. As for testing, emulation is just not practical.

Even today, emulation is unlikely to be a good option for us. But recently there have been some interesting developments, with the availability (or upcoming availability) of hosted native Arm64 runners for GitHub workflows:

  1. the CNCF has its own program to make Arm64 runners available to CNCF projects: https://actuated.dev/blog/arm-ci-cncf-ampere
  2. GitHub has just announced that hosted Arm64 runners are in Beta for Enterprise accounts: https://github.blog/changelog/2024-06-03-actions-arm-based-linux-and-windows-runners-are-now-in-public-beta/. IIRC the CNCF uses an Enterprise account.

Using one of these options, we would no longer need to manage self-hosted Arm64 runners. We could also move all of the build infrastructure to this repository, and remove the dependency on vmware-tanzu/antrea-build-infra (at least for building; we may initially want to keep testing the Arm-based Antrea images using the OSUOSL machines, to keep our GitHub runner usage low).

I am currently asking the CNCF if option 2 (GitHub-hosted Arm64 runners) is available for CNCF projects. I will update this issue once I find out.
Edit: according to CNCF staff, this is already enabled and available to all projects under the CNCF GitHub Enterprise account, so option 2 is something we could pursue right away. I have not tested it yet.

@antoninbas antoninbas added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/build-release Issues or PRs related to building and releasing labels Jun 17, 2024
@antoninbas antoninbas self-assigned this Jun 24, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Jun 25, 2024
Github-hosted Arm runners are now in Beta for Enterprise accounts, and
available to all CNCF projects. We can use them to build Antrea Arm
images for the Agent and Controller, instead of relying on a private
Github repo with self-hosted Arm runners.

At the moment, we only migrate the building part (along with creation of
the multi-image manifest), and we use the existing workflow in
vmware-tanzu/antrea-build-infra for "asynchronous" testing of the Arm
images. We will handle the migration of the testing part in the future.

For antrea-io#6453

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>

antoninbas commented

I have been experimenting with the GitHub-hosted Arm runners provided by the CNCF. At the moment, I am running into an issue where I cannot get the arm/v7 version of the Docker images to build on these runners, which use the aarch64 architecture. Most aarch64 CPUs which use the Armv8-A architecture can run 32-bit arm/v7 binaries, and we actually leverage this in our current setup, which uses self-hosted aarch64 runners. However, with the GitHub-hosted runners (which also use the Ampere platform), I keep getting the following error when building the antrea-openvswitch image:

2024-06-27T21:41:51.5264487Z #5 [context ubuntu] ubuntu:24.04
2024-06-27T21:41:51.5267799Z #5 sha256:aa9f84d6e529483956a3454f02193e9a0f758a08b5191f2199c928065d307720 8.39MB / 26.82MB 0.2s
2024-06-27T21:41:51.6375370Z #5 sha256:aa9f84d6e529483956a3454f02193e9a0f758a08b5191f2199c928065d307720 26.82MB / 26.82MB 0.3s done
2024-06-27T21:41:51.7904855Z #5 extracting sha256:aa9f84d6e529483956a3454f02193e9a0f758a08b5191f2199c928065d307720
2024-06-27T21:41:52.1948378Z #5 extracting sha256:aa9f84d6e529483956a3454f02193e9a0f758a08b5191f2199c928065d307720 0.6s done
2024-06-27T21:41:52.1949794Z #5 DONE 0.9s
2024-06-27T21:41:52.3411096Z 
2024-06-27T21:41:52.3414037Z #7 [ovs-debs 1/6] RUN echo "xyz"
2024-06-27T21:41:52.3421499Z #7 0.034 xyz
2024-06-27T21:41:52.3421990Z #7 DONE 0.1s
2024-06-27T21:41:52.3422245Z 
2024-06-27T21:41:52.3422648Z #8 [ovs-debs 2/6] RUN apt-get update
2024-06-27T21:41:52.3423379Z #8 0.060 The futex facility returned an unexpected error code.
2024-06-27T21:42:01.5620525Z #8 9.281 Aborted (core dumped)
2024-06-27T21:42:01.5842816Z #8 ERROR: process "/bin/sh -c apt-get update" did not complete successfully: exit code: 134
2024-06-27T21:42:01.5843829Z ------
2024-06-27T21:42:01.5844339Z  > [ovs-debs 2/6] RUN apt-get update:
2024-06-27T21:42:01.5845044Z 0.060 The futex facility returned an unexpected error code.
2024-06-27T21:42:01.5845751Z 9.281 Aborted (core dumped)

The echo command was added by me to the Dockerfile, to show that commands can run successfully. But running apt fails immediately with what I think is a libc error. I tried with both ubuntu:22.04 and ubuntu:24.04 base images.

We can wait a bit and see if the issue gets resolved as software is updated on the runners. We could also try QEMU emulation to build the arm/v7 images, and see if it is fast (enough) on aarch64.
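
For reference, exit code 134 in the log above is the shell's way of reporting that apt-get was killed by a signal: 134 = 128 + 6, and signal 6 is SIGABRT, which matches the "Aborted (core dumped)" line. The sketch below decodes such exit codes and builds a command that should exercise the same failing step outside of the image build. The helper names are hypothetical, and running the command assumes Docker is available on an arm64 host; I have not verified that it reproduces the failure on the hosted runners.

```python
import shlex
import signal
from typing import List, Optional


def signal_from_exit_code(code: int) -> Optional[signal.Signals]:
    """Return the signal encoded in a shell exit status (128 + N), or None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128)
    except ValueError:
        # The value does not correspond to a signal known to this platform.
        return None


def armv7_smoke_test_command(image: str = "ubuntu:24.04") -> List[str]:
    """Build a docker command that runs apt-get update in an arm/v7 container.

    Hypothetical helper: it mirrors the failing Dockerfile step from the log.
    """
    return [
        "docker", "run", "--rm",
        "--platform", "linux/arm/v7",
        image, "apt-get", "update",
    ]


if __name__ == "__main__":
    sig = signal_from_exit_code(134)
    print("exit code 134 ->", sig.name if sig else "no signal")
    # Printed rather than executed, so the sketch has no side effects.
    print(shlex.join(armv7_smoke_test_command()))
```

Running the printed command periodically on a hosted runner would be a cheap way to tell when the runner software has been updated enough for the arm/v7 build to work.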

antoninbas added a commit to antoninbas/antrea that referenced this issue Sep 6, 2024
Github-hosted Arm runners are now in Beta for Enterprise accounts, and
available to all CNCF projects. We can use them to build Antrea Arm
images for the Agent and Controller, instead of relying on a private
Github repo with self-hosted Arm runners.

At the moment, we only migrate the building part (along with creation of
the multi-image manifest), and we use the existing workflow in
vmware-tanzu/antrea-build-infra for "asynchronous" testing of the Arm
images. We will handle the migration of the testing part in the future.

For antrea-io#6453

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>

antoninbas commented

The issue described in my previous comment on #6453 appears to have been resolved. The arm/v7 image now builds successfully on the arm64 runners provided by the CNCF. I can probably make progress on this issue now.

antoninbas added a commit to antoninbas/antrea that referenced this issue Sep 17, 2024
Github-hosted Arm runners are now in Beta for Enterprise accounts, and
available to all CNCF projects. We can use them to build Antrea Arm
images for the Agent and Controller, instead of relying on a private
Github repo with self-hosted Arm runners.

At the moment, we only migrate the building part (along with creation of
the multi-image manifest), and we use the existing workflow in
vmware-tanzu/antrea-build-infra for "asynchronous" testing of the Arm
images. We will handle the migration of the testing part in the future.

As part of this change, we also push "base images" (antrea/openvswitch,
antrea/base-ubuntu) for arm64 and arm/v7 to the registry. This is
necessary for building the Antrea images with the Docker container build
driver. The base images now have the architecture as a suffix in their
names. They are not available as multi-platform image manifests.

For antrea-io#6453

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>