Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow building ml_metadata_store_server image on ARM64 #188

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thesuperzapper
Copy link

@thesuperzapper thesuperzapper commented Dec 12, 2023

This PR updates the Dockerfile and other things to allow the gcr.io/tfx-oss-public/ml_metadata_store_server to be built for a linux/arm64 target.

This is important for Kubeflow Pipelines, and thus deployKF (a tool for deploying Kubeflow).

The changes are:

  • Migrate ml_metadata/tools/docker_server/Dockerfile to use Bazelisk (this was the easiest way to download the ARM64 version of Bazel 5.3.0)
  • Creates a new .bazelversion in the repo root (Bazelisk expects it to be there)
  • Removes the license header from ./ml_metadata/.bazelversion (this breaks Bazelisk)
  • Makes ml_metadata/postgresql.BUILD directly the same as upstream tensorflow/io commit a1171cdd20e658ef3f1a8b3bf66dd4e228ceae30 (to remove X86-specific stuff)
    • NOTE: I am not 100% sure about the other changes that came with this, but it was required to get the ARM64 build to succeed. Please run tests to ensure no regressions before merging.

Next Steps:

  • Please review to confirm that updating ml_metadata/postgresql.BUILD did not cause a regression.
  • Update the build process to publish both amd64 AND arm64 versions of gcr.io/tfx-oss-public/ml_metadata_store_server

@thesuperzapper
Copy link
Author

@zijianjoy @chensun you might be interested in this, as gcr.io/tfx-oss-public/ml_metadata_store_server is the last Kubeflow container which is not yet published for ARM.

I am not 100% confident in this PR, so would appreciate testing/review from your end.

@thesuperzapper
Copy link
Author

For those who want to test, I have made a forked repo in the deployKF org with the ARM versions of the gcr.io/tfx-oss-public/ml_metadata_store_server image. You can test a patched version of ml-metdata version 1.14.0 by using the following container:

Note, building under emulation on GitHub actions took about 5 hours:

@tarilabs
Copy link
Contributor

fwiw I'm on M2 Mac, and I was able to reproduce the docker image build while checking out this PR and issuing command: ./ml_metadata/tools/docker_server/build_docker_image.sh

Screenshot 2023-12-17 at 18 26 29

Seems to me completing successfully the docker image build and It results in a server image for arm:

Screenshot 2023-12-17 at 18 26 45

Hope this helps (planning it testing out this image locally over the next few days too, will report back any findings in case, but I wanted to thank for these effort by making a local run and reporting back)

@thesuperzapper
Copy link
Author

@XinranTang @ml-metadata-team @tarilabs just wondering if I should rebase this (since the bazel 6.1.0 update created merge conflicts) so we can get this building for ARM merged?

@thesuperzapper thesuperzapper force-pushed the allow-building-server-image-with-arm64 branch from 5f3a351 to 58aa796 Compare March 22, 2024 22:43
@thesuperzapper
Copy link
Author

I rebased it.

Note, this only fixes the building of gcr.io/tfx-oss-public/ml_metadata_store_server on ARM, it does not fix installing the ml_metadata python package on ARM (Linux or macOS).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants