build and publish ml_metadata_store_server container image for ARM64 #10308

Open
thesuperzapper opened this issue Dec 12, 2023 · 12 comments

@thesuperzapper
Member

thesuperzapper commented Dec 12, 2023

Description

Right now, the gcr.io/tfx-oss-public/ml_metadata_store_server container image is the only image used in Kubeflow that is not published for both amd64 and arm64. This means that Kubeflow 1.8 still cannot run properly on ARM clusters.
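
One way to check which architectures a tag is published for is to look at its manifest list. The following is a minimal Python sketch, assuming you have already captured the JSON printed by `docker manifest inspect <image>` and that it follows the Docker manifest list / OCI image index layout. The function name `published_platforms` and the sample document are hypothetical and for illustration only; the sample is not real registry output.

```python
from __future__ import annotations

import json


def published_platforms(manifest_json: str) -> set[str]:
    """Return the "os/arch" pairs listed in a manifest list / image index.

    A single-architecture image manifest has no "manifests" array, so the
    result is an empty set in that case.
    """
    doc = json.loads(manifest_json)
    platforms = set()
    for entry in doc.get("manifests", []):
        platform = entry.get("platform") or {}
        os_name = platform.get("os")
        arch = platform.get("architecture")
        # Skip entries without a real platform (for example attestation
        # entries, which are recorded as unknown/unknown).
        if os_name and arch and "unknown" not in (os_name, arch):
            platforms.add(f"{os_name}/{arch}")
    return platforms


# Hypothetical sample for illustration only, NOT real registry output.
SAMPLE_MANIFEST_LIST = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "digest": "sha256:<elided>",
            "platform": {"os": "linux", "architecture": "amd64"},
        },
    ],
})

if __name__ == "__main__":
    found = published_platforms(SAMPLE_MANIFEST_LIST)
    print("arm64 published:", "linux/arm64" in found)
```

If `linux/arm64` is missing from the returned set, the tag has no native ARM64 variant and an ARM node would have to rely on emulation, if that works at all.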

I have made a PR upstream in google/ml-metadata to get the builds working for ARM64:

We need to work with the ml-metadata team to review/merge it and then set up a process to also push the ARM version of that image to GCR.

EDIT: I was incorrect about this being the "only one", but I think it must be the only one that does not work at all under Rosetta emulation. Either way, we need to fix this one too, and also push native ARM images for the others. I have raised a separate issue to track fixing the other images:


Love this idea? Give it a 👍.

@thesuperzapper
Member Author

/cc @chensun @zijianjoy

@thesuperzapper
Member Author

thesuperzapper commented Dec 13, 2023

For those who want to test, I have made a forked repo in the deployKF org with the ARM versions of the gcr.io/tfx-oss-public/ml_metadata_store_server image. You can test a patched version of ml-metadata 1.14.0 by using the following container:

Note that building under emulation on GitHub Actions took about 5 hours:

@xixici

xixici commented Dec 13, 2023

Great. I pulled this image and it runs correctly. Now I am finding the ARM versions of gcr.io/ml-pipeline/metadata-writer and gcr.io/ml-pipeline/metadata-envoy.

@thesuperzapper
Member Author

@xixici can you confirm what you are saying?

Because gcr.io/ml-pipeline/metadata-writer:2.0.5 and gcr.io/ml-pipeline/metadata-envoy:2.0.5 (and all other versions) are only published for AMD64.

I assume you mean that they work via Rosetta emulation on a MacBook?
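
Whether an image needs emulation depends on the host CPU and the architectures the image is published for. As a rough Python sketch of that check (the helper names `docker_arch`, `host_arch`, and `needs_emulation` are hypothetical, and the mapping only covers the common 64-bit x86 and ARM machine strings):

```python
import platform

# Map common platform.machine() values to Docker/OCI architecture names.
# Linux on ARM typically reports "aarch64"; macOS on Apple Silicon reports "arm64".
_MACHINE_TO_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Translate a machine string into a Docker architecture name."""
    try:
        return _MACHINE_TO_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"unrecognised machine type: {machine!r}") from None


def host_arch() -> str:
    """Docker architecture name of the machine running this script."""
    return docker_arch(platform.machine())


def needs_emulation(host_machine: str, image_archs: set) -> bool:
    """True if no published architecture matches the host natively."""
    return docker_arch(host_machine) not in image_archs


if __name__ == "__main__":
    # An ARM host with an image that is published for amd64 only.
    print(needs_emulation("arm64", {"amd64"}))
```

On an ARM MacBook, an amd64-only image may still start through emulation, whereas a Kubernetes node on ARM hardware without emulation configured would not be able to run it.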


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale label Mar 13, 2024

github-actions bot commented Apr 3, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@github-actions github-actions bot closed this as completed Apr 3, 2024
@thesuperzapper
Member Author

/reopen


@thesuperzapper: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@google-oss-prow google-oss-prow bot reopened this Apr 5, 2024
@thesuperzapper
Member Author

Clearly, this is going to take some time, so I will prevent the bot from closing it.

/lifecycle frozen

@google-oss-prow google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale labels Apr 5, 2024
@MouseSun846

I have successfully built metadata-envoy 2.0.5 in an ARM environment.

The build steps are as follows:

docker pull --platform linux/arm64 envoyproxy/envoy:v1.16.0

In the pipelines/third_party/metadata_envoy directory:

1. Modify the Dockerfile and configure the proxy settings

FROM envoyproxy/envoy:v1.16.0

RUN apt-get -o Acquire::http::proxy="http://proxy:port" update -y && \
  apt-get -o Acquire::http::proxy="http://proxy:port" install --no-install-recommends -y -q gettext openssl

COPY third_party/metadata_envoy/envoy.yaml /etc/envoy.yaml

# Copy license files.
#RUN mkdir -p /third_party
COPY third_party/metadata_envoy/license.txt /third_party/license.txt

ENTRYPOINT ["/usr/local/bin/envoy", "-c"]
CMD ["/etc/envoy.yaml"]

2. Modify envoy.yaml

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 9090 }
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: metadata-cluster
                            max_grpc_timeout: 0s
                      cors:
                        allow_origin_string_match:
                          - exact: "*"
                        allow_methods: GET, PUT, DELETE, POST, OPTIONS
                        allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                        max_age: "1728000"
                        expose_headers: custom-header-1,grpc-status,grpc-message
                http_filters:
                  - name: envoy.grpc_web
                  - name: envoy.cors
                  - name: envoy.router
  clusters:
    - name: metadata-cluster
      connect_timeout: 30.0s
      type: logical_dns
      http2_protocol_options: {}
      lb_policy: round_robin
      load_assignment:
        cluster_name: metadata-cluster
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: metadata-grpc-service
                  port_value: 8080   

3. Finally, build the image
In pipelines/backend/Makefile:

.PHONY: metadata_envoy
metadata_envoy:
	cd $(MOD_ROOT) && docker build -t registry.cnbita.com:5000/kubeflow-pipelines/metadata-envoy:2.0.5-arm  -f third_party/metadata_envoy/Dockerfile .

Then run make metadata_envoy.
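
The steps above build for whatever architecture the Docker host runs on. To make the target architecture explicit, or to build from an amd64 machine, `docker buildx build --platform` could be used instead. The Python sketch below only assembles that command line and does not run it. The function name `buildx_command` and the tag `example.registry/metadata-envoy:2.0.5-arm64` are placeholders made up for illustration, not names from this thread.

```python
from __future__ import annotations

import shlex


def buildx_command(
    tag: str,
    dockerfile: str,
    context: str = ".",
    platforms: tuple[str, ...] = ("linux/arm64",),
    push: bool = False,
) -> list[str]:
    """Assemble a `docker buildx build` invocation as an argv list."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "-t", tag,
        "-f", dockerfile,
    ]
    if push:
        # Push the result to the registry named in the tag.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    cmd = buildx_command(
        tag="example.registry/metadata-envoy:2.0.5-arm64",  # hypothetical tag
        dockerfile="third_party/metadata_envoy/Dockerfile",
    )
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing several platforms, for example `("linux/amd64", "linux/arm64")`, produces a multi-architecture build; such results generally need to be pushed to a registry with `push=True` rather than loaded into the local image store.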

@MouseSun846

Clearly, this is going to take some time, so I will prevent the bot from closing it.

/lifecycle frozen

#10308 (comment)

@MouseSun846

It seems that Kubeflow v2 does not use the metadata-writer component. It is not installed on my cluster, but I can still use pipelines normally.
