feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599

yehudit1987 · 2025-09-15T06:14:48Z

feat: migrate Jupyter Web App tests to notebooks repo

Add testing infrastructure scripts (kind, kustomize, istio)
Update test workflows to use ghcr.io/kubeflow/notebooks/jupyter-web-app
Update JWA component manifests and Makefile for new registry

Enables JWA tests to run in kubeflow/notebooks repository.

Solves issue #587

google-oss-prow · 2025-09-15T06:14:54Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…tebooks-v1 branch kubeflow#587 Signed-off-by: Yehudit Kerido <yehudit1987@gmail.com>

yehudit1987 · 2025-09-18T13:36:57Z

Discussion:
I noticed some inconsistencies with the ARM64 architecture. The build process sometimes hangs for a long time—it looks like the image gets created (most of the time), but the step never finishes.

I tried switching from --load to packing into a tar file, and that helped for a while, but the issue came back. After digging into it, it seems the architecture needs a lot of resources because of the emulation process, and it might be getting stuck while shutting down QEMU.

Proposed solution:
Since the problem happens when "closing the step," I’ve added a one-hour timeout (with about a 30-minute buffer). This way, it won’t run endlessly for six hours, and hitting the timeout won’t be treated as a failure.

The next step just shows the created images—we still need to decide whether we want the build to fail if the ARM64 step does.

andyatmiami

Thank you for the contribution and getting the release process started for notebooks-v1!

I think we still need some work on this PR - primarily around communication on why some of these changes are necessary and deviate from how kubeflow/kubeflow was set up.

I am not implying (necessarily) that they are wrong - only that there is no explanation on this PR to be able to understand why the proposed changes were required.

andyatmiami · 2025-09-19T18:02:27Z

components/crud-web-apps/jupyter/Makefile

 docker-build-multi-arch: ##  Build multi-arch docker images with docker buildx
 	cd ../ && docker buildx build --load --platform ${ARCH} --tag ${IMG}:${TAG} -f ${DOCKERFILE} .

+.PHONY: docker-build-multi-arch-verify


can you explain why we needed to add this Makefile target? Why was this not needed in kubeflow/kubeflow but is now needed in kubeflow/notebooks?

This is the kind of thing I would have expected to be called out in the commit message and/or PR description - but I don't see any reference to it...

Tagging each architecture separately would be a breaking change for customers - and I don't think "sneaking that in" in is something I could "sign off on" without discussion from community.

Ok so as I mention, there where issues with ARM64 and I did of course open my proposal to discussion.

This target was added only for this scenario (and not instead of) and it helps to validate what is happening with the problematic image.
Now as I mention, most of the time (if not all) I can't fail the build step since not sure the image build is failing, it just get stuck.

I'm not sue why it's happens here and not in kubeflow/kubeflow but from what I have read it might be related to differences in runners. Maybe the experts from the community can put some light on this issue.

Yep, apologies - I had missed your discussion comment on the PR!

andyatmiami · 2025-09-19T18:09:21Z

.github/workflows/jwa_multi_arch_test.yaml

+    - name: Verify multi-arch builds
+      run: |
+        echo "Verifying multi-arch build results:"
+        echo "Listing all jupyter-web-app images:"
+        docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" | grep jupyter-web-app || echo "No jupyter-web-app images found"
+        
+        echo ""
+        echo "Checking individual architectures:"
+        TAG=$(git describe --tags --always --dirty)
+        echo "Base TAG: $TAG"
+        
+        # Check each architecture
+        if docker images | grep -q "jupyter-web-app.*${TAG}-linux-amd64"; then
+          echo "✅ AMD64: SUCCESS"
+        else
+          echo "❌ AMD64: FAILED"
+        fi
+        
+        if docker images | grep -q "jupyter-web-app.*${TAG}-linux-ppc64le"; then
+          echo "✅ PPC64LE: SUCCESS"
+        else
+          echo "❌ PPC64LE: FAILED"
+        fi
+        
+        if docker images | grep -q "jupyter-web-app.*${TAG}-linux-arm64-v8"; then
+          echo "✅ ARM64: SUCCESS"
+        else
+          echo "ℹ️  ARM64: Not built (expected if timeout occurred)"
+        fi


Related to my comment on https://github.com/kubeflow/notebooks/pull/599/files#diff-07cbcedb761f69b198691b554bea634acaf78ebc7005415b45a3fe51033cc561...

I am not sure why this Verify command is necessary... are you observing cases where the build step does NOT load an image - and yet the GitHub Action workflow continues on? That would be the only situation in which I would think we need this explicit "Verify" step - and yet I would expec the build step to fail (and likewise the workflow to error).

I need more details around necessity of this logic before I could sign off on these changes.

I'm unclear why this Verify target is necessary (?)

Yes, exactly. I'm observing cases where image not loaded (not sure if they will pop since the step is hang). As mentioned before I can't decide on failure in the build phase. In this PR my initial suggestion is to wrap it with timeout and then run some validations, of course in that phase we can fail the build if needed.

andyatmiami · 2025-09-19T18:11:26Z

.github/workflows/jwa_multi_arch_test.yaml

+    - name: Build ARM64 Image (Optional)
+      timeout-minutes: 60
+      continue-on-error: true
+      run: |
+        cd components/crud-web-apps/jupyter
+        ARCH=linux/arm64/v8 make docker-build-multi-arch-verify
+        echo "Completed linux/arm64/v8 build!"


This was not an optional step in the original kubeflow/kubeflow JWA Multi-Arch Build Test workflow.. Would need you to add additional context here to judge if this is in fact a warranted/necessary change.

So as mentioned before, from my investigation it might be a runner issue but I'm not sure.

andyatmiami · 2025-09-19T18:12:51Z

.github/workflows/jwa_multi_arch_test.yaml

+      - components/crud-web-apps/common/**
+      - releasing/version/VERSION
+    branches:
+      - master


kubeflow/notebooks does NOT have a master branch... I would assume this is intended to read main (?)

Absolutely correct, my bad. Will fix it in the next patch after we will resolve some of those discussions above.

github-project-automation bot added this to Kubeflow Notebooks Sep 15, 2025

github-project-automation bot moved this to Needs Triage in Kubeflow Notebooks Sep 15, 2025

yehudit1987 marked this pull request as draft September 15, 2025 06:14

google-oss-prow bot requested review from kimwnasptd and thesuperzapper September 15, 2025 06:14

google-oss-prow bot added do-not-merge/work-in-progress size/L labels Sep 15, 2025

yehudit1987 force-pushed the feat/migrate_jwa_tests branch 9 times, most recently from 0acf6b9 to 1c8c9ac Compare September 17, 2025 09:03

google-oss-prow bot added area/backend area - related to backend components area/ci area - related to ci area/frontend area - related to frontend components area/v1 area - version - kubeflow notebooks v1 labels Sep 17, 2025

yehudit1987 force-pushed the feat/migrate_jwa_tests branch 9 times, most recently from 4a6d212 to 0ac0946 Compare September 18, 2025 11:29

yehudit1987 force-pushed the feat/migrate_jwa_tests branch from 0ac0946 to 8b9c3cc Compare September 18, 2025 11:39

feat: Migrate JWA test-related workflows from kubeflow/kubeflow to no…

2c5e96d

…tebooks-v1 branch kubeflow#587 Signed-off-by: Yehudit Kerido <yehudit1987@gmail.com>

yehudit1987 force-pushed the feat/migrate_jwa_tests branch from 8b9c3cc to 2c5e96d Compare September 18, 2025 13:07

yehudit1987 marked this pull request as ready for review September 18, 2025 14:28

google-oss-prow bot removed the do-not-merge/work-in-progress label Sep 18, 2025

andyatmiami suggested changes Sep 19, 2025

View reviewed changes

google-oss-prow bot assigned andyatmiami Sep 19, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599

feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599

Uh oh!

yehudit1987 commented Sep 15, 2025

Uh oh!

google-oss-prow bot commented Sep 15, 2025

Uh oh!

yehudit1987 commented Sep 18, 2025 •

edited

Loading

Uh oh!

andyatmiami left a comment

Uh oh!

andyatmiami Sep 19, 2025

Uh oh!

andyatmiami Sep 19, 2025

Uh oh!

yehudit1987 Sep 21, 2025

Uh oh!

andyatmiami Sep 21, 2025

Uh oh!

andyatmiami Sep 19, 2025

Uh oh!

yehudit1987 Sep 21, 2025

Uh oh!

andyatmiami Sep 19, 2025

Uh oh!

yehudit1987 Sep 21, 2025

Uh oh!

andyatmiami Sep 19, 2025

Uh oh!

yehudit1987 Sep 21, 2025

Uh oh!

Uh oh!

feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599

Are you sure you want to change the base?

feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599

Uh oh!

Conversation

yehudit1987 commented Sep 15, 2025

Uh oh!

google-oss-prow bot commented Sep 15, 2025

Uh oh!

yehudit1987 commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andyatmiami left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

yehudit1987 commented Sep 18, 2025 •

edited

Loading