Skip to content

Conversation

yehudit1987
Copy link

feat: migrate Jupyter Web App tests to notebooks repo

  • Add testing infrastructure scripts (kind, kustomize, istio)
  • Update test workflows to use ghcr.io/kubeflow/notebooks/jupyter-web-app
  • Update JWA component manifests and Makefile for new registry

Enables JWA tests to run in kubeflow/notebooks repository.

Solves issue #587

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yehudit1987 yehudit1987 marked this pull request as draft September 15, 2025 06:14
@yehudit1987 yehudit1987 force-pushed the feat/migrate_jwa_tests branch 9 times, most recently from 0acf6b9 to 1c8c9ac Compare September 17, 2025 09:03
@google-oss-prow google-oss-prow bot added area/backend area - related to backend components area/ci area - related to ci area/frontend area - related to frontend components area/v1 area - version - kubeflow notebooks v1 labels Sep 17, 2025
@yehudit1987 yehudit1987 force-pushed the feat/migrate_jwa_tests branch 9 times, most recently from 4a6d212 to 0ac0946 Compare September 18, 2025 11:29
@yehudit1987 yehudit1987 force-pushed the feat/migrate_jwa_tests branch from 0ac0946 to 8b9c3cc Compare September 18, 2025 11:39
…tebooks-v1 branch kubeflow#587

Signed-off-by: Yehudit Kerido <yehudit1987@gmail.com>
@yehudit1987 yehudit1987 force-pushed the feat/migrate_jwa_tests branch from 8b9c3cc to 2c5e96d Compare September 18, 2025 13:07
@yehudit1987
Copy link
Author

yehudit1987 commented Sep 18, 2025

Discussion:
I noticed some inconsistencies with the ARM64 architecture. The build process sometimes hangs for a long time—it looks like the image gets created (most of the time), but the step never finishes.

I tried switching from --load to packing into a tar file, and that helped for a while, but the issue came back. After digging into it, it seems the architecture needs a lot of resources because of the emulation process, and it might be getting stuck while shutting down QEMU.

Proposed solution:
Since the problem happens when "closing the step," I’ve added a one-hour timeout (with about a 30-minute buffer). This way, it won’t run endlessly for six hours, and hitting the timeout won’t be treated as a failure.

The next step just shows the created images—we still need to decide whether we want the build to fail if the ARM64 step does.

@yehudit1987 yehudit1987 marked this pull request as ready for review September 18, 2025 14:28
Copy link
Contributor

@andyatmiami andyatmiami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution and getting the release process started for notebooks-v1!

I think we still need some work on this PR - primarily around communication on why some of these changes are necessary and deviate from how kubeflow/kubeflow was set up.

I am not implying (necessarily) that they are wrong - only that there is no explanation on this PR to be able to understand why the proposed changes were required.

docker-build-multi-arch: ## Build multi-arch docker images with docker buildx
cd ../ && docker buildx build --load --platform ${ARCH} --tag ${IMG}:${TAG} -f ${DOCKERFILE} .

.PHONY: docker-build-multi-arch-verify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we needed to add this Makefile target? Why was this not needed in kubeflow/kubeflow but is now needed in kubeflow/notebooks?

This is the kind of thing I would have expected to be called out in the commit message and/or PR description - but I don't see any reference to it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging each architecture separately would be a breaking change for customers - and I don't think "sneaking that in" in is something I could "sign off on" without discussion from community.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so as I mention, there where issues with ARM64 and I did of course open my proposal to discussion.

This target was added only for this scenario (and not instead of) and it helps to validate what is happening with the problematic image.
Now as I mention, most of the time (if not all) I can't fail the build step since not sure the image build is failing, it just get stuck.

I'm not sue why it's happens here and not in kubeflow/kubeflow but from what I have read it might be related to differences in runners. Maybe the experts from the community can put some light on this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, apologies - I had missed your discussion comment on the PR!

Comment on lines +47 to +75
- name: Verify multi-arch builds
run: |
echo "Verifying multi-arch build results:"
echo "Listing all jupyter-web-app images:"
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" | grep jupyter-web-app || echo "No jupyter-web-app images found"
echo ""
echo "Checking individual architectures:"
TAG=$(git describe --tags --always --dirty)
echo "Base TAG: $TAG"
# Check each architecture
if docker images | grep -q "jupyter-web-app.*${TAG}-linux-amd64"; then
echo "✅ AMD64: SUCCESS"
else
echo "❌ AMD64: FAILED"
fi
if docker images | grep -q "jupyter-web-app.*${TAG}-linux-ppc64le"; then
echo "✅ PPC64LE: SUCCESS"
else
echo "❌ PPC64LE: FAILED"
fi
if docker images | grep -q "jupyter-web-app.*${TAG}-linux-arm64-v8"; then
echo "✅ ARM64: SUCCESS"
else
echo "ℹ️ ARM64: Not built (expected if timeout occurred)"
fi No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment on https://github.com/kubeflow/notebooks/pull/599/files#diff-07cbcedb761f69b198691b554bea634acaf78ebc7005415b45a3fe51033cc561...

I am not sure why this Verify command is necessary... are you observing cases where the build step does NOT load an image - and yet the GitHub Action workflow continues on? That would be the only situation in which I would think we need this explicit "Verify" step - and yet I would expec the build step to fail (and likewise the workflow to error).

I need more details around necessity of this logic before I could sign off on these changes.

I'm unclear why this Verify target is necessary (?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I'm observing cases where image not loaded (not sure if they will pop since the step is hang). As mentioned before I can't decide on failure in the build phase. In this PR my initial suggestion is to wrap it with timeout and then run some validations, of course in that phase we can fail the build if needed.

Comment on lines +39 to +45
- name: Build ARM64 Image (Optional)
timeout-minutes: 60
continue-on-error: true
run: |
cd components/crud-web-apps/jupyter
ARCH=linux/arm64/v8 make docker-build-multi-arch-verify
echo "Completed linux/arm64/v8 build!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not an optional step in the original kubeflow/kubeflow JWA Multi-Arch Build Test workflow.. Would need you to add additional context here to judge if this is in fact a warranted/necessary change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as mentioned before, from my investigation it might be a runner issue but I'm not sure.

- components/crud-web-apps/common/**
- releasing/version/VERSION
branches:
- master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubeflow/notebooks does NOT have a master branch... I would assume this is intended to read main (?)

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely correct, my bad. Will fix it in the next patch after we will resolve some of those discussions above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend area - related to backend components area/ci area - related to ci area/frontend area - related to frontend components area/v1 area - version - kubeflow notebooks v1 size/L
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

2 participants