feat: Migrate JWA test-related workflows from kubeflow/kubeflow to notebooks-v1 branch #587 #599
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has not yet been approved by any approvers. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Discussion: I tried switching from `--load` to packing into a tar file, and that helped for a while, but the issue came back. After digging into it, it seems the ARM64 build needs a lot of resources because of the QEMU emulation, and it might be getting stuck while shutting down QEMU. Proposed solution: the next step just shows the created images - we still need to decide whether we want the build to fail if the ARM64 step does.
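For reference, a minimal sketch of the tar-export variant mentioned above (paths and names here are illustrative, not the PR's actual commands):

```bash
# Export the ARM64 build to a tarball instead of --load-ing into the daemon:
docker buildx build \
  --platform linux/arm64/v8 \
  --tag "${IMG}:${TAG}" \
  --output type=docker,dest=/tmp/jwa-arm64.tar \
  -f "${DOCKERFILE}" .

# The tarball can then be loaded (or inspected) in a separate step:
docker load -i /tmp/jwa-arm64.tar
```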
Thank you for the contribution and getting the release process started for notebooks-v1!
I think we still need some work on this PR - primarily around communicating why some of these changes are necessary and deviate from how kubeflow/kubeflow was set up.
I am not implying (necessarily) that they are wrong - only that there is no explanation on this PR to be able to understand why the proposed changes were required.
```makefile
docker-build-multi-arch: ## Build multi-arch docker images with docker buildx
	cd ../ && docker buildx build --load --platform ${ARCH} --tag ${IMG}:${TAG} -f ${DOCKERFILE} .

.PHONY: docker-build-multi-arch-verify
```
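For context on how this target is consumed: judging by the workflow step further down, it is invoked once per platform with `ARCH` set from the environment, along the lines of:

```bash
# Illustrative invocations (variable values are examples, not from the PR):
ARCH=linux/amd64 make docker-build-multi-arch
ARCH=linux/arm64/v8 make docker-build-multi-arch-verify
```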
Can you explain why we needed to add this Makefile target? Why was this not needed in kubeflow/kubeflow but is now needed in kubeflow/notebooks?
This is the kind of thing I would have expected to be called out in the commit message and/or PR description - but I don't see any reference to it...
Tagging each architecture separately would be a breaking change for customers - and I don't think "sneaking that in" is something I could "sign off on" without discussion from the community.
Ok, so as I mentioned, there were issues with ARM64, and I did of course open my proposal for discussion.
This target was added only for this scenario (and not as a replacement), and it helps validate what is happening with the problematic image.
Now, as I mentioned, most of the time (if not all) I can't fail the build step, since I'm not sure the image build is actually failing - it just gets stuck.
I'm not sure why it happens here and not in kubeflow/kubeflow, but from what I have read it might be related to differences in runners. Maybe the experts from the community can shed some light on this issue.
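To make the intent concrete, here is a minimal sketch of what a build-and-verify wrapper along these lines could look like; this is an assumption about the target's shape, not the PR's actual contents:

```bash
# Hypothetical sketch: bound the buildx call so a hung QEMU shutdown cannot
# stall the job indefinitely, then check the local daemon for the image.
timeout 55m docker buildx build --load \
  --platform "${ARCH}" \
  --tag "${IMG}:${TAG}" \
  -f "${DOCKERFILE}" . \
  || echo "build failed or timed out for ${ARCH}"

# Verify the image actually landed in the local daemon:
if docker images --format '{{.Repository}}:{{.Tag}}' | grep -q "${IMG}:${TAG}"; then
  echo "image ${IMG}:${TAG} was loaded"
else
  echo "image ${IMG}:${TAG} is missing"
fi
```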
Yep, apologies - I had missed your discussion comment on the PR!
```yaml
- name: Verify multi-arch builds
  run: |
    echo "Verifying multi-arch build results:"
    echo "Listing all jupyter-web-app images:"
    docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" | grep jupyter-web-app || echo "No jupyter-web-app images found"
    echo ""
    echo "Checking individual architectures:"
    TAG=$(git describe --tags --always --dirty)
    echo "Base TAG: $TAG"
    # Check each architecture
    if docker images | grep -q "jupyter-web-app.*${TAG}-linux-amd64"; then
      echo "✅ AMD64: SUCCESS"
    else
      echo "❌ AMD64: FAILED"
    fi
    if docker images | grep -q "jupyter-web-app.*${TAG}-linux-ppc64le"; then
      echo "✅ PPC64LE: SUCCESS"
    else
      echo "❌ PPC64LE: FAILED"
    fi
    if docker images | grep -q "jupyter-web-app.*${TAG}-linux-arm64-v8"; then
      echo "✅ ARM64: SUCCESS"
    else
      echo "ℹ️ ARM64: Not built (expected if timeout occurred)"
    fi
```
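For readers wondering where the `-linux-amd64`-style suffixes come from: `docker buildx build --load` cannot import a multi-platform manifest list into the local daemon, so each platform presumably gets built and tagged individually. A hypothetical loop producing tags in the shape this script greps for might look like:

```bash
# Illustrative only - not the PR's actual build logic.
for PLATFORM in linux/amd64 linux/ppc64le linux/arm64/v8; do
  SUFFIX="${PLATFORM//\//-}"   # e.g. linux/arm64/v8 -> linux-arm64-v8
  docker buildx build --load \
    --platform "${PLATFORM}" \
    --tag "jupyter-web-app:${TAG}-${SUFFIX}" \
    -f "${DOCKERFILE}" . || true   # keep going so later arches still get built
done
```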
Related to my comment on https://github.com/kubeflow/notebooks/pull/599/files#diff-07cbcedb761f69b198691b554bea634acaf78ebc7005415b45a3fe51033cc561...
I am not sure why this Verify command is necessary... are you observing cases where the build step does NOT load an image - and yet the GitHub Action workflow continues on? That would be the only situation in which I would think we need this explicit "Verify" step - and yet I would expect the build step to fail (and likewise the workflow to error).
I need more details around the necessity of this logic before I could sign off on these changes.

I'm unclear why this Verify target is necessary (?)
Yes, exactly. I'm observing cases where the image is not loaded (not sure they will ever surface as failures, since the step just hangs). As mentioned before, I can't decide on failure in the build phase. In this PR my initial suggestion is to wrap it with a timeout and then run some validations; at that phase we can of course fail the build if needed.
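A hypothetical sketch of that "timeout, then validate, then optionally fail" shape (command names and durations are illustrative):

```bash
# Sketch only: bound the build, then make the verification the failure signal.
timeout 55m make docker-build-multi-arch || echo "ARM64 build failed or hit the timeout"

TAG=$(git describe --tags --always --dirty)
if ! docker images | grep -q "jupyter-web-app.*${TAG}-linux-arm64-v8"; then
  echo "ARM64 image missing"
  # exit 1   # <- uncomment once the community agrees this should fail the build
fi
```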
```yaml
- name: Build ARM64 Image (Optional)
  timeout-minutes: 60
  continue-on-error: true
  run: |
    cd components/crud-web-apps/jupyter
    ARCH=linux/arm64/v8 make docker-build-multi-arch-verify
    echo "Completed linux/arm64/v8 build!"
```
This was not an optional step in the original kubeflow/kubeflow JWA Multi-Arch Build Test workflow... Would need you to add additional context here to judge if this is in fact a warranted/necessary change.
So, as mentioned before, from my investigation it might be a runner issue, but I'm not sure.
```yaml
  - components/crud-web-apps/common/**
  - releasing/version/VERSION
branches:
  - master
```
Absolutely correct, my bad. Will fix it in the next patch, after we resolve some of the discussions above.
feat: migrate Jupyter Web App tests to notebooks repo
Enables JWA tests to run in the kubeflow/notebooks repository.
Solves issue #587