feat: Provide experimental images built from main #3699
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3699      +/-   ##
==========================================
- Coverage   53.58%   53.57%    -0.01%
==========================================
  Files        1410     1411        +1
  Lines      483278   483451      +173
==========================================
+ Hits       258946   259009       +63
- Misses     185134   185227       +93
- Partials    39198    39215       +17

View full report in Codecov by Sentry.
Thanks for the PR. A few minor suggestions/questions below
Co-authored-by: Harshdeep Singh (harshdsingh) <38904804+super-harsh@users.noreply.github.com>
I haven't reviewed the actual changes yet (doing that soon), but ASO v2 does not use Make - we use Task for build coordination. (The existing Makefile is a remnant of ASO v1, and only used for maintenance of that.)
This looks good, once we've converged on the outstanding questions, I'll be happy to approve.
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
/ok-to-test sha=1df8afc
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
- name: Build, tag and push docker image
  run: |
    container_id=${{env.container_id}}
    docker exec -e DOCKER_PUSH_TARGET "$container_id" task VERSION=${{ env.ARTIFACT_VERSION }} LATEST_VERSION_TAG=${{ env.ARTIFACT_VERSION }} controller:docker-push-multiarch
Re-tagging different docker images over and over is likely to cause problems, I think. See the default image pull policy. We currently set `imagePullPolicy` to `Always` (which TBH is probably wrong), which would mitigate this issue at least somewhat... though even then there are possible awkward situations, for example:
- User installs experimental ASO via YAML.
- We merge a new PR containing a new resource, including webhooks, Kubernetes RBAC roles, etc.
- User's pod restarts for whatever reason.
- User's pod pulls the new image, but the resource YAMLs (including RBAC roles and webhooks) are from the N-1st version of experimental. The pod will try to install the new CRD that is now embedded inside it (assuming the configuration is `crdPattern=*`), but it will crash during launch because it attempts to start a watch on the resource without having permission to do so.
The workaround is to delete the ASO namespace and re-install from latest, which should fix it. This likely won't happen very often... but we should probably document this behavior and really, really emphasize that people must not run this in production. It literally won't work if they do it long enough, because we will add resources and this will happen and break them.
It's much less of a problem for doing local sorta testing, kicking the tires, etc - but running for a week in a dev or test cluster is more likely to run into this issue.
Possible solutions are to pin the specific image hash...
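For reference, the pull policy in question is just a field on the container spec. A minimal, hypothetical fragment of the operator Deployment (the container name is illustrative; only the moving tag and `imagePullPolicy` matter here):

```yaml
containers:
  - name: manager                    # illustrative name
    image: apimaso.azurecr.io/azureserviceoperator:experimental
    imagePullPolicy: Always          # re-pulls the moving tag on every pod start
```

With `Always`, any pod restart silently picks up whatever `:experimental` currently points at, which is exactly how the stale-RBAC scenario above arises.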
I see your point, but it's more of a use-case scenario; tagging with the commit is a good addition to both scenarios, which users can use depending on their needs. I would not create a release for each individual sha though; that is something to document.
Would that make sense? If so, I'll open a follow-up PR with docs + an additional image pushed with the sha tag.
personally I think it would be good to have one with a commit, even if it's just built nightly so that not all commits are built.
If I need a change sooner than the current 2 month release schedule I would like to use a fixed image so I can get changes sooner and provide feedback (and continue running it until the release).
A moving image isn't very useful except for a quick test.
These are the two scenarios that I often see in the various projects that I maintain/interact with:
1. `:main` / `:experimental` - I want to run the latest and greatest, don't ask me to check what the latest commit was
2. `:<sha>` - I want to test a given change and stick to it; if I want another change, then I will look it up

I believe ASO needs both, and 2) is what Tim is looking for as well.
Do you actually need to push a specific SHA-tagged image? I don't think you do because you can always pull a sha-tagged image regardless.
I just tested this locally and it already works:
docker pull mcr.microsoft.com/k8s/azureserviceoperator@sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
- which is the sha for v2.3.0.
The only remaining issue is that the YAML in the `experimental` release will refer to the `experimental` tag, so users will need to use `kustomize` or some other mechanism to update that to refer to a specific SHA... unless we modify the release to produce a YAML that refs a SHA rather than the `experimental` tag directly, which is possibly worth doing but may need some build script changes.
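The find/replace users would do can be sketched like this (a hypothetical one-liner; the manifest line is a stand-in, and the digest shown is the v2.3.0 one quoted earlier, standing in for whatever experimental digest the user wants to pin):

```shell
# Rewrite the moving :experimental tag into a digest-pinned reference.
# The manifest line and digest here are illustrative stand-ins.
manifest='image: apimaso.azurecr.io/azureserviceoperator:experimental'
digest='sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f'
pinned=$(echo "$manifest" | sed "s|azureserviceoperator:experimental|azureserviceoperator@${digest}|")
echo "$pinned"
```

In practice this would be run with `sed -i` over the downloaded release YAML instead of an inline string.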
Fair enough - We should be good to go then!
With regards to the YAML: if they want to use a SHA instead of the experimental tag, I'd suggest we leave it up to customers to do a find/replace after downloading the artifacts; I would not offer this OOTB personally.
on:
  # run when pushed to main is published,
  # which creates a new tag
  push:
Some math: if we push on each commit, the image we push is ~170MB compressed. ACR starts charging for over 100GB consumed, and we merge roughly 2 commits a day (620 since 1 year ago), so that's growth of ~300MB/day == 9GB/month.
We'll likely need to clean up old images eventually. I think we can enable that feature in our ACR if we upgrade to premium.
How does a retention time for untagged images of 45d sound? We ship official releases roughly that often, and worst case, if you're using a tagged image and it gets pruned, you can swap to a newer version of the image anyway.
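The estimate above can be sanity-checked with integer arithmetic (inputs are the numbers from the comment; "GB" here means 1024MB):

```shell
# Back-of-envelope check of the image storage growth estimate.
commits_per_year=620
image_mb=170
per_day_mb=$(( commits_per_year * image_mb / 365 ))   # ~288 MB/day
per_month_gb=$(( per_day_mb * 30 / 1024 ))            # ~8 GB/month
days_to_100gb=$(( 100 * 1024 / per_day_mb ))          # ~355 days until 100GB
echo "${per_day_mb} MB/day, ~${per_month_gb} GB/month, ~${days_to_100gb} days to 100GB"
```

This lands near the quoted ~300MB/day and 9GB/month figures, and suggests roughly a year before hitting the 100GB billing threshold without pruning.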
I think 45d is reasonable; we can change if anyone complains.
The core scenarios we're enabling here are for dev & test, and 45d should be enough for most cases.
Before we merge this let me fix our ACR settings then.
k, I've updated our ACR to be premium tier with a 45d retention and also protected all of the ASO images in it.
> Some math: If we do push on each commit, the image we push is ~170MB compressed. ACR starts charging for over 100GB consumed, and we merge roughly 2 commits a day (620 since 1 year ago), so that's a growth of ~300MB/day == 9GB/month.
That's a good estimate, but in reality various layers of the images should be re-used and the net storage impact should be a lot lower, no?
What this PR does / why we need it:
Provide experimental images built from main so that end-users can easily run the latest bits from main without having to build the images themselves.
Currently it is pushing to the same image repo / path, but we may want to change that to be more explicit in the usage scenarios.
The release creation is inspired by how Higress does it, but I'm happy to adapt.
Examples:
apimaso.azurecr.io/azureserviceoperator:experimental
Special notes for your reviewer:
I am new to Make so if I'm doing things wrong/hacky, then I'm happy to improve
How does this PR make you feel:
If applicable:
Closes #3682