
feat: Provide experimental images built from main #3699

Merged
merged 10 commits into from
Jan 18, 2024

Conversation

tomkerkhove
Member

@tomkerkhove tomkerkhove commented Jan 12, 2024

What this PR does / why we need it:

Provide experimental images built from main so that end-users can easily run the latest bits from main without having to build the images themselves.

Currently it pushes to the same image repo/path, but we may want to change that to be more explicit about the usage scenarios.

The release creation is inspired by how Higress does it, but I'm happy to adapt.

Examples:

  • GitHub Release (link)
  • Image (not public though): apimaso.azurecr.io/azureserviceoperator:experimental

Special notes for your reviewer:
I am new to Make, so if I'm doing things in a wrong or hacky way, I'm happy to improve.

How does this PR make you feel:
gif

If applicable:

  • this PR contains documentation
  • this PR contains tests
  • this PR contains YAML Samples

Closes #3682

@codecov-commenter

codecov-commenter commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (89bd8ac) 53.58% compared to head (1df8afc) 53.57%.
Report is 9 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #3699      +/-   ##
==========================================
- Coverage   53.58%   53.57%   -0.01%
==========================================
  Files        1410     1411       +1
  Lines      483278   483451     +173
==========================================
+ Hits       258946   259009      +63
- Misses     185134   185227      +93
- Partials    39198    39215      +17
```


Collaborator

@super-harsh super-harsh left a comment


Thanks for the PR. A few minor suggestions/questions below

.github/workflows/create-release-main.yml (5 review threads, resolved)
Co-authored-by: Harshdeep Singh (harshdsingh) <38904804+super-harsh@users.noreply.github.com>
@theunrepentantgeek
Member

I am new to Make so if I'm doing things wrong/hacky, then I'm happy to improve

I haven't reviewed the actual changes yet (doing that soon), but ASO v2 does not use Make - we use Task for build coordination. (The existing Makefile is a remnant of ASO v1 and is only used for maintaining it.)

Member

@theunrepentantgeek theunrepentantgeek left a comment


This looks good; once we've converged on the outstanding questions, I'll be happy to approve.

.github/workflows/create-release-main.yml (5 review threads, resolved)
.github/workflows/create-release-official.yml (1 review thread, resolved)
tomkerkhove and others added 3 commits January 16, 2024 09:52
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
@theunrepentantgeek
Member

/ok-to-test sha=1df8afc

@theunrepentantgeek
Member

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

```yaml
- name: Build, tag and push docker image
  run: |
    container_id=${{env.container_id}}
    docker exec -e DOCKER_PUSH_TARGET "$container_id" task VERSION=${{ env.ARTIFACT_VERSION }} LATEST_VERSION_TAG=${{ env.ARTIFACT_VERSION }} controller:docker-push-multiarch
```
Member


Re-tagging different docker images under the same tag over and over is likely to cause problems, I think. See the default image pull policy.

We currently set imagePullPolicy to Always (which TBH is probably wrong), but that would at least somewhat mitigate this issue... though even then awkward situations are possible, for example:

  1. User installs experimental ASO via YAML.
  2. We merge a new PR containing a new resource, including webhooks, Kubernetes RBAC roles, etc.
  3. User's pod restarts for whatever reason.
  4. User's pod pulls the new image, but the resource YAMLs (including RBAC roles and webhooks) are from the N-1 version of experimental. The pod will try to install the new CRD now embedded inside it (assuming configuration is crdPattern=*), but will crash during launch because it attempts to start a watch on the new resource without permission to do so.

The workaround is that you can delete the ASO namespace and re-install from latest, and that should fix it. This likely won't happen super often... but we should document this behavior and really emphasize that people must not run this in production. It literally won't work if they run it long enough, because we will add resources and this will happen and break them.

It's much less of a problem for quick local testing, kicking the tires, etc. - but running for a week in a dev or test cluster is more likely to run into this issue.

Possible solutions are to pin the specific image hash...
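A sketch of what pinning by digest looks like in a deployment manifest, using the v2.3.0 digest quoted elsewhere in this thread as a placeholder (the container name `manager` is an assumption, not taken from the actual ASO manifests):

```yaml
# Deployment fragment pinning the operator image by digest rather than a
# moving tag. The digest below is a placeholder - substitute the digest of
# the experimental build you actually want to run.
spec:
  template:
    spec:
      containers:
        - name: manager   # hypothetical container name for illustration
          image: mcr.microsoft.com/k8s/azureserviceoperator@sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
          imagePullPolicy: IfNotPresent   # a digest is immutable, so Always is unnecessary
```

Because a digest can never be re-pointed, a pod restart can never silently pick up a different image, which sidesteps the CRD/RBAC skew scenario above.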

Member Author

@tomkerkhove tomkerkhove Jan 18, 2024


I see your point, but it's more of a use-case question: tagging with the commit SHA is a good addition for both scenarios, and users can choose depending on their needs. I would not create a GitHub release for each individual SHA though; that is something to document.

Would that make sense? If so, I'll open a follow-up PR with docs + additional image pushed with the sha tag



Personally I think it would be good to have an image tagged with a commit, even if it's only built nightly so that not every commit is built.

If I need a change sooner than the current two-month release schedule, I would like to use a fixed image so I can get changes sooner and provide feedback (and keep running it until the release).

A moving image isn't very useful except for a quick test.

Member Author


These are the two scenarios that I often see in the various projects that I maintain/interact with:

  1. :main/:experimental - I want to run the latest and greatest, don't ask me to check what the latest commit was
  2. :<sha> - I want to test a given change and stick to it, if I want another change, then I will look it up

I believe ASO needs both, and 2) is what Tim is looking for as well.

Member


Do you actually need to push a specific SHA-tagged image? I don't think so, because you can always pull an image by its digest regardless.

I just tested this locally and it already works:

```shell
docker pull mcr.microsoft.com/k8s/azureserviceoperator@sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
```

which is the digest for v2.3.0.

Member


The only catch is that the YAML in the experimental release will refer to the experimental tag, so users will need to use kustomize or some other mechanism to update it to refer to a specific SHA... unless we modify the release to produce YAML that references a SHA rather than the experimental tag directly, which may be worth doing but would need some build script changes.

Member Author


Fair enough - We should be good to go then!

With regards to the YAML: if users want to use a digest instead of the experimental tag, I'd suggest we leave it to them to do a find/replace after downloading the artifacts; I would not offer this OOTB personally.
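For users who do want the pin, kustomize's built-in `images` transformer can do that find/replace declaratively; a sketch, with a hypothetical filename for the downloaded release YAML and the v2.3.0 digest from this thread standing in for an experimental digest:

```yaml
# kustomization.yaml - rewrite the operator image reference to a pinned digest.
# Both the resource filename and the digest value are placeholders.
resources:
  - azureserviceoperator_experimental.yaml   # hypothetical: the downloaded release YAML
images:
  - name: mcr.microsoft.com/k8s/azureserviceoperator
    digest: sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
```

Running `kubectl apply -k .` (or `kustomize build .`) would then emit the manifests with every matching image reference rewritten to the digest, with no manual editing of the release artifact.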

.github/workflows/create-release-official.yml (review thread, resolved)
```yaml
on:
  # run when pushed to main is published,
  # which creates a new tag
  push:
```
Member


Some math: if we push on each commit, the image we push is ~170 MB compressed. ACR starts charging above 100 GB consumed, and we merge roughly 2 commits a day (620 in the past year), so that's growth of ~340 MB/day, or roughly 10 GB/month.

We'll likely need to clean up old images eventually. I think we can enable that feature in our ACR if we upgrade to the Premium tier.

How does a retention time of 45d for untagged images sound? We ship official releases roughly that often, and worst case, if you're using a tagged image and it gets pruned, you can swap to a newer version of the image anyway.
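The arithmetic above can be sanity-checked with a few lines (numbers taken from the comment; assumes the worst case where no image layers are shared):

```python
# Worst-case ACR storage growth if every merged commit pushes a full image.
image_mb = 170        # compressed image size, per the comment above
commits_per_day = 2   # ~620 commits merged over the past year

daily_mb = image_mb * commits_per_day
monthly_gb = daily_mb * 30 / 1000

print(f"~{daily_mb} MB/day, ~{monthly_gb:.1f} GB/month")
# At that rate a 100 GB allowance lasts roughly 100_000 / 340 ≈ 294 days
# before charges start - hence the need for a retention policy.
```

With layer reuse between successive builds (as noted later in the thread), the real growth should be well below this ceiling.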

Member


I think 45d is reasonable; we can change if anyone complains.

The core scenarios we're enabling here are for dev & test, and 45d should be enough for most cases.

Member


Before we merge this let me fix our ACR settings then.

Member


OK, I've updated our ACR to the Premium tier with a 45d retention policy and also protected all of the ASO images in it.
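For reference, the Azure CLI steps for this look roughly like the following (the registry name is a placeholder; the flags are from the `az acr` command group, so double-check against current docs before running):

```shell
# Upgrade the registry to the Premium tier (required for retention policies).
az acr update --name myregistry --sku Premium

# Automatically delete untagged manifests after 45 days.
az acr config retention update \
  --registry myregistry \
  --status enabled \
  --days 45 \
  --type UntaggedManifests

# Lock a released image so it cannot be deleted or overwritten.
az acr repository update \
  --name myregistry \
  --image azureserviceoperator:v2.3.0 \
  --delete-enabled false \
  --write-enabled false
```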

Member Author


Some math: if we push on each commit, the image we push is ~170 MB compressed. ACR starts charging above 100 GB consumed, and we merge roughly 2 commits a day (620 in the past year), so that's growth of ~340 MB/day, or roughly 10 GB/month.

That's a good estimate, but in reality various layers should be reused across images, so the net storage impact should be a lot lower, no?

@matthchr matthchr added this pull request to the merge queue Jan 18, 2024
Merged via the queue into Azure:main with commit 8695695 Jan 18, 2024
8 checks passed
@tomkerkhove tomkerkhove deleted the experimental-images branch January 19, 2024 10:47
Successfully merging this pull request may close these issues.

Introduce experimental image
6 participants