
feat: Provide experimental images built from main #3699

Merged
merged 10 commits into from
Jan 18, 2024

Conversation

tomkerkhove
Member

@tomkerkhove tomkerkhove commented Jan 12, 2024

What this PR does / why we need it:

Provide experimental images built from main so that end-users can easily run the latest bits from main without having to build the images themselves.

Currently it pushes to the same image repo/path, but we may want to change that to be more explicit about the usage scenarios.

The release creation is inspired by how Higress does it, but I'm happy to adapt.

Examples:

  • GitHub Release (link)
  • Image (not public though): apimaso.azurecr.io/azureserviceoperator:experimental

Special notes for your reviewer:
I am new to Make, so if I'm doing things in a wrong or hacky way, I'm happy to improve.

How does this PR make you feel:
gif

If applicable:

  • this PR contains documentation
  • this PR contains tests
  • this PR contains YAML Samples

Closes #3682

@codecov-commenter

codecov-commenter commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (89bd8ac) 53.58% compared to head (1df8afc) 53.57%.
Report is 9 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #3699      +/-   ##
==========================================
- Coverage   53.58%   53.57%   -0.01%
==========================================
  Files        1410     1411       +1
  Lines      483278   483451     +173
==========================================
+ Hits       258946   259009      +63
- Misses     185134   185227      +93
- Partials    39198    39215      +17
```


Collaborator

@super-harsh super-harsh left a comment


Thanks for the PR. A few minor suggestions/questions below

.github/workflows/create-release-main.yml (5 review threads, resolved)
Co-authored-by: Harshdeep Singh (harshdsingh) <38904804+super-harsh@users.noreply.github.com>
@theunrepentantgeek
Member

I am new to Make so if I'm doing things wrong/hacky, then I'm happy to improve

I haven't reviewed the actual changes yet (doing that soon), but ASO v2 does not use Make - we use Task for build coordination. (The existing Makefile is a remnant of ASO v1 and is only used for maintaining it.)

Member

@theunrepentantgeek theunrepentantgeek left a comment


This looks good; once we've converged on the outstanding questions, I'll be happy to approve.

.github/workflows/create-release-main.yml (5 review threads, resolved)
.github/workflows/create-release-official.yml (1 review thread, resolved)
tomkerkhove and others added 3 commits January 16, 2024 09:52
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
Co-authored-by: Bevan Arps <bevan.arps@outlook.com>
@theunrepentantgeek
Member

/ok-to-test sha=1df8afc

@theunrepentantgeek
Member

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

```yaml
- name: Build, tag and push docker image
  run: |
    container_id=${{env.container_id}}
    docker exec -e DOCKER_PUSH_TARGET "$container_id" task VERSION=${{ env.ARTIFACT_VERSION }} LATEST_VERSION_TAG=${{ env.ARTIFACT_VERSION }} controller:docker-push-multiarch
```
Member


Re-tagging different docker images under the same tag over and over is likely to cause problems, I think. See the default image pull policy.

We currently set imagePullPolicy to Always (which TBH is probably wrong), but that would at least somewhat mitigate this issue... though even then awkward situations are possible, for example:

  1. User installs experimental ASO via YAML.
  2. We merge a new PR containing a new resource, including webhooks, Kubernetes RBAC roles, etc.
  3. User's pod restarts for whatever reason.
  4. User's pod pulls the new image, but the resource YAMLs (including RBAC roles and webhooks) are from the N-1 version of experimental. The pod will try to install the new CRD now embedded inside it (assuming configuration is crdPattern=*), but will crash during launch because it attempts to start a watch on the new resource without permission to do so.

The workaround is that you can delete the ASO namespace and re-install from latest, and that should fix it. This likely won't happen super often... but we should document this behavior and really emphasize that people must not run this in production. It literally won't work if they run it long enough, because we will add resources and this will happen and break them.

It's much less of a problem for quick local testing, kicking the tires, etc. - but running for a week in a dev or test cluster is more likely to run into this issue.

Possible solutions are to pin the specific image hash...
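A sketch of what pinning by digest looks like in a deployment manifest, using the v2.3.0 digest quoted elsewhere in this thread as a placeholder (the container name `manager` is an assumption, not taken from the actual ASO manifests):

```yaml
# Deployment fragment pinning the operator image by digest rather than a
# moving tag. The digest below is a placeholder - substitute the digest of
# the experimental build you actually want to run.
spec:
  template:
    spec:
      containers:
        - name: manager   # hypothetical container name for illustration
          image: mcr.microsoft.com/k8s/azureserviceoperator@sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
          imagePullPolicy: IfNotPresent   # a digest is immutable, so Always is unnecessary
```

Because a digest can never be re-pointed, a pod restart can never silently pick up a different image, which sidesteps the CRD/RBAC skew scenario above.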

Member Author

@tomkerkhove tomkerkhove Jan 18, 2024


I see your point, but it's more of a use-case question: tagging with the commit SHA is a good addition for both scenarios, and users can choose depending on their needs. I would not create a GitHub release for each individual SHA though; that is something to document.

Would that make sense? If so, I'll open a follow-up PR with docs + additional image pushed with the sha tag



Personally I think it would be good to have an image tagged with a commit, even if it's only built nightly so that not every commit is built.

If I need a change sooner than the current two-month release schedule, I would like to use a fixed image so I can get changes sooner and provide feedback (and keep running it until the release).

A moving image isn't very useful except for a quick test.

Member Author


These are the two scenarios that I often see in the various projects that I maintain/interact with:

  1. :main/:experimental - I want to run the latest and greatest, don't ask me to check what the latest commit was
  2. :<sha> - I want to test a given change and stick to it, if I want another change, then I will look it up

I believe ASO needs both, and 2) is what Tim is looking for as well.

Member


Do you actually need to push a specific SHA-tagged image? I don't think so, because you can always pull an image by its digest regardless.

I just tested this locally and it already works:

```shell
docker pull mcr.microsoft.com/k8s/azureserviceoperator@sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
```

which is the digest for v2.3.0.

Member


The only catch is that the YAML in the experimental release will refer to the experimental tag, so users will need to use kustomize or some other mechanism to update it to refer to a specific SHA... unless we modify the release to produce YAML that references a SHA rather than the experimental tag directly, which may be worth doing but would need some build script changes.

Member Author


Fair enough - We should be good to go then!

With regards to the YAML: if users want to use a digest instead of the experimental tag, I'd suggest we leave it to them to do a find/replace after downloading the artifacts; I would not offer this OOTB personally.
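For users who do want the pin, kustomize's built-in `images` transformer can do that find/replace declaratively; a sketch, with a hypothetical filename for the downloaded release YAML and the v2.3.0 digest from this thread standing in for an experimental digest:

```yaml
# kustomization.yaml - rewrite the operator image reference to a pinned digest.
# Both the resource filename and the digest value are placeholders.
resources:
  - azureserviceoperator_experimental.yaml   # hypothetical: the downloaded release YAML
images:
  - name: mcr.microsoft.com/k8s/azureserviceoperator
    digest: sha256:d82f04240b3b72881342ae1cd0714e154cbaa00bc9f8fb226e7ec8c23fefe28f
```

Running `kubectl apply -k .` (or `kustomize build .`) would then emit the manifests with every matching image reference rewritten to the digest, with no manual editing of the release artifact.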

.github/workflows/create-release-official.yml (review thread, resolved)
```yaml
on:
  # run when pushed to main is published,
  # which creates a new tag
  push:
```
Member


Some math: if we push on each commit, the image we push is ~170 MB compressed. ACR starts charging above 100 GB consumed, and we merge roughly 2 commits a day (620 in the past year), so that's growth of ~340 MB/day, or roughly 10 GB/month.

We'll likely need to clean up old images eventually. I think we can enable that feature in our ACR if we upgrade to the Premium tier.

How does a retention time of 45d for untagged images sound? We ship official releases roughly that often, and worst case, if you're using a tagged image and it gets pruned, you can swap to a newer version of the image anyway.
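The arithmetic above can be sanity-checked with a few lines (numbers taken from the comment; assumes the worst case where no image layers are shared):

```python
# Worst-case ACR storage growth if every merged commit pushes a full image.
image_mb = 170        # compressed image size, per the comment above
commits_per_day = 2   # ~620 commits merged over the past year

daily_mb = image_mb * commits_per_day
monthly_gb = daily_mb * 30 / 1000

print(f"~{daily_mb} MB/day, ~{monthly_gb:.1f} GB/month")
# At that rate a 100 GB allowance lasts roughly 100_000 / 340 ≈ 294 days
# before charges start - hence the need for a retention policy.
```

With layer reuse between successive builds (as noted later in the thread), the real growth should be well below this ceiling.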

Member


I think 45d is reasonable; we can change if anyone complains.

The core scenarios we're enabling here are for dev & test, and 45d should be enough for most cases.

Member


Before we merge this let me fix our ACR settings then.

Member


OK, I've updated our ACR to the Premium tier with a 45d retention policy and also protected all of the ASO images in it.
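For reference, the Azure CLI steps for this look roughly like the following (the registry name is a placeholder; the flags are from the `az acr` command group, so double-check against current docs before running):

```shell
# Upgrade the registry to the Premium tier (required for retention policies).
az acr update --name myregistry --sku Premium

# Automatically delete untagged manifests after 45 days.
az acr config retention update \
  --registry myregistry \
  --status enabled \
  --days 45 \
  --type UntaggedManifests

# Lock a released image so it cannot be deleted or overwritten.
az acr repository update \
  --name myregistry \
  --image azureserviceoperator:v2.3.0 \
  --delete-enabled false \
  --write-enabled false
```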

Member Author


Some math: if we push on each commit, the image we push is ~170 MB compressed. ACR starts charging above 100 GB consumed, and we merge roughly 2 commits a day (620 in the past year), so that's growth of ~340 MB/day, or roughly 10 GB/month.

That's a good estimate, but in reality various layers should be reused across images, so the net storage impact should be a lot lower, no?

@matthchr matthchr added this pull request to the merge queue Jan 18, 2024
Merged via the queue into Azure:main with commit 8695695 Jan 18, 2024
8 checks passed
@tomkerkhove tomkerkhove deleted the experimental-images branch January 19, 2024 10:47
Successfully merging this pull request may close these issues.

Introduce experimental image
6 participants