Skip to content

OCPBUGS-55638: OCPBUGS-55637: Add admin_acks handling in the MCO #5027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

djoshy
Copy link
Contributor

@djoshy djoshy commented May 2, 2025

- What I did
I added a new sync loop in the operator for the admin-gates configmap in the openshift-config-managed namespace. This sync loop will now begin to add the following keys:

  • ack-4.18-boot-image-opt-out-in-4.19 when a AWS/GCP cluster does not have a boot image configuration.
  • ack-4.18-aarch64-bootloader-4.19 when an aarch64/arm64 node is detected in the cluster.

These gates are only relevant for upgrades to 4.19, so they will only need to exist in the MCO's release-4.18 branch. I've also added a few units for this functionality.

- How to verify it
Before testing either case, ensure that any non MCO keys in admin-gates have been acknowledged. For example, if there is a key called ack-4.18-kube-1.32-api-removals-in-4.19 in admin-gates, add this key to the admin-acks configmap in the openshift-config namespace with its value set to true like so:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    include.release.openshift.io/hypershift: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    kubernetes.io/description: Record administrator acknowledgments of update gates
      defined in the admin-gates ConfigMap in the openshift-config-managed namespace.
    release.openshift.io/create-only: "true"
  creationTimestamp: "2025-05-13T07:51:46Z"
  name: admin-acks
  namespace: openshift-config
  resourceVersion: "2284"
  uid: fdd6b65e-5c83-4d41-a08d-cba2bbdf1e91
data:
  ack-4.18-kube-1.32-api-removals-in-4.19: "true"
  • AWS/GCP boot image opt-out:
    • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configuration defined. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml). Now, add a boot image configuration - the CVO should no longer be setting Upgradeable to false.
    • This gate should not show up on any other platforms.
  • aarch64 bootloader:
    • Launch this PR against a 4.18 aarch64 cluster. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml).
    • This gate should not show up on clusters of any other architectures.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 2, 2025
@openshift-ci-robot
Copy link
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-55638 to depend on a bug targeting a version in 4.19.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

[DNM, currently testing]

- What I did

  • Added a sync loop for admin-guards configmap with support for AWS/GCP boot image guard.

- How to verify it

  • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configurationm defined. The CVO should trip Upgradeable=False. Now, add a boot image configuration - this should set Upgradeable back to true.
  • This PR should have no effect on other platforms.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from cheesesashimi and umohnani8 May 2, 2025 19:56
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 0bbd233 to eaebaed Compare May 5, 2025 14:23
@djoshy djoshy changed the title [DNM] OCPBUGS-55638: Add admin_acks handling in the MCO OCPBUGS-55638: Add admin_acks handling in the MCO May 5, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from eaebaed to 910ea66 Compare May 6, 2025 19:33
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 910ea66 to 157c091 Compare May 6, 2025 19:37
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2025
@djoshy
Copy link
Contributor Author

djoshy commented May 6, 2025

/retest-required

1 similar comment
@djoshy
Copy link
Contributor Author

djoshy commented May 6, 2025

/retest-required

@djoshy djoshy changed the title OCPBUGS-55638: Add admin_acks handling in the MCO OCPBUGS-55638: OCPBUGS-55637: Add admin_acks handling in the MCO May 6, 2025
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm, will of course need to wait for the docs to exist before we can merge

Some minor questions inline, but I think they're mostly clear

BootImageAWSGCPMsg = "GCP or AWS platform with no boot image configuration detected. OCP will automatically opt-in all GCP and AWS clusters that currently do not a boot image configuration in 4.19. Please add a configuration to disable boot image updates if this is not desired. See [insert-doc-link] "

AArch64BootImageKey = "ack-4.18-aarch64-bootloader-4.19"
AArch64BootImageMsg = "aarch64 nodes detected. Please ensure boot image are updated by following the KCS:(insertlink) prior to upgrading to 4.19."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: updated sounds like it's an action that always needs to be taken, maybe we should say something like "ensure boot image is not too old" instead? (I couldn't think of a better way to phrase it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense to me! Perhaps we should run the messages through a doc/Trevor review, they might have better insight/worked on this before

}

// If no machine managers exist, no configuration has been defined
return mcop.Spec.ManagedBootImages.MachineManagers == nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is saying that if the user already has opted in in a previous version, we don't raise the admin ack?

(makes sense, just thinking out loud)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily opt-ed in, but if they have any sort of opinion(all/none/partial), we'll either remove the admin-ack if one exists, or otherwise a no-op

// updateGuardKeyIfNeeded adds a key with a message depending on:
// If guard is needed and key doesn't exist => create it
// If guard is needed and key exists => nothing to do
// If guard is not needed and key exists => delete it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, in the sense that we do clear stale admin acks, but only before you do the upgrade, since these syncs will be gone after 4.18.

(again, this is fine, just thinking out loud)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so I wanted the user to able to clear the gate either by addressing it via the admin-ack configmap or via defining an boot image opinion. In 4.19, this should not have any effect per Trevor since these keys only target 4.18(but we can double check that again)

adminCM: buildAdminAckConfigMapWithData(nil),
},
{
name: "AWS platform, with a boot image configuration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests all have a "disabled" boot image configuration (empty machinemanagers), and thereby do not need the admin ack, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, disabled meaning the user wanted to opt out so no need to raise the alarm. I can add comments/make this more verbose if its not very clear in the current state 😄

@djoshy
Copy link
Contributor Author

djoshy commented May 7, 2025

/retest-required

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an approval for the code, but should wait until docs are in place to merge

@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 157c091 to 9f5367d Compare May 13, 2025 14:05
@djoshy
Copy link
Contributor Author

djoshy commented May 13, 2025

Updated to include doc links.

@ptalgulk01
Copy link

Pre-merge verified :
Verified using IPI based AWS cluster

launch 4.18,openshift/machine-config-operator#5027 aws

Edit the admin-acks -n openshift-config

oc edit cm admin-acks -n openshift-config -o yaml
apiVersion: v1
data:
  ack-4.18-kube-1.32-api-removals-in-4.19: "true"
kind: ConfigMap
metadata:
  annotations:
    include.release.openshift.io/hypershift: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    kubernetes.io/description: Record administrator acknowledgments of update gates
      defined in the admin-gates ConfigMap in the openshift-config-managed namespace.
    release.openshift.io/create-only: "true"
  creationTimestamp: "2025-05-16T09:20:54Z"
  name: admin-acks
  namespace: openshift-config
  resourceVersion: "36986"
  uid: 167b0343-a43b-4e6d-b948-fa8c5a22cccc

Able to see Upgradeable = False before editing the bootImage value
Edit the machineconfiguration

....
  spec:
    logLevel: Normal
    managedBootImages:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: All
    managementState: Managed
    operatorLogLevel: Normal
  status:

After edit as far now able to see the Upgradeable is disappered from clusterversion and to make it appear again edit the data field from admin-acks -n openshift-config

@yuqi-zhang
Copy link
Contributor

/label backport-risk-assessed

Not an actual backport

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label May 16, 2025
@openshift-ci-robot
Copy link
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is ON_QA instead
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-55638 to depend on a bug targeting a version in 4.19.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

- What I did
I added a new sync loop in the operator for the admin-gates configmap in the openshift-config-managed namespace. This sync loop will now begin to add the following keys:

  • ack-4.18-boot-image-opt-out-in-4.19 when a AWS/GCP cluster does not have a boot image configuration.
  • ack-4.18-aarch64-bootloader-4.19 when an aarch64/arm64 node is detected in the cluster.

These gates are only relevant for upgrades to 4.19, so they will only need to exist in the MCO's release-4.18 branch. I've also added a few units for this functionality.

- How to verify it
Before testing either case, ensure that any non MCO keys in admin-gates have been acknowledged. For example, if there is a key called ack-4.18-kube-1.32-api-removals-in-4.19 in admin-gates, add this key to the admin-acks configmap in the openshift-config namespace with its value set to true like so:

apiVersion: v1
kind: ConfigMap
metadata:
 annotations:
   include.release.openshift.io/hypershift: "true"
   include.release.openshift.io/ibm-cloud-managed: "true"
   include.release.openshift.io/self-managed-high-availability: "true"
   kubernetes.io/description: Record administrator acknowledgments of update gates
     defined in the admin-gates ConfigMap in the openshift-config-managed namespace.
   release.openshift.io/create-only: "true"
 creationTimestamp: "2025-05-13T07:51:46Z"
 name: admin-acks
 namespace: openshift-config
 resourceVersion: "2284"
 uid: fdd6b65e-5c83-4d41-a08d-cba2bbdf1e91
data:
 ack-4.18-kube-1.32-api-removals-in-4.19: "true"
  • AWS/GCP boot image opt-out:
    • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configuration defined. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml). Now, add a boot image configuration - the CVO should no longer be setting Upgradeable to false.
    • This gate should not show up on any other platforms.
  • aarch64 bootloader:
    • Launch this PR against a 4.18 aarch64 cluster. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml).
    • This gate should not show up on clusters of any other architectures.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@djoshy
Copy link
Contributor Author

djoshy commented May 19, 2025

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is ON_QA instead
  • expected Jira Issue OCPBUGS-55638 to depend on a bug targeting a version in 4.19.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@djoshy
Copy link
Contributor Author

djoshy commented May 19, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 19, 2025
@openshift-ci-robot
Copy link
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.z) matches configured target version for branch (4.18.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-56453 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-56453 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0
  • bug has dependents

Requesting review from QA contact:
/cc @sergiordlr

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from sergiordlr May 19, 2025 18:19
@djoshy
Copy link
Contributor Author

djoshy commented May 19, 2025

/test all

@djoshy
Copy link
Contributor Author

djoshy commented May 19, 2025

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 9f5367d to a68e58b Compare May 20, 2025 13:26
@djoshy
Copy link
Contributor Author

djoshy commented May 20, 2025

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 20, 2025
@ptalgulk01
Copy link

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label May 20, 2025
@djoshy
Copy link
Contributor Author

djoshy commented May 20, 2025

/retest-required

@djoshy
Copy link
Contributor Author

djoshy commented May 21, 2025

/override ci/prow/e2e-gcp-op

This PR has passed this test in the past, and it seems to timing out in the deprovision stage

Copy link
Contributor

openshift-ci bot commented May 21, 2025

@djoshy: Overrode contexts on behalf of djoshy: ci/prow/e2e-gcp-op

In response to this:

/override ci/prow/e2e-gcp-op

This PR has passed this test in the past, and it seems to timing out in the deprovision stage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

openshift-ci bot commented May 21, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn a68e58b link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-vsphere-ovn-upi a68e58b link false /test e2e-vsphere-ovn-upi
ci/prow/e2e-azure-ovn-upgrade-out-of-change a68e58b link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-vsphere-ovn-zones a68e58b link false /test e2e-vsphere-ovn-zones
ci/prow/e2e-gcp-op-techpreview a68e58b link false /test e2e-gcp-op-techpreview
ci/prow/e2e-gcp-op-ocl a68e58b link false /test e2e-gcp-op-ocl
ci/prow/e2e-vsphere-ovn-upi-zones a68e58b link false /test e2e-vsphere-ovn-upi-zones
ci/prow/e2e-gcp-op a68e58b link true /test e2e-gcp-op

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@yuqi-zhang
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2025
Copy link
Contributor

openshift-ci bot commented May 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7c89511 into openshift:release-4.18 May 21, 2025
12 of 19 checks passed
@openshift-ci-robot
Copy link
Contributor

@djoshy: Jira Issue OCPBUGS-55638: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-55638 has been moved to the MODIFIED state.

In response to this:

- What I did
I added a new sync loop in the operator for the admin-gates configmap in the openshift-config-managed namespace. This sync loop will now begin to add the following keys:

  • ack-4.18-boot-image-opt-out-in-4.19 when a AWS/GCP cluster does not have a boot image configuration.
  • ack-4.18-aarch64-bootloader-4.19 when an aarch64/arm64 node is detected in the cluster.

These gates are only relevant for upgrades to 4.19, so they will only need to exist in the MCO's release-4.18 branch. I've also added a few units for this functionality.

- How to verify it
Before testing either case, ensure that any non MCO keys in admin-gates have been acknowledged. For example, if there is a key called ack-4.18-kube-1.32-api-removals-in-4.19 in admin-gates, add this key to the admin-acks configmap in the openshift-config namespace with its value set to true like so:

apiVersion: v1
kind: ConfigMap
metadata:
 annotations:
   include.release.openshift.io/hypershift: "true"
   include.release.openshift.io/ibm-cloud-managed: "true"
   include.release.openshift.io/self-managed-high-availability: "true"
   kubernetes.io/description: Record administrator acknowledgments of update gates
     defined in the admin-gates ConfigMap in the openshift-config-managed namespace.
   release.openshift.io/create-only: "true"
 creationTimestamp: "2025-05-13T07:51:46Z"
 name: admin-acks
 namespace: openshift-config
 resourceVersion: "2284"
 uid: fdd6b65e-5c83-4d41-a08d-cba2bbdf1e91
data:
 ack-4.18-kube-1.32-api-removals-in-4.19: "true"
  • AWS/GCP boot image opt-out:
    • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configuration defined. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml). Now, add a boot image configuration - the CVO should no longer be setting Upgradeable to false.
    • This gate should not show up on any other platforms.
  • aarch64 bootloader:
    • Launch this PR against a 4.18 aarch64 cluster. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml).
    • This gate should not show up on clusters of any other architectures.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.18.0-202505211838.p0.g7c89511.assembly.stream.el9.
All builds following this will include this PR.

@djoshy djoshy deleted the 4.18-boot-image-admin-ack branch May 29, 2025 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.