Skip to content

OCPBUGS-55638: OCPBUGS-55637: Add admin_acks handling in the MCO #5027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: release-4.18
Choose a base branch
from

Conversation

djoshy
Copy link
Contributor

@djoshy djoshy commented May 2, 2025

- What I did
I added a new sync loop in the operator for the admin-gates configmap. This sync loop will now begin to add the following keys:

  • ack-4.18-boot-image-opt-out-in-4.19 when a AWS/GCP cluster does not have a boot image configuration.
  • ack-4.18-aarch64-bootloader-4.19 when an aarch64/arm64 node is detected in the cluster.

These gates are only relevant for upgrades to 4.19, so they will only need to exist in the MCO's release-4.18 branch. I've also added a few units for this functionality.

- How to verify it

  • AWS/GCP boot image opt-out:
    • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configuration defined. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml). Now, add a boot image configuration - this should set Upgradeable back to true.
    • This gate should not show up on any other platforms.
  • aarch64 bootloader:
    • Launch this PR against a 4.18 aarch64 cluster. The CVO should trip Upgradeable=False (check via oc get clusterversion -o yaml).
    • This gate should not show up on clusters of any other architectures.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 2, 2025
@openshift-ci-robot
Copy link
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-55638, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-55638 to depend on a bug targeting a version in 4.19.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

[DNM, currently testing]

- What I did

  • Added a sync loop for admin-guards configmap with support for AWS/GCP boot image guard.

- How to verify it

  • Launch this PR against a 4.18 cluster on the AWS or GCP platform without a boot image configurationm defined. The CVO should trip Upgradeable=False. Now, add a boot image configuration - this should set Upgradeable back to true.
  • This PR should have no effect on other platforms.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from cheesesashimi and umohnani8 May 2, 2025 19:56
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 0bbd233 to eaebaed Compare May 5, 2025 14:23
@djoshy djoshy changed the title [DNM] OCPBUGS-55638: Add admin_acks handling in the MCO OCPBUGS-55638: Add admin_acks handling in the MCO May 5, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from eaebaed to 910ea66 Compare May 6, 2025 19:33
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2025
@djoshy djoshy force-pushed the 4.18-boot-image-admin-ack branch from 910ea66 to 157c091 Compare May 6, 2025 19:37
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2025
@djoshy
Copy link
Contributor Author

djoshy commented May 6, 2025

/retest-required

1 similar comment
@djoshy
Copy link
Contributor Author

djoshy commented May 6, 2025

/retest-required

@djoshy djoshy changed the title OCPBUGS-55638: Add admin_acks handling in the MCO OCPBUGS-55638: OCPBUGS-55637: Add admin_acks handling in the MCO May 6, 2025
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm, will of course need to wait for the docs to exist before we can merge

Some minor questions inline, but I think they're mostly clear

BootImageAWSGCPMsg = "GCP or AWS platform with no boot image configuration detected. OCP will automatically opt-in all GCP and AWS clusters that currently do not a boot image configuration in 4.19. Please add a configuration to disable boot image updates if this is not desired. See [insert-doc-link] "

AArch64BootImageKey = "ack-4.18-aarch64-bootloader-4.19"
AArch64BootImageMsg = "aarch64 nodes detected. Please ensure boot image are updated by following the KCS:(insertlink) prior to upgrading to 4.19."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: updated sounds like it's an action that always needs to be taken, maybe we should say something like "ensure boot image is not too old" instead? (I couldn't think of a better way to phrase it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense to me! Perhaps we should run the messages through a doc/Trevor review, they might have better insight/worked on this before

}

// If no machine managers exist, no configuration has been defined
return mcop.Spec.ManagedBootImages.MachineManagers == nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is saying that if the user already has opted in in a previous version, we don't raise the admin ack?

(makes sense, just thinking out loud)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily opt-ed in, but if they have any sort of opinion(all/none/partial), we'll either remove the admin-ack if one exists, or otherwise a no-op

// updateGuardKeyIfNeeded adds a key with a message depending on:
// If guard is needed and key doesn't exist => create it
// If guard is needed and key exists => nothing to do
// If guard is not needed and key exists => delete it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, in the sense that we do clear stale admin acks, but only before you do the upgrade, since these syncs will be gone after 4.18.

(again, this is fine, just thinking out loud)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so I wanted the user to able to clear the gate either by addressing it via the admin-ack configmap or via defining an boot image opinion. In 4.19, this should not have any effect per Trevor since these keys only target 4.18(but we can double check that again)

adminCM: buildAdminAckConfigMapWithData(nil),
},
{
name: "AWS platform, with a boot image configuration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests all have a "disabled" boot image configuration (empty machinemanagers), and thereby do not need the admin ack, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, disabled meaning the user wanted to opt out so no need to raise the alarm. I can add comments/make this more verbose if its not very clear in the current state 😄

@djoshy
Copy link
Contributor Author

djoshy commented May 7, 2025

/retest-required

Copy link
Contributor

openshift-ci bot commented May 7, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-ovn-upi-zones 157c091 link false /test e2e-vsphere-ovn-upi-zones
ci/prow/e2e-gcp-op-techpreview 157c091 link false /test e2e-gcp-op-techpreview
ci/prow/e2e-gcp-op-ocl 157c091 link false /test e2e-gcp-op-ocl
ci/prow/e2e-azure-ovn-upgrade-out-of-change 157c091 link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-gcp-op 157c091 link true /test e2e-gcp-op

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an approval for the code, but should wait until docs are in place to merge

Copy link
Contributor

openshift-ci bot commented May 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants