OCPBUGS-56805: fix: patch status instead of updating it to avoid failed loop#306

Closed
damdo wants to merge 1 commit into openshift:main from
damdo:fix-failed-update-clusteroperator-status-loop

@damdo
Member

@damdo damdo commented May 28, 2025

At the moment the cluster-capi-operator manager is running in a hot loop trying to Update() the ClusterOperator status with the most recent conditions.

E0527 19:07:44.303923       1 controller.go:316] "Reconciler error" err="failed to set conditions for InfraCluster controller: failed to sync status: failed to update cluster operator status: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"cluster-api\": the object has been modified; please apply your changes to the latest version and try again" controller="InfraClusterController" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="434ea0cf-83d3-442c-bde6-1fedddf4ffef"
...
E0527 19:07:44.357363       1 controller.go:316] "Reconciler error" err="failed to set status available: failed to update cluster operator status: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"cluster-api\": the object has been modified; please apply your changes to the latest version and try again" controller="CoreClusterController" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="9f4c2c13-ede1-4981-8761-988bfb4f66bb"
...
E0527 19:07:44.443030       1 controller.go:316] "Reconciler error" err="failed to set conditions for CAPI Installer Controller: failed to sync status: failed to update cluster operator status: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"cluster-api\": the object has been modified; please apply your changes to the latest version and try again" controller="CapiInstallerController" controllerGroup="config.openshift.io" controllerKind="ClusterOperator" ClusterOperator="cluster-api" namespace="" name="cluster-api" reconcileID="eded91df-8801-4313-ab1e-321813387601"

(more logs here)

Instead, it should use Patch(), so that concurrent changes to unrelated fields no longer cause conflicts.

This is a short-term fix until we reintroduce a more refined status-updating mechanism based on SSA (server-side apply), which was originally merged in #256 but reverted by #273.
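The difference between Update() and Patch() described above can be illustrated with a small, self-contained simulation. This is only a sketch and not the code from this PR: `Status`, `Store`, `PatchConditions`, and `ErrConflict` are hypothetical names, and the in-memory store merely imitates the API server's resource-version check. With controller-runtime, the real call would presumably be `Status().Patch()` with a `client.MergeFrom` patch base, but the actual diff is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical, flattened stand-in for a ClusterOperator status.
type Status struct {
	ResourceVersion int
	Conditions      string // written by this operator's controllers
	Versions        string // written by some other actor in this sketch
}

// ErrConflict mirrors the "object has been modified" error in the logs.
var ErrConflict = errors.New("the object has been modified")

// Store imitates the API server's optimistic concurrency check.
type Store struct{ current Status }

// Get returns a copy of the stored object, like a client read.
func (s *Store) Get() Status { return s.current }

// Update replaces the whole object and fails if the caller's copy is stale.
func (s *Store) Update(in Status) error {
	if in.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	in.ResourceVersion++
	s.current = in
	return nil
}

// PatchConditions sends only the changed field and carries no resource
// version, so it cannot conflict with writes to other fields.
func (s *Store) PatchConditions(conditions string) {
	s.current.Conditions = conditions
	s.current.ResourceVersion++
}

func main() {
	store := &Store{current: Status{ResourceVersion: 1}}

	// Two writers read the same version of the object.
	ours, theirs := store.Get(), store.Get()

	// The other writer gets in first.
	theirs.Versions = "4.20.0"
	fmt.Println("their update:", store.Update(theirs))

	// Our full update is now stale and is rejected.
	ours.Conditions = "Available=True"
	fmt.Println("our update:", store.Update(ours))

	// A patch of just the conditions goes through and keeps their change.
	store.PatchConditions("Available=True")
	fmt.Printf("after patch: %+v\n", store.Get())
}
```

In the simulation, the stale Update() is rejected even though it touches a different field from the one the other writer changed, which is the situation the log lines above suggest. The patch succeeds because it names only the field it wants to change.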

@openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels on May 28, 2025
@openshift-ci-robot

@damdo: This pull request references Jira Issue OCPBUGS-56805, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

The bug has been updated to refer to the pull request using the external bug tracker.

@JoelSpeed
Contributor

/lgtm
/approve

@openshift-ci bot added the lgtm label on May 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented May 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

@openshift-ci bot added the approved label on May 28, 2025
@damdo
Member Author

damdo commented May 28, 2025

/cherry-pick release-4.19

@openshift-cherrypick-robot

@damdo: once the present PR merges, I will cherry-pick it on top of release-4.19 in a new PR and assign it to you.

@damdo
Member Author

damdo commented May 28, 2025

/test unit

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 4716dec and 2 for PR HEAD 00fe212 in total

@damdo
Member Author

damdo commented May 28, 2025

/hold

I think this needs some tweaking: we are computing the patchBase only for the conditions diff, not for all the changes.
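One reading of this comment is that the patch base was snapshotted too late, so the resulting diff covered only the conditions and missed other status changes made earlier in the same reconcile. The sketch below illustrates that pitfall under stated assumptions: `diff` and `clone` are hypothetical helpers, the status is reduced to a flat string map, and the diff is a simplified stand-in for a real merge patch rather than the operator's actual logic.

```go
package main

import "fmt"

// diff returns the fields of modified that differ from base. It is a flat,
// simplified stand-in for the body of a merge patch.
func diff(base, modified map[string]string) map[string]string {
	patch := map[string]string{}
	for key, value := range modified {
		if old, ok := base[key]; !ok || old != value {
			patch[key] = value
		}
	}
	return patch
}

// clone copies the map so that later mutations do not affect the snapshot.
func clone(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for key, value := range in {
		out[key] = value
	}
	return out
}

func main() {
	status := map[string]string{"versions": "4.19.0", "conditions": "Progressing"}

	// Snapshot taken before any mutation: the diff sees every change.
	early := clone(status)
	status["versions"] = "4.20.0"

	// Snapshot taken after versions already changed: that change is lost.
	late := clone(status)
	status["conditions"] = "Available"

	fmt.Println("patch from early base:", diff(early, status))
	fmt.Println("patch from late base: ", diff(late, status))
}
```

Taking the snapshot once, before the first mutation, keeps every change in the patch. A base taken just before the conditions are set silently drops the rest.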

@openshift-ci bot added the do-not-merge/hold label on May 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented May 28, 2025

@damdo: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-techpreview | 00fe212 | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/unit | 00fe212 | link | true | /test unit |
| ci/prow/e2e-azure-ovn-techpreview | 00fe212 | link | false | /test e2e-azure-ovn-techpreview |
| ci/prow/e2e-aws-ovn-serial-1of2 | 00fe212 | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/regression-clusterinfra-cucushift-rehearse-capi-aws-ipi | 00fe212 | link | false | /test regression-clusterinfra-cucushift-rehearse-capi-aws-ipi |
| ci/prow/e2e-gcp-ovn-techpreview | 00fe212 | link | true | /test e2e-gcp-ovn-techpreview |

@damdo
Member Author

damdo commented Jul 10, 2025

Superseded by #331

@damdo damdo closed this Jul 10, 2025
@openshift-ci-robot

@damdo: This pull request references Jira Issue OCPBUGS-56805. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.

4 participants