OCPBUGS-74151: Wait for revision stability before removing etcd members#1540

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from hasbro17:member-removal-revision-stability-check
Feb 22, 2026

Conversation

@hasbro17
Contributor

@hasbro17 hasbro17 commented Feb 1, 2026

Previously, the ClusterMemberRemovalController would remove etcd members during revision rollouts, causing cluster degradation when simultaneously deleting multiple control plane machines with the OnDelete strategy.

During a revision rollout, etcd members can temporarily appear unhealthy while their pods are reinstalled to the latest revision. This is different from members being indefinitely unhealthy on a stable revision.

Additionally, the EtcdEndpointsController pauses during revision rollouts, so when a replacement machine is added and triggers a rollout, the etcd-endpoints configmap won't update. This causes API servers on the old revision to use removed member endpoints, leading to API unavailability.

This change adds a revision stability check before allowing member removal, ensuring we only remove members when revisions are stable and unhealthy members are truly unhealthy. This explicitly codifies the 4.17 behavior where the operator waited for all revisions to complete before removing members and lifecycle hooks.

TODO: Before merging this needs a corresponding test in the etcd vertical scaling suite to make sure we can validate this all the way back to 4.18.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 1, 2026
@hasbro17
Contributor Author

hasbro17 commented Feb 1, 2026

/hold

Need to add a test in the scaling suite to verify that we can delete all 3 machines and get replacements without the cluster hanging.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Feb 1, 2026
@openshift-ci-robot

@hasbro17: This pull request references Jira Issue OCPBUGS-74151, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @geliu2016

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from geliu2016 February 1, 2026 23:18
@coderabbitai

coderabbitai bot commented Feb 1, 2026

Walkthrough

Adds a revision-stability gate and a pre-check that compares live etcd voting-member IPs to the etcd-endpoints ConfigMap in clusterMemberRemovalController.sync; introduces a ConfigMap→IP-set helper and unit tests; exits early on instability, inability to determine stability, or membership mismatch.

Changes

Cohort / File(s) Summary
Cluster member removal controller
pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller.go
Adds IsRevisionStable check at start of sync, returns early on indeterminate/unstable revision; adds pre-check that compares live etcd voting-member IPs to the etcd-endpoints ConfigMap and skips removals when they differ; adds error reporting for stability and live/config fetch failures.
CEO helpers utility
pkg/operator/ceohelpers/common.go
Adds exported helper MemberIPSetFromConfigMap(cm *corev1.ConfigMap) sets.String to build a string set from ConfigMap Data values for membership comparison.
Tests
pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go
Adds TestIsEtcdEndpointsUpdated covering matching membership, scale-up/scale-down mismatches, IP differences, and ConfigMap-not-found error path.
Go module manifest
go.mod
Module file updated (lines changed).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 1, 2026
@openshift-ci openshift-ci bot requested review from dusk125 and ironcladlou February 1, 2026 23:19
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2026
@hasbro17 hasbro17 removed the request for review from ironcladlou February 1, 2026 23:31
@JoelSpeed
Contributor

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

return nil
}

// TODO(haseeb): We should also explicitly put the brakes on the member removal so we don't
Contributor

That's a good find. IIRC I removed something along those lines to enable this "my machine is unhealthy, please delete it automatically" admin scenario.

With the static pod controller it so happens that sometimes the revisions get stuck, so it would never be able to really remove it and an admin has to go and manually delete the member (or fix the node status somehow). Maybe you find some time to add a test for this, if we don't already have one... Even though it must be quite annoying to mock this in CI

Contributor Author

I've removed something along the lines to enable this "my machine is unhealthy, please delete it automatically" admin scenario.

Sorry Thomas, it's been a while so my memory is hazy, but do you mean this?
#947
i.e. that's where we first allowed scaling down/member removal if an unhealthy machine is pending deletion.

sometimes the revisions get stuck

If a revision rollout is stuck, and the result is an unhealthy etcd member (e.g. the pod never comes up), and the intent is to remove the unhealthy machine to fix that, then yes, this change would prevent the controller from scaling down and would require manual intervention from the admin (to manually delete the member or deletion hook).
But that seems like a tradeoff we'll have to make given the issue in this bug.

add a test for this

Sorry do you mean a unit test to verify that we cannot scale down during a stuck revision rollout?

We have tests for the unhealthy member scale down but nothing around revision rollouts.
Although seems like you've mocked tests with revision rollouts for the cert signer controller so I could probably reuse that.
https://github.com/openshift/cluster-etcd-operator/blob/main/pkg/operator/etcdcertsigner/etcdcertsignercontroller_test.go#L74-L79

Contributor

it seems this was #947 - even though 4.12 sounds like a long time ago.

But that seems like a tradeoff we'll have to make given the issue in this bug.

agreed

Sorry do you mean a unit test to verify that we cannot scale down during a stuck revision rollout?

I was thinking of another e2e test that makes a node unready and then ensures it can be replaced.
Best suited in https://github.com/openshift/origin/blob/main/test/extended/etcd/vertical_scaling.go

Not sure this is covered by CPMS somehow, though...

Contributor Author

Ah yes, the e2e test for making sure we can scale down an unhealthy member.
@jubittajohn already did a lot of work on that front in openshift/origin#29236
but I'll have to revisit why that was held up. I want to say it was stopping the kubelet to make the node unhealthy that made the whole test pretty disruptive and failed lots of other tests.

But yeah, it would be good to revive that so we don't regress on that in the future. I'll check with Jubitta to see what was up and if I can bring that back.

@JoelSpeed On a related note, do you know if there are any e2e tests on the CPMSO side that exercise scaling operations for an unhealthy cluster? Doesn't look like we do from a quick glance in https://github.com/openshift/cluster-control-plane-machine-set-operator/tree/main/test/e2e

Contributor

To my knowledge no, we don't have any tests for unhealthy members. CPMS doesn't do any health monitoring so I guess it didn't make sense to us to test unhealthy members.

If you have a suggestion for how we can make a member unhealthy we could add that to our suite though

Contributor

weren't there machine health checks or something along those lines?

If you have a suggestion for how we can make a member unhealthy we could add that to our suite though

the most reliable way is to just sudo systemctl stop kubelet.service

Contributor

Yeah so machine health checks are a thing, but they're a separate thing to CPMS. They basically delete the machine when they detect something is bad.

So we could create a test that gets onto a node, runs sudo systemctl stop kubelet.service, then an MHC would delete the machine after some period of the machine being unhealthy (or we could do that ourselves?) and then the test would continue with CPMS replacing it

@huali9 Do you know of any examples in our existing suites where we exec into nodes we could crib off for this?

Contributor

@jubittajohn jubittajohn Feb 4, 2026

So we could create a test that gets onto a node, runs sudo systemctl stop kubelet.service, then an MHC would delete the machine after some period of the machine being unhealthy (or we could do that ourselves?) and then the test would continue with CPMS replacing it

@JoelSpeed We have this scenario implemented and validated in an E2E test here, as part of an unmerged PR that @hasbro17 was referring to. While the scenario itself was successfully tested, we had to put it on hold for a few reasons.

  • The main reason is the invariant failures introduced by the new vertical scaling unhappy-path E2E tests (due to us intentionally stopping the kubelet). To address this, we would need to introduce a new disruptive test suite. TRT has given us the green light to proceed in this direction, which would allow us to isolate the monitor tests we expect to fail and explicitly skip them for these scenarios.

  • Another minor reason was TRT’s effort to add vertical scaling tests to component readiness, which is currently blocked by OCPBUGS-43379. This is because we still need to investigate and fix the invariant failures in the existing scaling tests that are significantly affecting pass rates. (And I guess the scaling tests don’t even run anymore, since the optional jobs that covered them were pruned some time ago as part of CI cost-cutting efforts).


Hi @JoelSpeed, on our end we've been doing manual testing where we SSH into the nodes directly - we haven't implemented automated node exec in our test suites yet due to stability concerns with these disruptive tests. As @jubittajohn mentioned, there's already an automated implementation in that E2E test, so we shouldn't need to add another one, right? We can just reference their existing work.

@JoelSpeed
Contributor

Multi PR test showed that the delete-3-masters scenario passed, but it timed out at 4 hours, so we will need to lengthen the timeout and run again. Looks promising though!

@hasbro17
Copy link
Contributor Author

hasbro17 commented Feb 3, 2026

I forgot multi PR tests are a thing now. That's quite useful. The e2e tests for our own scaling suite will take me a while to write and wire up as a presubmit, so the CPMSO test is a good signal in the meantime.

Since the removal controller will now take its sweet time waiting for revisions to stabilize, it will take a while.
The etcd-operator logs from the test run are post member removal so that doesn't tell me how long.
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/multi-pr-openshift-cluster-etcd-operator-1540-openshift-cluster-control-plane-machine-set-operator-383-e2e-aws-periodic-pre/2018248772705849344/artifacts/e2e-aws-periodic-pre/gather-extra/artifacts/pods/openshift-etcd-operator_etcd-operator-777c5675fc-zwntz_etcd-operator.log

Either way I see the timeout has been bumped already.

@hasbro17
Contributor Author

hasbro17 commented Feb 3, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

@JoelSpeed
Contributor

We also had to bump the config in the release repo to 6h, which I think went in after you kicked that off, so we may need to trigger that again to get a complete run. But the first run did successfully complete the new test where we delete the three masters simultaneously.

In case you weren't aware, that suite we are kicking off is part of the release blocking payloads, so as soon as this is fixed we can add this in as release blocking

@hasbro17
Contributor Author

hasbro17 commented Feb 3, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

@lance5890
Contributor

m

@JoelSpeed
Contributor

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

Image build failures should be fixed now

@huali9

huali9 commented Feb 5, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

@hasbro17
Contributor Author

hasbro17 commented Feb 5, 2026

Writing up the e2e test for delete all masters on our side here openshift/origin#30760 but I might leave out the unhealthy member case if that leads me down other rabbit holes on fixing/skipping invariants.

Also I still need to add unit tests here for verifying:

  • No removals during revision rollouts (unhealthy or otherwise)
  • No more removals if the etcd-endpoints configmap is lagging behind the actual membership

@hasbro17 hasbro17 force-pushed the member-removal-revision-stability-check branch from 2248459 to 2c512da Compare February 5, 2026 23:33
@hasbro17
Contributor Author

hasbro17 commented Feb 5, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

Added the check for slowing down member removals while the configmap lags behind the live membership

Testing in openshift/origin#30760 (comment)

@hasbro17
Contributor Author

hasbro17 commented Feb 5, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

Previously, the ClusterMemberRemovalController would remove etcd members
during revision rollouts, causing cluster degradation when simultaneously
deleting multiple control plane machines with the OnDelete strategy.

During a revision rollout, etcd members can temporarily appear unhealthy
while their pods are reinstalled to the latest revision. This is different
from members being indefinitely unhealthy on a stable revision.

Additionally, the EtcdEndpointsController pauses during revision rollouts,
so when a replacement machine is added and triggers a rollout, the
etcd-endpoints configmap won't update. This causes API servers on the old
revision to use removed member endpoints, leading to API unavailability.

This change adds a revision stability check before allowing member removal,
ensuring we only remove members when revisions are stable and unhealthy
members are truly unhealthy. This explicitly codifies the 4.17 behavior
where the operator waited for all revisions to complete before removing
members and lifecycle hooks.

Additionally, the ClusterMemberRemovalController now verifies that the live
etcd membership matches the configmap before proceeding with member removal,
preventing potential issues during rapid member deletion.
@hasbro17 hasbro17 force-pushed the member-removal-revision-stability-check branch from 2c512da to 0168733 Compare February 6, 2026 06:29
@hasbro17 hasbro17 changed the title DNM: OCPBUGS-74151: Wait for revision stability before removing etcd members OCPBUGS-74151: Wait for revision stability before removing etcd members Feb 6, 2026
@hasbro17
Contributor Author

hasbro17 commented Feb 6, 2026

Already have a unit test for the revision stability helper so not going to redo that

func Test_IsRevisionStable(t *testing.T) {

Added a unit test for skipping when we have a membership inconsistency between the live membership and configmap.

With all this, the e2e test over on openshift/origin#30760 passes (minus the usual invariants):
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-origin-30760-openshift-cluster-etcd-operator-1540-e2e-aws-ovn-etcd-scaling/2019555419822755840

started: 0/1/3 "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to delete all masters with OnDelete strategy and wait for CPMSO to replace them [Timeout:120m][apigroup:machine.openshift.io]"

passed: (58m3s) 2026-02-06T01:54:10 "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to delete all masters with OnDelete strategy and wait for CPMSO to replace them [Timeout:120m][apigroup:machine.openshift.io]"

Holding this until the test in origin merges so we can actually run the e2e-aws-ovn-etcd-scaling presubmit here to verify.

On that note, I've changed my mind: this new test can stay in the scaling suite's workflow and doesn't need a new presubmit/job. It seems to run fine serially with the others and passed in under 60m.

/hold

@tjungblu
Contributor

tjungblu commented Feb 6, 2026

/lgtm

thanks @hasbro17 :)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hasbro17, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hasbro17
Contributor Author

hasbro17 commented Feb 6, 2026

/testwith openshift/cluster-control-plane-machine-set-operator/main/e2e-aws-periodic-pre openshift/cluster-control-plane-machine-set-operator#383

@hasbro17
Contributor Author

Sorry this got held up, was out on PTO. I finally have the delete all masters e2e test passing on etcd's scaling suite openshift/origin#30760

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-origin-30760-openshift-cluster-etcd-operator-1540-e2e-aws-ovn-etcd-scaling/2023473194865790976

passed: (1h2m2s) 2026-02-16T21:19:21 "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to delete all masters with OnDelete strategy and wait for CPMSO to replace them [Timeout:120m][apigroup:machine.openshift.io]"

Going to unhold this and merge.

@hasbro17
Contributor Author

/unhold
/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 17, 2026
@tjungblu
Contributor

/retest-required

@tjungblu
Contributor

/retest-required

@tjungblu
Contributor

doesn't actually look like something caused by this PR, overriding:
/override ci/prow/e2e-agnostic-ovn

@openshift-ci
Contributor

openshift-ci bot commented Feb 18, 2026

@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-agnostic-ovn


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hasbro17
Contributor Author

/label acknowledge-critical-fixes-only
/verified by me

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Feb 18, 2026
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Feb 18, 2026
@openshift-ci-robot

@hasbro17: This PR has been marked as verified by me.


@jubittajohn
Contributor

/retest-required

@jubittajohn
Contributor

/retest

@dusk125
Contributor

dusk125 commented Feb 20, 2026

/retest-required

@jubittajohn
Contributor

/retest-required

@lance5890
Contributor

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2026

@hasbro17: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-bot openshift-merge-bot bot merged commit 829dc42 into openshift:main Feb 22, 2026
16 checks passed
@openshift-ci-robot

@hasbro17: Jira Issue Verification Checks: Jira Issue OCPBUGS-74151
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-74151 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@openshift-cherrypick-robot

@hasbro17: #1540 failed to apply on top of branch "release-4.21":

Applying: OCPBUGS-74151: Wait for revision stability before removing etcd members
Using index info to reconstruct a base tree...
M	pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller.go
M	pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go
Auto-merging pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller.go
CONFLICT (content): Merge conflict in pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 OCPBUGS-74151: Wait for revision stability before removing etcd members

Details

In response to this:

/cherry-pick release-4.21 release-4.20 release-4.19 release-4.18


@openshift-merge-robot
Contributor

Fix included in accepted release 4.22.0-0.nightly-2026-02-21-040517
