ETCD-328: automatic replacement of an unhealthy member machine #947
```diff
@@ -113,7 +113,7 @@ func (c *clusterMemberRemovalController) sync(ctx context.Context, syncCtx facto
 // attemptToScaleDown attempts to remove a voting member only once we have identified that
 // a Machine resource is being deleted and a replacement member has been created
 func (c *clusterMemberRemovalController) attemptToScaleDown(ctx context.Context, recorder events.Recorder) error {
-	currentVotingMemberIPListSet, err := ceohelpers.VotingMemberIPListSet(c.configMapListerForTargetNamespace)
+	currentVotingMemberIPListSet, err := ceohelpers.VotingMemberIPListSet(ctx, c.etcdClient)
 	if err != nil {
 		return err
 	}
```
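The hunk above swaps the ConfigMap lister for a live etcd client when building the voting-member IP set. A minimal sketch of what such a live lookup does, using a local `Member` struct in place of the real `etcdserverpb.Member` type (the struct and `votingMemberIPs` are illustrative, not the operator's actual helper):

```go
package main

import (
	"fmt"
	"net/url"
)

// Member is a stand-in for etcdserverpb.Member; only the fields
// relevant to this sketch are included.
type Member struct {
	Name      string
	IsLearner bool
	PeerURLs  []string
}

// votingMemberIPs collects the peer IPs of all non-learner (voting)
// members, roughly what a MemberList-backed VotingMemberIPListSet
// returns instead of reading the etcd-endpoints ConfigMap.
func votingMemberIPs(members []Member) (map[string]struct{}, error) {
	ips := map[string]struct{}{}
	for _, m := range members {
		if m.IsLearner {
			continue // learners don't count toward quorum
		}
		for _, raw := range m.PeerURLs {
			u, err := url.Parse(raw)
			if err != nil {
				return nil, fmt.Errorf("invalid peer URL %q: %w", raw, err)
			}
			ips[u.Hostname()] = struct{}{}
		}
	}
	return ips, nil
}

func main() {
	members := []Member{
		{Name: "m1", PeerURLs: []string{"https://10.0.0.1:2380"}},
		{Name: "m2", PeerURLs: []string{"https://10.0.0.2:2380"}},
		{Name: "learner", IsLearner: true, PeerURLs: []string{"https://10.0.0.3:2380"}},
	}
	ips, _ := votingMemberIPs(members)
	fmt.Println(len(ips)) // prints 2: the learner is excluded
}
```

The practical difference from the old code path is freshness: the ConfigMap lags cluster membership changes, while a `MemberList` call reflects the current raft configuration.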
```diff
@@ -125,15 +125,21 @@ func (c *clusterMemberRemovalController) attemptToScaleDown(ctx context.Context,
 	if desiredControlPlaneReplicasCount == 0 {
 		return fmt.Errorf("desired control plane replicas count cannot be empty")
 	}
-	if currentVotingMemberIPListSet.Len() <= desiredControlPlaneReplicasCount {
-		klog.V(4).Infof("Ignoring scale-down since the number of etcd voting members (%d) < desired number of control-plane replicas (%d) ", currentVotingMemberIPListSet.Len(), desiredControlPlaneReplicasCount)
+	liveVotingMembers, err := c.getAllVotingMembers(ctx)
+	if err != nil {
+		return fmt.Errorf("could not list etcd members: %w", err)
+	}
+
+	if len(liveVotingMembers) < desiredControlPlaneReplicasCount {
+		klog.V(2).Infof("Ignoring scale-down since the number of etcd members (%d) < desired number of control-plane replicas (%d) ", len(liveVotingMembers), desiredControlPlaneReplicasCount)
 		return nil
 	}
 
 	// machines with master role and deletion hook
 	memberMachines, err := ceohelpers.CurrentMemberMachinesWithDeletionHooks(c.masterMachineSelector, c.masterMachineLister)
 	if err != nil {
-		return err
+		return fmt.Errorf("could not find master machines with deletion hook: %w", err)
 	}
 
 	var votingMembersMachines []*machinev1beta1.Machine
```
Contributor: just to make the below logic consistent, should the
Contributor (author): I actually wanted to do this, but did not want to change the logic completely. I had to use +1 to replace
Contributor: I had the opposite thought actually. For the default case when we're checking if scale-down is allowed i.e The reason why we use

Not sure what CM here is (assuming machine), but other than the delay, @Elbehery is there a reason why you can't use the
Contributor (author): Thanks @hasbro17 for your review.

The etcd-endpoint CM contains all the voting members (i.e. non-learners) whether

I think listing the cluster members is not a heavy load; moreover, the recent changes to

Yes, this is correct, it takes around a minute to update the CM. I can change it back, but may I ask: is listing the healthy members directly from the etcd cluster not correct?
Contributor: That makes sense, I think we make lots of MemberList calls across the operator already (health probe, etc.). The connection to etcd is already open, as Mustafa said, so getting the current list of members is not an expensive call. I'm all for consolidating this listing into a single controller that queries it in the background every 5-10s, with all other controllers feeding from that.

I don't have a direct answer for that, but let's go with the case I've added to the quorum guard here: when you're down to two nodes, we'll never update the CM until you add a new node. I realize now that this is stupid, but we have no other way to selectively tell the static pod controller not to roll out a revision based on that.
Contributor (author): +1

In this case I think listing the members from the etcd client makes more sense, or?
Contributor: Yeah, I forgot that we'll have client connections open from the client cache, so having this controller make periodic live list calls might not add significant load. I guess we can keep the live call here.

Yeah, that's not ideal, especially since the apiserver is using the endpoints configmap and will end up talking to a non-existent endpoint if we have 2 members and the CM is outdated with 3. But it's a necessary trade-off that we've made, given that we don't want quorum loss mid-rollout. Let's keep the live list for now, and leave the consolidation of list calls into a better internal cache than the etcd-endpoints CM as a follow-up.
Contributor: if we agree on sticking with the member listing, then I would definitely like to see
```diff
@@ -151,14 +157,19 @@ func (c *clusterMemberRemovalController) attemptToScaleDown(ctx context.Context,
 	// do not trust data in the cache, compare with the current state
 	healthyLiveVotingMembers, err := c.getHealthyVotingMembers(ctx)
 	if err != nil {
-		return err
+		return fmt.Errorf("could not list etcd healthy members: %w", err)
 	}
 
 	// scaling down invariant
-	if len(healthyLiveVotingMembers) < desiredControlPlaneReplicasCount {
-		klog.V(2).Infof("Ignoring scale down since the number of healthy live etcd voting members (%d) < desired number of control-plane replicas (%d) ", len(healthyLiveVotingMembers), desiredControlPlaneReplicasCount)
+	minTolerableQuorum, err := etcdcli.MinimumTolerableQuorum(desiredControlPlaneReplicasCount)
+	if err != nil {
+		klog.V(2).Infof("etcd cluster could not determine minimum quorum required. desiredControlPlaneReplicasCount is %v. minimum quorum required is %v: %w", desiredControlPlaneReplicasCount, minTolerableQuorum, err)
+	}
+
+	if len(healthyLiveVotingMembers) < minTolerableQuorum {
+		klog.V(2).Infof("ignoring scale down since the number of healthy live etcd members (%d) < minimum required to maintain quorum (%d) ", len(healthyLiveVotingMembers), minTolerableQuorum)
 		if time.Now().After(c.lastTimeScaleDownEventWasSent.Add(5 * time.Minute)) {
-			recorder.Eventf("ScaleDown", "Ignoring scale down since the number of healthy live etcd voting members (%d) < desired number of control-plane replicas (%d) ", len(healthyLiveVotingMembers), desiredControlPlaneReplicasCount)
+			recorder.Eventf("ScaleDown", "Ignoring scale down since the number of healthy live etcd members (%d) < minimum required to maintain quorum (%d) ", len(healthyLiveVotingMembers), minTolerableQuorum)
 			c.lastTimeScaleDownEventWasSent = time.Now()
 		}
 		return nil
```
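The hunk above relaxes the invariant from "healthy members >= desired replicas" to "healthy members >= minimum tolerable quorum", which is what allows replacing an unhealthy member while one machine is down. A sketch of what `etcdcli.MinimumTolerableQuorum` plausibly computes — the smallest raft majority, floor(n/2) + 1 (the real helper's implementation may differ; this just illustrates the arithmetic):

```go
package main

import "fmt"

// minTolerableQuorum returns the smallest number of healthy voting
// members that still forms a raft majority for a cluster of the given
// size: floor(replicas/2) + 1. The error mirrors a helper that rejects
// a non-positive member count.
func minTolerableQuorum(replicas int) (int, error) {
	if replicas <= 0 {
		return 0, fmt.Errorf("replicas must be positive, got %d", replicas)
	}
	return replicas/2 + 1, nil
}

func main() {
	for _, n := range []int{1, 3, 5} {
		q, _ := minTolerableQuorum(n)
		fmt.Printf("replicas=%d quorum=%d\n", n, q)
	}
	// replicas=1 quorum=1
	// replicas=3 quorum=2
	// replicas=5 quorum=3
}
```

For the default 3-node control plane this means scale-down proceeds with 2 healthy members (a majority) where the old check demanded all 3, so a cluster with one dead member is no longer wedged.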
```diff
@@ -192,25 +203,19 @@ func (c *clusterMemberRemovalController) attemptToScaleDown(ctx context.Context,
 				unhealthyMembersURLs = append(unhealthyMembersURLs, unhealthyMember.Name)
 			}
 		}
-		if len(unhealthyVotingMemberMachinesPendingDeletion) > 0 {
-			klog.V(4).Infof("found unhealthy voting members with machine pending deletion: %v", unhealthyVotingMemberMachinesPendingDeletion)
-			klog.V(4).Infof("unhealthy members found: %v", unhealthyMembersURLs)
-		} else {
-			return fmt.Errorf("cannot proceed with scaling down, unhealthy voting members found: %v, none are pending deletion", unhealthyMembersURLs)
+		if len(unhealthyVotingMemberMachinesPendingDeletion) == 0 {
+			klog.V(2).Infof("cannot proceed with scaling down, unhealthy voting etcd members found: %v but none are pending deletion", unhealthyMembersURLs)
+			return fmt.Errorf("cannot proceed with scaling down, unhealthy voting etcd members found: %v but none are pending deletion", unhealthyMembersURLs)
 		}
 	}
 
 	// remove the unhealthy machine pending deletion first
 	// if no unhealthy machine pending deletion found, then attempt to scale down the healthy machines pending deletion
 	if len(unhealthyVotingMemberMachinesPendingDeletion) > 0 {
+		klog.V(2).Infof("found unhealthy voting etcd members with machine pending deletion: %v", unhealthyVotingMemberMachinesPendingDeletion)
 		votingMembersMachinesPendingDeletion = append(unhealthyVotingMemberMachinesPendingDeletion, votingMembersMachinesPendingDeletion...)
 	}
 
-	liveVotingMembers, err := c.getAllVotingMembers(ctx)
-	if err != nil {
-		return err
-	}
-
 	var allErrs []error
 	for _, votingMemberMachinePendingDeletion := range votingMembersMachinesPendingDeletion {
 		removed, errs := c.attemptToRemoveMemberFor(ctx, liveVotingMembers, votingMemberMachinePendingDeletion, recorder)
```
```diff
@@ -290,13 +295,13 @@ func (c *clusterMemberRemovalController) removeMemberWithoutMachine(ctx context.
 
 // attemptToRemoveLearningMember attempts to remove a learning member pending deletion regardless of whether a replacement member has been found
 func (c *clusterMemberRemovalController) attemptToRemoveLearningMember(ctx context.Context, recorder events.Recorder) error {
-	currentVotingMemberIPListSet, err := ceohelpers.VotingMemberIPListSet(c.configMapListerForTargetNamespace)
+	currentVotingMemberIPListSet, err := ceohelpers.VotingMemberIPListSet(ctx, c.etcdClient)
 	if err != nil {
 		return err
 	}
 	memberMachines, err := ceohelpers.CurrentMemberMachinesWithDeletionHooks(c.masterMachineSelector, c.masterMachineLister)
 	if err != nil {
-		return err
+		return fmt.Errorf("could not find master machines with deletion hook: %w", err)
 	}
 	var learningMachines []*machinev1beta1.Machine
 	for memberMachineIP, memberMachine := range ceohelpers.IndexMachinesByNodeInternalIP(memberMachines) {
```
Contributor (author): @tjungblu is this correct?
Contributor: just replicate what the member controller does in:
https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/operator/etcdendpointscontroller/etcdendpointscontroller.go#L132-L136
Contributor (author): good catch 👍🏽