🐛 KCP should check if it has machines being deleted before proceeding with scale up/down #2958
Conversation
)
).Filter(machinefilters.Not(machinefilters.HasDeletionTimestamp))
}

return c.Machines.Filter(
	machinefilters.Not(machinefilters.MatchesConfigurationHash(c.SpecHash())),
	machinefilters.Not(machinefilters.HasDeletionTimestamp),
This can be added in scaleUpControlPlane(), similar to scaleDownControlPlane():
// Wait for any delete in progress to complete before deleting another Machine
if controlPlane.HasDeletingMachine() {
return ctrl.Result{}, &capierrors.RequeueAfterError{RequeueAfter: deleteRequeueAfter}
}
I prefer to do it in MachinesNeedingUpgrade(), because Machines that are about to be deleted logically should not be included in that list.
BTW, the second issue is happening with 3-node control plane upgrades too: it was creating one additional machine and then deleting it.
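For reference, a minimal sketch of what MachinesNeedingUpgrade() looks like with this change, based on the diff quoted above (the receiver and return type are assumptions, not the verbatim source):
// Sketch based on the diff above; receiver and return type are assumed.
// Machines that already have a deletion timestamp are excluded, so an
// in-flight deletion is not counted as a machine that still needs upgrading.
func (c *ControlPlane) MachinesNeedingUpgrade() FilterableMachineCollection {
	return c.Machines.Filter(
		machinefilters.Not(machinefilters.MatchesConfigurationHash(c.SpecHash())),
		machinefilters.Not(machinefilters.HasDeletionTimestamp),
	)
}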
The is_last_control_plane_members test should be updated.
I think I need help understanding the chronology here. With a one-node cluster, the way it's supposed to work is:
There must be something untrue about this for what you're seeing to be happening. All I can think of is that there's some mismatch between
For the other issue, I don't think we want to just drop the
What we've learned:
If I'm reading this change correctly in the context of the existing state of the code, I believe we have a potential bug: the following operations, performed in quick succession, may cause multiple scale up events without waiting for the previous one to finish becoming ready.
- Initiate an upgrade on a KCP resource with a single replica (either by setting UpgradeAfter or updating the Version)
- Scale the KCP to 3 replicas.
Unless I'm missing something, it looks like there is nothing preventing upgrade (or scale up itself) from performing multiple scale up operations without waiting for the previous scale up operation to fully complete first.
I also wonder if we have a potential race condition between joining a new control plane node to the cluster and the behavior of ReconcileEtcdMembers(). Since we are using static pods, there are no guarantees that the Node resource is created in the apiserver before the etcd Pod is up and running on the host. It looks like if we call ReconcileEtcdMembers() at just the right time during the join process, we could effectively remove the etcd membership for the etcd member we just added to the cluster as part of the control plane join. It doesn't look like either the ControlPlaneIsHealthy or EtcdIsHealthy checks would prevent this, since they are only looking at Nodes on the remote cluster to determine whether the cluster is healthy.
An example of the problem with not having a MachineDeletion check in Upgrade:
This scenario happens in an HA cluster as well. As a result, extra nodes are being created during upgrade and then scale down happens. Also, ForwardEtcdLeadership() is panicking because we don't have logic to wait for the second upgraded node to become ready before calling it.
Having a HasDeletingMachine() check in Upgrade actually prevents this, because it means we have to wait until MachinesNeedingUpgrade() has no other machines in the deletion process. This ensures that if ScaleUp proceeds, it is because there is a machine with the old version.
ReconcileEtcdMembers() only removes members if they don't have an underlying Machine, which should not be the case here. It is basically a cleanup for when a Machine is deleted without successfully cleaning up the node.
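For concreteness, the deleting-machine check being referred to could be implemented roughly like this (a sketch: the helper name comes from the snippet quoted earlier in this review, while the body and receiver are assumptions):
// Sketch (assumed implementation): HasDeletingMachine reports whether any
// control plane Machine already has a deletion timestamp set.
func (c *ControlPlane) HasDeletingMachine() bool {
	return len(c.Machines.Filter(machinefilters.HasDeletionTimestamp)) > 0
}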
/test pull-cluster-api-capd-e2e
Does it? From what I'm seeing, it looks like it only checks Nodes against the etcd membership here: cluster-api/controlplane/kubeadm/internal/workload_cluster_etcd.go, lines 135 to 181 at fff092e.
In the scenario I mentioned (a 1-replica control plane updated to 3 immediately after initiating the upgrade), I wouldn't expect there to be any machines with a deletionTimestamp set yet, since scaleDown is only called when the Node count > the replica count. As an aside, this also looks to be using Nodes and not Machines here.
I read through the discussion and noted similarities with my tests in metal3-dev-env.
On the machine level it is not what I expected. I have two running machines, one with the old and one with the new Kubernetes version. A third machine pops up and, since no resources are available, it hangs.
I would not expect to see a third machine being provisioned; instead, I would expect de-provisioning of the old running machine.
/milestone v0.3.4
@Jaakko-Os do you have some time to try the fix that's in this PR by building a custom image?
/approve
@sedefsavas the only thing left to do in this PR is to add the check at the bottom of reconcileHealth as a temporary workaround for what we discussed before, and document the reasons in the code.
Did I miss anything else? @detiber @benmoss @fabriziopandini
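A rough sketch of what that temporary workaround could look like at the bottom of reconcileHealth, assuming the function returns an error and reuses the same requeue pattern as scaleDownControlPlane() (placement and surrounding signature are assumptions, not the merged implementation):
// Sketch: do not proceed with scale up/down while a Machine is still being
// deleted, so an in-flight deletion is not double-counted in the replica math.
if controlPlane.HasDeletingMachine() {
	return &capierrors.RequeueAfterError{RequeueAfter: deleteRequeueAfter}
}
return nil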
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sedefsavas, vincepri.
/retitle 🐛 KCP should check if it has machines being deleted before proceeding with scale up/down
/lgtm
@vincepri I'll run my test again.
@Jaakko-Os We released v0.3.4, which includes this fix; would love to know if that works for you!
@vincepri I repeated my test and now the upgrade was successful!
@vincepri I have also successfully upgraded the Kubernetes version from 1.18.0 to 1.18.2.
What this PR does / why we need it:
This PR fixes two different issues I observed during KCP single control plane upgrade.
- machine_controller was not allowing machine deletion, thinking that it was the last control plane machine.
- Not having a check for machines in the deletion process in MachinesNeedingUpgrade() was causing a second machine with the new version to come up. KCP created a machine with the new version and added a DeletionTimestamp to the old machine; at the next reconcile it still saw one machine to be upgraded (the one with the deletion timestamp) and scaled up one more time.
Besides creating a second machine, this was causing ForwardEtcdLeadership() to panic from time to time.
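To illustrate the miscount, a rough sketch of the pre-fix selection (simplified from the diff earlier on this page, not the verbatim original code): without the deletion-timestamp filter, the old machine that is already being deleted still fails the configuration-hash match and is therefore counted as needing an upgrade.
// Pre-fix sketch: select machines whose spec hash differs from the desired one.
needingUpgrade := c.Machines.Filter(
	machinefilters.Not(machinefilters.MatchesConfigurationHash(c.SpecHash())),
)
// The old machine with a DeletionTimestamp still lands in needingUpgrade,
// so the controller scales up once more even though a replacement already exists.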
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2915