
Conversation

@takoverflow (Member) commented Jan 29, 2025

What this PR does / why we need it:
Adds a new queue to handle machine objects scheduled for
deletion to unblock machine creation/update requests.

Which issue(s) this PR fixes:
Fixes #943

Special notes for your reviewer:

Release note:

A new termination queue that handles machines scheduled for deletion has been introduced to separate deletion from machine creation requests.
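Conceptually (a rough, illustrative sketch only; the queue name and construction below are assumptions, not necessarily what this PR does), terminating machines get their own rate-limited workqueue from client-go's workqueue package, drained independently of the existing machine queue:

// Illustrative sketch: a second, independently drained workqueue for
// terminating machines, so that slow deletions cannot starve creations/updates.
machineTerminationQueue := workqueue.NewNamedRateLimitingQueue(
	workqueue.DefaultControllerRateLimiter(), "machinetermination")
// The controller's Run would shut this queue down on exit and start a set of
// workers that only process items from it.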

@gardener-robot commented

@takoverflow Thank you for your contribution.

@gardener-robot added the needs/review and size/M labels on Jan 29, 2025
@gardener-robot-ci-2 added the needs/ok-to-test label on Jan 29, 2025
@takoverflow (Member, Author) commented Jan 31, 2025

Tested scenario

Scale-up/Scale-down of a Machine Deployment

Scaling down a Machine Deployment with a large replica count sends numerous machines to the Terminating state at once; these clog up the machine queue and prevent any new machine creations (simulated here by scaling up a different Machine Deployment).

kubectl get mcd
NAME                              READY   DESIRED   UP-TO-DATE   AVAILABLE   AGE
shoot--i752152--scale-test-a-z1                                              0s
shoot--i752152--scale-test-b-z1           1                                  0s
1. Increase the replica count of one of the machine deployments (a).
kubectl scale mcd shoot--i752152--scale-test-a-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-a-z1 scaled
2. Check that all the machines are running, then scale down Machine Deployment (a) to simulate the deletion of numerous machines at once.
kubectl get machine | grep Running | wc -l
101
kubectl scale mcd shoot--i752152--scale-test-a-z1 --replicas=0
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-a-z1 scaled
3. Scale up the other deployment (b) to queue pending machine creation requests, after a short wait so that the deletion requests are all enqueued first and the creation requests cannot slip in between them.
sleep 120
kubectl scale mcd shoot--i752152--scale-test-b-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-b-z1 scaled
4. Wait for some time and then check how many machines are running. The machine deletion requests were artificially delayed in the virtual provider to simulate a real, time-consuming deletion process.
sleep 180
kubectl get machine | grep 'a-z1.*Terminating' | wc -l
100
kubectl get machine | grep 'b-z1.*Running' | wc -l
1

Because the terminating machines hog all the workers in the machine queue, creation requests cannot go through. When the same scenario is replicated with the new machineTerminationQueue, the time-consuming deletion flow moves to a separate queue, so the creation requests no longer have to wait behind it.

sleep 180
kubectl get machine | grep 'a-z1.*Terminating' | wc -l
100
kubectl get machine | grep 'b-z1.*Running' | wc -l
100

@takoverflow marked this pull request as ready for review on February 7, 2025 04:53
@takoverflow requested a review from a team as a code owner on February 7, 2025 04:53
@takoverflow (Member, Author) commented Feb 7, 2025

Stress Test

Delete 4 machine deployments and scale up a new one

Deleting Machine Deployments with a large replica count sends numerous machines to the Terminating state at once; these clog up the machine queue and prevent any new machine creations (simulated here by scaling up a different Machine Deployment).

+ kubectl get mcd
NAME                              READY   DESIRED   UP-TO-DATE   AVAILABLE   AGE
shoot--i752152--scale-test-a-z1                                              1s
shoot--i752152--scale-test-b-z1                                              1s
shoot--i752152--scale-test-c-z1                                              1s
shoot--i752152--scale-test-d-z1           1                                  1s
shoot--i752152--scale-test-e-z1                                              0s
1. Increase the replica count of four of the machine deployments (a, b, c and d).
+ kubectl scale mcd shoot--i752152--scale-test-a-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-a-z1 scaled
+ kubectl scale mcd shoot--i752152--scale-test-b-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-b-z1 scaled
+ kubectl scale mcd shoot--i752152--scale-test-c-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-c-z1 scaled
+ kubectl scale mcd shoot--i752152--scale-test-d-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-d-z1 scaled
2. Schedule a pod on each machine, wait for some time, and then delete all four Machine Deployments.
+ kubectl delete mcd shoot--i752152--scale-test-a-z1
machinedeployment.machine.sapcloud.io "shoot--i752152--scale-test-a-z1" deleted
+ kubectl delete mcd shoot--i752152--scale-test-b-z1
machinedeployment.machine.sapcloud.io "shoot--i752152--scale-test-b-z1" deleted
+ kubectl delete mcd shoot--i752152--scale-test-c-z1
machinedeployment.machine.sapcloud.io "shoot--i752152--scale-test-c-z1" deleted
+ kubectl delete mcd shoot--i752152--scale-test-d-z1
machinedeployment.machine.sapcloud.io "shoot--i752152--scale-test-d-z1" deleted
3. Scale up the remaining deployment (e) to queue pending machine creation requests, after a short wait so that the deletion requests are all enqueued first and the creation requests cannot slip in between them.
+ sleep 120
+ kubectl scale mcd shoot--i752152--scale-test-e-z1 --replicas=100
machinedeployment.machine.sapcloud.io/shoot--i752152--scale-test-e-z1 scaled
4. Wait for some time and then check how many machines are running. The machine deletion requests were artificially delayed (by 10m each) in the virtual provider to simulate a real, time-consuming deletion process.
+ sleep 180
+ kubectl get machine | grep 'Terminating' | wc -l
400
+ kubectl get machine | grep 'e-z1.*Running' | wc -l
0

Looking at the logs for the timestamps when the machines were scheduled for deletion and when the first machine from the newly scaled deployment (e) started running:

# Get the first machine that goes to terminating
+ TERM=$(grep -Pi "Machine.*status updated to terminating" old.log | head -n 1 | sed 's|.*"\(.*\)".*|\1|')
shoot--i752152--scale-test-a-z1-6f964-2z6hs

# Check the timestamp for it to go to terminating state
+ grep -Pi 'machine "shoot--i752152--scale-test-a-z1-6f964-2z6hs" status updated to terminating' old.log
I0204 21:45:19.924412   35563 machine_util.go:936] Machine "shoot--i752152--scale-test-a-z1-6f964-2z6hs" status updated to terminating

# Get the first new machine that's created
+ NEW=$(grep -i "adding machine object to queue" old.log | grep 'e-z1' | head -n 1 | sed 's|.*".*/\(.*\)".*|\1|')
shoot--i752152--scale-test-e-z1-d5fc5-crrfj

# Check when it gets to "Running" state
+ grep -Pi 'Start for "shoot--i752152--scale-test-e-z1-d5fc5-crrfj" with phase:"Running"' old.log | head -n 1
I0204 23:23:44.720681   35563 machine.go:116] reconcileClusterMachine: Start for "shoot--i752152--scale-test-e-z1-d5fc5-crrfj" with phase:"Running", description:"Machine shoot--i752152--scale-test-e-z1-d5fc5-crrfj successfully joined the cluster"

So the first machine was scheduled for deletion at 21:45:19, and the first machine of the newly scaled deployment reached the "Running" state at 23:23:44. Because the terminating machines hog all the workers in the machine queue, creation requests cannot go through, and it takes almost 1h40m before new machines are created successfully.

After the change

When the same scenario is replicated with the new machineTerminationQueue, the time-consuming deletion flow moves to a separate queue, so the creation requests no longer have to wait behind it.

+ sleep 180
+ kubectl get machine | grep 'Terminating' | wc -l
400
+ kubectl get machine | grep 'e-z1.*Running' | wc -l
100

Again, looking at the timestamps:

# Get the first machine that goes to terminating
+ TERM=$(grep -i "adding machine object to termination queue" new.log | head -n 1 | sed 's|.*".*/\(.*\)".*|\1|')
shoot--i752152--scale-test-b-z1-64874-b6hjs

# Check the timestamp for it to go to terminating state
+ grep -Pi 'shoot--i752152--scale-test-b-z1-64874-b6hjs.*reason: handling terminating machine object UPDATE event' new.log
I0204 23:46:33.287374   49438 machine.go:92] Adding machine object to termination queue "shoot--i752152--scale-test/shoot--i752152--scale-test-b-z1-64874-b6hjs", reason: handling terminating machine object UPDATE event

# Get the first new machine that's created
+ NEW=$(grep -i "adding machine object to queue" new.log | grep 'e-z1' | head -n 1 | sed 's|.*".*/\(.*\)".*|\1|')
shoot--i752152--scale-test-e-z1-d5fc5-v8dhj

# Check when it gets to "Running" state
+ grep -Pi 'Start for "shoot--i752152--scale-test-e-z1-d5fc5-v8dhj" with phase:"Running"' new.log | head -n 1
I0204 23:51:08.974343   49438 machine.go:140] reconcileClusterMachine: Start for "shoot--i752152--scale-test-e-z1-d5fc5-v8dhj" with phase:"Running", description:"Machine shoot--i752152--scale-test-e-z1-d5fc5-v8dhj successfully joined the cluster"

The first machine was scheduled for deletion at 23:46:33 and the first "Running" machine from the new deployment appeared at 23:51:08, a difference of ~5m; subtracting the 2m from the earlier sleep 120 invocation, it takes close to 3m for the new creation requests to go through.

*/
func (c *controller) addMachine(obj interface{}) {
	c.enqueueMachine(obj, "handling machine obj ADD event")
	machine := obj.(*v1alpha1.Machine)

Contributor (inline review comment):

This can unfortunately panic in those cases where obj is tomb-stoned. See deleteMachineDeployment. This is a common mistake in the delete callback. (I also made the same mistake in the past.)

}

func (c *controller) enqueueMachineTermination(obj interface{}, reason string) {
	if key, ok := c.getKeyForObj(obj); ok {

Contributor (inline review comment):

Callers should ensure that tombstone handling is done correctly, otherwise this will fail.
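For reference, a minimal sketch of the tombstone-safe handling being asked for (the handler name and reason string are assumptions, not this PR's actual code): delete callbacks may receive a cache.DeletedFinalStateUnknown instead of the typed object, so a bare type assertion can panic.

func (c *controller) deleteMachine(obj interface{}) {
	machine, ok := obj.(*v1alpha1.Machine)
	if !ok {
		// The object may be tomb-stoned; unwrap it instead of panicking.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			klog.Errorf("couldn't get object from tombstone %#v", obj)
			return
		}
		machine, ok = tombstone.Obj.(*v1alpha1.Machine)
		if !ok {
			klog.Errorf("tombstone contained object that is not a Machine %#v", obj)
			return
		}
	}
	c.enqueueMachineTermination(machine, "handling machine obj DELETE event")
}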

@gardener-robot added the needs/changes label on Feb 26, 2025
Rather than explicitly setting the LastOperation value
and updating the status for multiple error codes, fall
back to using the already available ErrorHandler to
clean up the creation flow.
Isolated the machine deletion flow into a separate workqueue that only processes terminating machines; the remaining machine reconciliation flow happens as before in the `machineQueue`. The machine watch functions for `update` and `delete` events are updated to account for the introduction of the new queue.
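A minimal sketch of what the updated `update` watch function could look like (the handler shape and the non-terminating reason string are assumptions; the termination reason string is taken from the log lines quoted in the test results above):

func (c *controller) updateMachine(oldObj, newObj interface{}) {
	machine, ok := newObj.(*v1alpha1.Machine)
	if !ok {
		return
	}
	if machine.DeletionTimestamp != nil {
		// Machines being deleted are handed to the dedicated termination queue.
		c.enqueueMachineTermination(machine, "handling terminating machine object UPDATE event")
		return
	}
	c.enqueueMachine(machine, "handling machine obj UPDATE event")
}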
In case the MCM process stops during deletion, upon restart the same terminating machine would be added to the normal machine queue (as part of the ADD event); since its deletion timestamp is set, it would not be processed any further and the deletion would never proceed. Checking for a terminating machine on ADD events resolves this.
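Sketched below (again as an assumption of the exact shape rather than a copy of the PR), the ADD handler routes a machine that already carries a deletion timestamp to the termination queue so that its deletion resumes after an MCM restart:

func (c *controller) addMachine(obj interface{}) {
	machine, ok := obj.(*v1alpha1.Machine)
	if !ok {
		return
	}
	if machine.DeletionTimestamp != nil {
		// Resume deletion for machines that were already Terminating when the
		// controller restarted.
		c.enqueueMachineTermination(machine, "handling terminating machine object ADD event")
		return
	}
	c.enqueueMachine(machine, "handling machine obj ADD event")
}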
@takoverflow requested a review from unmarshall on March 11, 2025 10:24
@takoverflow requested a review from elankath on March 12, 2025 08:33
@elankath (Member) left a comment

looks good with minor safety re-queue

@takoverflow requested a review from elankath on March 13, 2025 07:41
@elankath (Member) left a comment

/lgtm thanks for the PR

@gardener-robot added the reviewed/lgtm label and removed the needs/changes and needs/review labels on Mar 13, 2025
@aaronfern merged commit 68dfbf4 into gardener:master on Mar 17, 2025
8 checks passed
@gardener-robot added the status/closed label on Mar 17, 2025