Description
How to categorize this issue?
/area performance
/kind bug
/priority 2
What happened:
The autoscaler's fixNodeGroupSize logic interferes with the meltdown logic, where we replace only maxReplacement machines per machinedeployment: it ends up removing the other Unknown machines as well.
What you expected to happen:
Even when the autoscaler takes the decision to DecreaseTargetSize, it should not be able to remove Unknown machines, because the node object is actually present for them.
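To illustrate the expectation, here is a minimal Go sketch of a node-group guard that only lets DecreaseTargetSize shrink by machines that never registered a node object; the type, field, and behaviour here are hypothetical and this is not the actual mcm cloud provider implementation in cluster-autoscaler:

```go
package main

import "fmt"

// machineDeploymentNodeGroup is a hypothetical stand-in for the autoscaler's
// view of a MachineDeployment-backed node group.
type machineDeploymentNodeGroup struct {
	targetSize          int // desired machinedeployment replicas
	machinesWithoutNode int // machines that never registered a node, e.g. Pending
}

// DecreaseTargetSize sketches the expectation from this issue: the target may
// only shrink by capacity that never materialised as a node, so machines with
// existing node objects (Running or Unknown) stay untouched.
func (ng *machineDeploymentNodeGroup) DecreaseTargetSize(delta int) error {
	if delta >= 0 {
		return fmt.Errorf("delta must be negative, got %d", delta)
	}
	if -delta > ng.machinesWithoutNode {
		return fmt.Errorf(
			"cannot shrink by %d: only %d machine(s) lack a node object",
			-delta, ng.machinesWithoutNode)
	}
	ng.targetSize += delta
	ng.machinesWithoutNode += delta // assume the node-less machines are the ones removed
	return nil
}

func main() {
	// Scenario from this issue: 2 replicas, one machine Unknown (zone
	// unreachable, node object still present) and one replacement in Pending.
	ng := &machineDeploymentNodeGroup{targetSize: 2, machinesWithoutNode: 1}
	fmt.Println(ng.DecreaseTargetSize(-1)) // ok: covered by the Pending machine
	fmt.Println(ng.DecreaseTargetSize(-1)) // rejected: only the Unknown machine is left
}
```

In the real flow the machineSet controller picks which machine to delete once the replicas are reduced, which is why the root cause below matters.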
How to reproduce it (as minimally and precisely as possible):
- Create a machinedeployment with 2 replicas (it is assumed the autoscaler is enabled for the cluster)
- Block all traffic to/from the zone the machinedeployment is meant for
- With the default maxReplacement of 1, one node will stay in Pending state
- After around 20 minutes, the Unknown machine gets deleted when the autoscaler fixes the node group size by reducing the machinedeployment replicas to 1
Anything else we need to know?:
This is happening because of the way the machineSet controller prioritizes machines for deletion based on their status:
machine-controller-manager/pkg/controller/controller_utils.go, lines 769 to 776 at d7e3c5d
We need to look into any other implications of prioritizing Pending machines over Unknown machines as part of the solution.
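For reference, a minimal sketch (not the code at d7e3c5d) of how such status-based deletion prioritization behaves, and where swapping the ranks of Pending and Unknown would change which machine gets picked when the machinedeployment is scaled from 2 to 1; the phase names mirror MCM's machine phases, but the ordering and helpers below are illustrative only:

```go
package main

import (
	"fmt"
	"sort"
)

type MachinePhase string

const (
	MachineUnknown MachinePhase = "Unknown"
	MachinePending MachinePhase = "Pending"
	MachineRunning MachinePhase = "Running"
)

type Machine struct {
	Name  string
	Phase MachinePhase
}

// deletionPriority returns a rank; lower ranks are deleted first.
// Swapping the ranks of Unknown and Pending (the change discussed in this
// issue) would make the controller delete the Pending machine instead of
// the Unknown one whose node object still exists.
func deletionPriority(p MachinePhase) int {
	switch p {
	case MachineUnknown:
		return 0 // currently deleted first
	case MachinePending:
		return 1
	default: // Running and everything else
		return 2
	}
}

// machinesToDelete picks the diff cheapest machines according to the ranking.
func machinesToDelete(machines []Machine, diff int) []Machine {
	sorted := append([]Machine(nil), machines...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return deletionPriority(sorted[i].Phase) < deletionPriority(sorted[j].Phase)
	})
	if diff > len(sorted) {
		diff = len(sorted)
	}
	return sorted[:diff]
}

func main() {
	machines := []Machine{
		{Name: "machine-a", Phase: MachineUnknown}, // node object exists, zone unreachable
		{Name: "machine-b", Phase: MachinePending}, // replacement, held back by maxReplacement
	}
	// Scaling the machinedeployment from 2 to 1 selects the Unknown machine today.
	fmt.Println(machinesToDelete(machines, 1))
}
```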
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- Others: CA version 1.23.1