Autoscaler interfering in meltdown scenario solution #741

Open

Description

How to categorize this issue?

/area performance
/kind bug
/priority 2

What happened:
The autoscaler's fixNodeGroupSize logic interferes with the meltdown logic, in which we remove only maxReplacement machines per machinedeployment: the autoscaler ends up removing the other Unknown machines as well.

What you expected to happen:
Even when the autoscaler decides on DecreaseTargetSize, it should not be able to remove Unknown machines, because the node object is still present for them.
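The expected behavior above can be sketched as a filter applied before a replica decrease picks machines to delete. This is only an illustration under stated assumptions: `Phase`, `Machine`, `NodeName` and `scaleDownCandidates` are hypothetical stand-ins, not the actual MCM or autoscaler types and functions.

```go
package main

import "fmt"

// Phase and Machine are simplified, hypothetical stand-ins for the MCM types.
type Phase string

const (
	Unknown Phase = "Unknown"
	Pending Phase = "Pending"
	Running Phase = "Running"
)

type Machine struct {
	Name     string
	Phase    Phase
	NodeName string // empty when no node object is registered for the machine
}

// scaleDownCandidates returns the machines a replica decrease may delete.
// Unknown machines that still have a node object are skipped (assumed rule).
func scaleDownCandidates(machines []Machine) []Machine {
	var out []Machine
	for _, m := range machines {
		if m.Phase == Unknown && m.NodeName != "" {
			continue // protected: the node object is still present
		}
		out = append(out, m)
	}
	return out
}

func main() {
	machines := []Machine{
		{Name: "m-1", Phase: Unknown, NodeName: "node-1"},
		{Name: "m-2", Phase: Pending},
	}
	for _, m := range scaleDownCandidates(machines) {
		fmt.Println(m.Name, m.Phase)
	}
}
```

With this input only the Pending machine remains a candidate, so a decrease of the target size by one could not touch the Unknown machine.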

How to reproduce it (as minimally and precisely as possible):

  • Create a machinedeployment with 2 replicas (it is assumed that the autoscaler is enabled for the cluster).
  • Block all traffic to/from the zone the machinedeployment is for.
  • With the default maxReplacement, 1 node will stay in the Pending state.
  • After around 20 min, the Unknown machine is deleted when the autoscaler fixes the node group size by reducing the machinedeployment replicas to 1.

Anything else we need to know?:
This happens because of the way the machineSet prioritizes machines for deletion based on their status:

m := map[v1alpha1.MachinePhase]int{
    v1alpha1.MachineTerminating:      0,
    v1alpha1.MachineFailed:           1,
    v1alpha1.MachineCrashLoopBackOff: 2,
    v1alpha1.MachineUnknown:          3,
    v1alpha1.MachinePending:          4,
    v1alpha1.MachineAvailable:        5,
    v1alpha1.MachineRunning:          6,
}

We need to look into any other implications of prioritizing Pending machines over Unknown machines as a solution.
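The effect of the deletion priorities, and of swapping Pending and Unknown, can be sketched as follows. This is a minimal illustration, not the MCM implementation: `Phase`, `Machine`, `pickForDeletion` and `swappedPriority` are hypothetical names, and only the values in `currentPriority` are taken from the map quoted in this issue (a lower value is deleted first).

```go
package main

import (
	"fmt"
	"sort"
)

// Phase and Machine are simplified, hypothetical stand-ins for the MCM types.
type Phase string

const (
	Terminating      Phase = "Terminating"
	Failed           Phase = "Failed"
	CrashLoopBackOff Phase = "CrashLoopBackOff"
	Unknown          Phase = "Unknown"
	Pending          Phase = "Pending"
	Available        Phase = "Available"
	Running          Phase = "Running"
)

type Machine struct {
	Name  string
	Phase Phase
}

// currentPriority mirrors the ordering quoted in the issue.
var currentPriority = map[Phase]int{
	Terminating: 0, Failed: 1, CrashLoopBackOff: 2,
	Unknown: 3, Pending: 4, Available: 5, Running: 6,
}

// swappedPriority is the alternative under discussion (an assumption):
// Pending machines are deleted before Unknown ones.
var swappedPriority = map[Phase]int{
	Terminating: 0, Failed: 1, CrashLoopBackOff: 2,
	Pending: 3, Unknown: 4, Available: 5, Running: 6,
}

// pickForDeletion returns the n machines with the lowest priority value.
func pickForDeletion(machines []Machine, prio map[Phase]int, n int) []Machine {
	sorted := append([]Machine(nil), machines...) // do not mutate the input
	sort.SliceStable(sorted, func(i, j int) bool {
		return prio[sorted[i].Phase] < prio[sorted[j].Phase]
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	// State after the zone is blocked: one replacement is stuck in Pending,
	// the other machine is still Unknown.
	machines := []Machine{
		{Name: "replacement", Phase: Pending},
		{Name: "original", Phase: Unknown},
	}
	// The replica count drops from 2 to 1, so one machine is deleted.
	fmt.Println("current:", pickForDeletion(machines, currentPriority, 1)[0].Name)
	fmt.Println("swapped:", pickForDeletion(machines, swappedPriority, 1)[0].Name)
}
```

For this input the quoted ordering selects the Unknown machine, while the swapped ordering selects the Pending one. Whether the swap has side effects in other scenarios is exactly the open question raised above.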

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
    CA version 1.23.1

Metadata

Labels

  • area/disaster-recovery: Disaster recovery related
  • area/high-availability: High availability related
  • area/robustness: Robustness, reliability, resilience related
  • effort/2w: Effort for issue is around 2 weeks
  • kind/bug: Bug
  • kind/design
  • lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
  • needs/planning: Needs (more) planning with other MCM maintainers
  • priority/2: Priority (lower number equals higher priority)
