This repository was archived by the owner on Sep 30, 2020. It is now read-only.
ELB IP changes can bring the cluster down #598
Closed
Description
I ran into kubernetes/kubernetes#41916 twice in the last 3 days in my production cluster: on both days, almost 50% of worker nodes transitioned to NotReady
state nearly simultaneously, causing a brief downtime in critical services due to Kubernetes' default (and aggressive) eviction policy for failing nodes.
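For reference, the eviction behavior involved here is governed by kube-controller-manager flags. A minimal sketch of relaxing them (the values shown are the upstream defaults of that era; tune to taste, and note this only softens the symptom, it does not fix the ELB IP issue):

```shell
# Hedged sketch: kube-controller-manager flags controlling NotReady handling.
# --node-monitor-grace-period: how long a node may be unresponsive before
#   being marked NotReady (default 40s).
# --pod-eviction-timeout: how long a node may stay NotReady before its pods
#   are evicted (default 5m0s).
kube-controller-manager \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s
```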
I just contacted AWS support to validate the hypothesis that the ELB changed IPs at the time of both incidents, and the answer was yes.
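To catch this correlation without waiting on support, one can watch the ELB's DNS name for IP-set changes. A minimal sketch (not from the issue; the hostname is a placeholder), meant to be run in a loop with a polling interval at or below the ELB record's 60s TTL:

```python
# Hedged sketch: detect when a load balancer's DNS name starts resolving
# to a different set of IPs, the event that coincided with both incidents.
import socket


def resolve_ips(hostname, port=443):
    """Return the set of IPs a hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def diff_ips(old, new):
    """Summarize how the resolved IP set changed between two polls."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Example loop (placeholder hostname, not from the issue):
#   seen = resolve_ips("my-apiserver-elb.example.com")
#   every 60s: current = resolve_ips(...); if current != seen: alert with
#   diff_ips(seen, current) and roll seen forward.
```

Logging these diffs alongside node NotReady transitions makes the correlation with ELB IP rotation visible directly in the cluster's own monitoring.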
My configuration (multi-node control plane behind an ELB) matches exactly the one described in that issue, and most kube-aws users are probably affected by this.
Has anyone else run into this at some point?