Description
Hey folks, I'm having a fair bit of trouble getting zero-downtime deployments to work. The issue:
After a new pod passes its readiness check, the ALB target group registers the pod's IP in an "initial" state, which can last for several seconds.
However, since the new pod is ready as far as Kubernetes is concerned, it begins terminating an old pod, which immediately enters a "draining" state in the target group. At that point there are no pods in the target group available to serve requests.
To some extent this can be mitigated by simply running more pods. That isn't really a solution, though; it just lowers the probability that the rolling deployment outpaces the ALB's registration. If the AWS API experienced any kind of delay or outage, the deployment could complete without a single live pod actually registered in the target group.
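For context, here's a minimal sketch of the kind of Deployment I'm running (the name, image, and probe path are placeholders, not my real config). Note that even with `maxUnavailable: 0`, Kubernetes proceeds with the rollout as soon as the readiness probe passes; it has no awareness of the ALB target group state:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web          # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # K8s keeps the desired count, but only by its own readiness view
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:latest   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz             # hypothetical probe path
              port: 8080
            periodSeconds: 5
```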
Is there any known way to require that a pod show up as "healthy" in the target group before Kubernetes considers it ready?