Support Custom Pod Status for PodReadinessGate to Block Premature Pod Termination #905
I'm working on this.
I won't have time to continue working on this in the next two weeks, but I have pushed an initial implementation here: https://github.com/devkid/aws-alb-ingress-controller/tree/feature/pod-readiness-gate (it compiles and has documentation, but I haven't tested it yet, nor did I adjust the unit tests). It works by reconciling the pod conditions in the same step in which it reconciles the ALB target group state. I think it wouldn't work in its current state because right now only the

If someone feels like picking up from where I left off, feel free to do so. Otherwise I would probably continue in ~2-3 weeks.
Hey @devkid -- just wanted to check in to see if you have had a chance to get back to this change at all. This is impacting us, and we're open to helping with any PRs that need to be done. Thanks!
Hi @tecnobrat. I didn't find the time to make significant progress, but the functionality should be usable as far as I can tell. I created a draft PR here: #955 and added some TODOs. The controller itself builds, but the target group unit test is broken because we need to pass another parameter (the k8s client) to the targets controller. It's also still missing unit tests for the new functionality. I'll be gone for another 2 weeks now; maybe you can pick this up if you want to.
Hi,
Well, the functionality is there; I'm still waiting for reviews. You can try the code from my branch if you want to, that would be helpful feedback already.
Btw, a workaround would be to have at least 2+n replicas and
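The comment above is truncated, so the exact workaround is unclear. A common shape for this kind of mitigation is to over-provision replicas and slow the rollout via the Deployment update strategy; the following is a hypothetical illustration only, not necessarily what was meant:

```yaml
# Hypothetical Deployment fragment (the original comment is cut off).
spec:
  replicas: 3                # 2+n replicas, so capacity survives one pod cycling
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # bring a new pod up first
      maxUnavailable: 0      # never remove a ready pod before its replacement is Ready
```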
@devkid Would a pod be registered on the ALB before it's ready? I had a similar hypothesis a while ago and I tried with 6 replicas, 0
@devkid there are still some issues with your approach. I'm trying to come up with a follow-up PR to yours so we can maybe share the work on this. Especially the endpoint and reconciliation part is very much open for discussion.

One question @devkid: how do we handle ALB target group changes that Kubernetes does not receive immediately (where no reconcile will be triggered)? ingress-gce uses a poller to query their NEGs; should we follow a similar strategy?
We are having a similar issue as well. When pods are terminated, the ALB Ingress Controller rapidly updates the ALB target group; however, the changes take a few seconds to take effect, which causes a short period where clients may get 504s. We are using IP mode, meaning each pod is in the target group, since we require the real ALB source IP (not the source IP kube-proxy uses) in order to use NetworkPolicies. To work around this we are running kube-proxy without source NAT (https://kubernetes.io/docs/tutorials/services/source-ip/); this has drawbacks as well but masks the issue to a degree. What I would really like to see is one of two things:
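For readers following the linked tutorial: the "kube-proxy without source NAT" workaround corresponds to setting `externalTrafficPolicy: Local` on the Service, which preserves the client source IP at the cost of dropping traffic on nodes without a local endpoint. A minimal sketch (names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                     # placeholder
spec:
  type: NodePort
  externalTrafficPolicy: Local     # preserve client source IP; no SNAT by kube-proxy
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```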
I would be willing to continue @devkid's and my work on this PR, maybe in collaboration with someone from AWS who could give us insight on the polling question I was talking about earlier. /cc @andrewtandoc
We are hitting this as well and it's a blocker for production workloads. @devkid @nirnanaaa I was going to review and finish this PR also, so let's explore the problem space and collaborate on a design that we can implement. It is actually very similar to what Google needed to solve for their loadbalancers (23:00 in this video https://youtu.be/Vw9GmSeomFg) |
@iverberk you're absolutely right. We were able to solve most of the challenges by taking a look at ingress-gce (basically this is the relevant piece: https://github.com/kubernetes/ingress-gce/tree/5b2fd1df8634ca02aaa10072d7769a2544d39cb5/pkg/neg/readiness). Except for the poller, everything works just fine (we didn't figure out a way to make that work). Maybe you already have something to overcome that issue? See @devkid's PR and my changes here:
Not necessarily starting off fresh; it seems the implementation is almost there, I just didn't want to duplicate efforts. I didn't know about that ingress-gce code, but will check it out. I guess you suggest polling to detect changes to target groups (targets becoming unhealthy etc.) that happen outside the aws-alb-ingress-controller's purview and need reconciling with pod readiness gates?
@nirnanaaa @iverberk do we maybe want to meet to discuss the next steps? I would send out an invite tomorrow from my company email. |
We were able to slow down the termination of pods to let the ALB catch up, thus preventing 504s, by using a lifecycle preStop hook.
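For reference, that approach is just a sleep in the container lifecycle: the pod keeps serving while the ALB deregisters the target, before SIGTERM is delivered. A minimal sketch (the container name, image, and 30s value are assumptions; align the sleep with your target group's deregistration delay):

```yaml
# Hypothetical pod-template fragment.
containers:
  - name: app
    image: my-app:latest
    lifecycle:
      preStop:
        exec:
          # Keep serving while the ALB removes this target from rotation.
          command: ["sleep", "30"]
```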
@nirnanaaa maybe try reaching out to @M00nF1sh? They are on the k8s Slack as well.
will do, thanks! |
@nirnanaaa @devkid were you able to make some progress on this? I think we need to pick this up as the ingress controller is unable to do zero-downtime deployments in the current state. |
No progress yet. I was also on vacation and had some internal work to do. I talked to @M00nF1sh to find out if there is something planned in terms of watching external resources; he said there's a similar issue with the pull request dealing with security group attachments. I asked him if it would make sense to work on this and he said (quote):
Not sure what that refers to: creating the controller on our own, or waiting until the controller is there.
I will set up a new PR to propose such a controller.
Hey, we have run into the issue described here and have also tested how to handle it when rotating our AMIs, which requires rotating all of our EC2 instances. We rely on a CloudFormation update policy to rotate the instances in an autoscaling group. This adds the complication that the ALB controller itself is getting rescheduled at the same time as the other pods. We have found that setting the preStop hook sleep to 60 seconds mostly avoids 5xx errors.
@nirnanaaa, @iverberk, and I were trying to implement something outside of the ALB controller (code is here: https://github.com/nirnanaaa/kube-readiness) to test the theories, and this is what we found out:
After doing all of this we were eventually able to make it work with consistent behaviour, achieving a zero-downtime rolling update for our end-to-end test. Unsolved issues:
Conclusion: Alternative: |
@nirnanaaa et al.: were you able to get a new PR set up? It seems like this issue has stalled and it's a pretty large blocker for anyone wanting to use IP target mode in a production environment. Thanks! |
@stevenpall unfortunately not. At this point I am uncertain where to continue. The repo that @arminc mentioned is in theory working just fine; with the controller and a preStop hook it works quite well.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale |
Hi all, I have updated my pull request: #955. It now starts a goroutine per target group to reconcile the condition status (it quits once all targets are healthy; see the code for more details). Testing would be very welcome.
Presently, when using IP target mode on the ALB, it's difficult to guarantee zero-downtime rolling updates. This is because there is additional state in the ALB Target Group which is not propagated back to Kubernetes: `initial`, `draining`, and `healthy`.

This was discussed in: #660 & #814

This issue is to discuss adding support to the ALB controller for setting a custom Pod Status which mirrors the ALB TargetGroup. This can then be used with `PodReadinessGate` and `MaxUnavailable`/`PodDisruptionBudget` to ensure a rolling update does not proceed faster than the new Pods becoming healthy on the ALB.