Description
Hi guys, we are currently running into the following issue. What is the suggested solution for this case?
Describe the issue
Flagger doesn't progress the canary while the number of running pods is below the expected count -> OK.
But during peak times (and also during a regular canary traffic shift), a deployment with HPA enabled may have an aggressive scale-up policy in place, which can constantly change the expected pod count as traffic increases.
This makes Flagger wait for a long time, so progressDeadlineSeconds can be triggered and fail the deployment.
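For reference, here is a minimal sketch of the kind of Canary resource involved, assuming a Linkerd-backed setup; the podinfo names, namespace, port, and analysis values are placeholders, not our real config:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo                 # placeholder app name
  namespace: test               # placeholder namespace
spec:
  provider: linkerd
  # the deadline that gets consumed while the HPA keeps raising the expected pod count
  progressDeadlineSeconds: 60
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 5
```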
To Reproduce
Start a canary deployment and keep increasing the HPA's expected pod count; Flagger will wait indefinitely for the replica count to stabilise, eventually triggering the deadline failure.
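To simulate the aggressive upscale policy, a hypothetical HPA like the one below can be attached via the Canary's autoscalerRef (the behavior field is available in autoscaling/v2beta2 as of Kubernetes 1.18; all names and numbers here are illustrative):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo                 # must match the Canary's autoscalerRef
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50    # low target so load spikes trigger upscales quickly
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0 # react to load immediately
      policies:
        - type: Pods
          value: 10                 # add up to 10 pods every 15 seconds
          periodSeconds: 15
      selectPolicy: Max
```

Driving load at the service during the canary analysis keeps the desired replica count moving, so the readiness check never settles before the deadline.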
Expected behavior
Once the pods with the new version are ready, Flagger should not count subsequent upscales against progressDeadlineSeconds
and should just wait until the pods are ready, without triggering a rollback (as the delay was not caused by the new version).
Additional context
- Increasing the deadline can lead to slower reactions in case of real problems during pod startup (see the sketch after this list)
- Flagger version: 1.6.4
- Kubernetes version: 1.18
- Service Mesh provider: Linkerd
- Ingress provider: -
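For completeness, the workaround we would rather avoid is simply raising the deadline on the Canary spec; a sketch (the value is arbitrary):

```yaml
spec:
  # Workaround, not a fix: a larger deadline tolerates HPA churn during analysis,
  # but it also delays the rollback when the new version genuinely fails to start.
  progressDeadlineSeconds: 600
```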
Thanks