Ingress controller reconciles all endpoints sequentially causing delay that results in 502s #1103

Closed
@krunalnsoni

Description

When a pod is deleted from k8s, the ingress controller receives an endpoint change event and reconciles all endpoints sequentially. As more ingress resources are added, reconciliation with the k8s cluster takes longer. This is very easy to reproduce because endpoints are reconciled in alphabetical order: updating the endpoint for the last service takes the longest.

In our case, we have 8 ingress resources. It takes at least 15 seconds for the AWS ingress controller to reconcile with the k8s cluster. Once the pod has been terminated, the load balancer keeps routing requests to it until reconciliation completes, which results in 502s.

The workaround is to add a preStop delay to the pod. We have added a 30-second delay to resolve this issue. However, this does not scale as more ingress resources are added to the cluster.

The controller should reconcile only the endpoint that changed, to improve efficiency. Failing that, endpoints could be reconciled in parallel.

There is already a TODO for this.
https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/31aa413a6b63ebc4e30b400f94613fda485c0a2b/internal/ingress/controller/handlers/endpoints.go#L52
