
External Authentication failure abruptly terminates the request #13450

Open
@richbern

Description

What happened:

This started when we upgraded ingress-nginx from v1.9.4 to v1.12.1.
Connections that used to fail authentication with an external authentication provider (configured via the global-auth-url setting) are now being terminated unexpectedly.

What you expected to happen:

Instead of the client receiving a valid 401 response, the TCP connection is terminated with an RST packet.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
Noticed on upgrade to v1.12.1. Walked back versions and can reproduce the issue as far back as v1.10.1. (Note: we use the chroot image, so we can't test v1.10.0.)

Kubernetes version (use kubectl version):
Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.7

Environment:

  • Cloud provider or hardware configuration: azure aks

  • OS (e.g. from /etc/os-release): Alpine Linux v3.21

  • Kernel (e.g. uname -a): Linux 5.15.0-1088-azure #97-Ubuntu SMP Tue Apr 22 18:58:00 UTC 2025 x86_64 Linux

  • Install tools:

    • Please mention how/where the cluster was created (kubeadm/kops/minikube/kind, etc.)
  • Basic cluster related info:

    • kubectl version
    • kubectl get nodes -o wide
  • How was the ingress-nginx-controller installed: deployed using a deployment YAML rather than the Helm chart; kubectl apply -f deployment.yaml

    • If helm was used then please show output of helm ls -A | grep -i ingress
    • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
    • If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
    • if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances
  • Current State of the controller:

    • kubectl describe ingressclasses
    • kubectl -n <ingresscontrollernamespace> get all -A -o wide
    • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
    • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information, like:
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:

From what I can tell, this is due to our configuration having global-auth-url point at a separate process (not sure if it matters, but we're using a Unix domain socket to communicate with a sidecar).
That communication happens over HTTP/1.1, and the authn server responds with a 401. A rough sketch of the configuration is below.
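
Roughly, the relevant piece of the controller ConfigMap looks like this (the socket path and auth endpoint here are placeholders, not our real values):

    data:
      global-auth-url: "http://unix:/var/run/authn/authn.sock:/auth"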

Make an HTTPS request using HTTP/1.1 through the edge to a route that goes through that authn server and gets the 401.
The client sees an EOF instead of the 401.
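
For example, with a placeholder hostname standing in for our real domain:

    curl -v --http1.1 https://example.invalid/
    ...
    curl: (52) Empty reply from server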
 
Anything else we need to know:
The external authentication tool is installed as a sidecar in the pod and is configured to communicate over a Unix domain socket (UDS) using HTTP/1.1.

Upon initial spin-up, an HTTP/1.1 request to the server that would fail authn is abruptly terminated, while HTTP/2 connections work as expected. In our case, it's as simple as hitting the root domain URL.
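
To illustrate the difference (placeholder hostname again):

    curl -sS -o /dev/null -w '%{http_code}\n' --http2 https://example.invalid/    # prints 401
    curl -sS -o /dev/null -w '%{http_code}\n' --http1.1 https://example.invalid/  # curl: (52) Empty reply from server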

If I cause the ingress controller to reload its configuration, then some percentage of the HTTP/1.1 traffic works without issue, but occasional requests still fail with the abrupt termination. In curl, I get error 52: empty reply from server. Hammering the URL with bombardier causes a number of connections to be reset by peer (mostly during reads, occasionally during writes) or just a large number of EOF reports.
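
The load test was nothing exotic, something along these lines (connection/request counts are illustrative; as far as I know, bombardier's default client speaks HTTP/1.1):

    bombardier -c 50 -n 5000 https://example.invalid/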

I tried disabling HTTP/2 on the ingress controller and saw no change in behavior, other than HTTP/2 requests getting downgraded to HTTP/1.1 (and then demonstrating the early-termination issue).
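
For reference, HTTP/2 was disabled through the controller ConfigMap; that was the only change for this test:

    data:
      use-http2: "false"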

Also of note: when these requests are terminated early for the client, the controller side appears to keep the connection marked as active, since the metrics show a constant increase in active connections.
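
We're watching that through the controller's Prometheus endpoint (10254 is the default metrics port; the grep just narrows the output to the connection gauges):

    kubectl -n <ingresscontrollernamespace> port-forward <ingresscontrollerpodname> 10254:10254
    curl -s http://localhost:10254/metrics | grep -i connections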
