
ingress-nginx-controller v1.8.1 causes intermittent network requests to hang #10276

@tony-liuliu

Description


Problem description:
After deploying the latest ingress-nginx-controller (v1.8.1), requests to port 80 or 443 on the controller pod's IP address keep hanging. The same thing happens even from inside the ingress-nginx-controller container with curl 127.0.0.1. Please help me find out what the problem is.

All requests to services other than ingress-nginx-controller work normally, and so does the controller's own health-check port, 10254.
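When a request hangs like this, it helps to know which phase stalls: the TCP connect, or the wait for the first response byte. Below is a minimal Python sketch of such a probe; probe_http and its result labels are hypothetical names, not part of ingress-nginx. It speaks plain HTTP only, so it assumes port 80 or the health port 10254, not the TLS port 443, and "ok" only means that some response bytes arrived, not that the status was 200. (With curl alone, -m/--max-time at least bounds the wait.)

```python
import socket


def probe_http(host: str, port: int, timeout: float = 5.0) -> str:
    """Send one plain-HTTP GET and report where the exchange stopped.

    Hypothetical helper. Returns "connect-failed", "connect-timeout",
    "no-response" (connected and sent, but nothing came back in time),
    "closed-without-response", or "ok" (at least one response byte arrived).
    """
    request = (
        f"GET / HTTP/1.1\r\nHost: {host}\r\n"
        "User-Agent: probe\r\nConnection: close\r\n\r\n"
    ).encode()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect-timeout"  # handshake did not complete within the timeout
    except OSError:
        return "connect-failed"   # e.g. connection refused or no route
    with sock:
        sock.settimeout(timeout)  # applies to sendall() and recv() below
        sock.sendall(request)
        try:
            first = sock.recv(1)
        except socket.timeout:
            return "no-response"  # the symptom described in this issue
        return "ok" if first else "closed-without-response"
```

For example, probe_http("10.244.32.32", 80) against the pod IP from the captures below would be expected to return "no-response" while the problem is occurring, whereas probe_http("10.244.32.32", 10254) should return "ok".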

Environment information:
Kubernetes version: 1.27.4
OS: CentOS Linux release 7.9.2009 (Core)
Linux kernel: Linux dong-k8s-90 4.20.13-1.el7.elrepo.x86_64 #1 SMP Wed Feb 27 10:02:05 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
Container runtime: containerd://1.7.2

Install tools:

[root@dong-k8s-90 ingress-nginx-controller]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:49Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
[root@dong-k8s-90 ingress-nginx-controller]# kubectl get node -o wide
NAME              STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
dong-k8s-90   Ready    control-plane   15d   v1.27.4   10.206.60.90   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64   containerd://1.7.2
dong-k8s-91   Ready    control-plane   15d   v1.27.4   10.206.60.91   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64   containerd://1.7.2
dong-k8s-92   Ready    control-plane   15d   v1.27.4   10.206.60.92   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64   containerd://1.7.2
dong-k8s-93   Ready    <none>          15d   v1.27.4   10.206.60.93   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64   containerd://1.7.2
dong-k8s-95   Ready    <none>          15d   v1.27.4   10.206.60.95   <none>        CentOS Linux 7 (Core)   4.20.13-1.el7.elrepo.x86_64   containerd://1.7.2

CNI: Calico 3.26.1 in IPIP mode, deployed with the manifest https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml

How the ingress-nginx-controller was installed:
ingress-nginx-controller version v1.8.1, deployed with the manifest https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/baremetal/deploy.yaml

Current state of the controller:

[root@dong-k8s-90 ingress-nginx-controller]# kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.8.1
Annotations:  <none>
Controller:   k8s.io/ingress-nginx
Events:       <none>
[root@dong-k8s-90 ingress-nginx-controller]# kubectl -n ingress-nginx describe po ingress-nginx-controller-7898b9666d-7zwg6 
Name:             ingress-nginx-controller-7898b9666d-7zwg6
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             dong-k8s-95/10.206.60.95
Start Time:       Sun, 06 Aug 2023 13:19:51 +0800
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.8.1
                  pod-template-hash=7898b9666d
Annotations:      cni.projectcalico.org/containerID: 298f9ee44d0a3ff61f7fad9ef8cdd1983a52c1b3b70780a5f7d27a1a6ecd7af4
                  cni.projectcalico.org/podIP: 10.244.158.227/32
                  cni.projectcalico.org/podIPs: 10.244.158.227/32
Status:           Running
IP:               10.244.158.227
IPs:
  IP:           10.244.158.227
Controlled By:  ReplicaSet/ingress-nginx-controller-7898b9666d
Containers:
  controller:
    Container ID:  containerd://09e4e4a164020e089e5fbd144b8d20493a545894b36f980c6c4b9311eb3c04fb
    Image:         docker.sre.com/ingress-nginx/controller:v1.8.1
    Image ID:      docker.sre.com/ingress-nginx/controller@sha256:bd54c330f73b17d0bf19f3ec3832b285d43a4c9fa5fe15f5a7accd3de706b438
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --v=4
    State:          Running
      Started:      Sun, 06 Aug 2023 13:19:54 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-7898b9666d-7zwg6 (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqwfp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-fqwfp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                    From                      Message
  ----     ------       ----                   ----                      -------
  Normal   Scheduled    4m5s                   default-scheduler         Successfully assigned ingress-nginx/ingress-nginx-controller-7898b9666d-7zwg6 to dong-k8s-95
  Warning  FailedMount  3m54s (x2 over 3m55s)  kubelet                   MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
  Normal   Pulled       3m52s                  kubelet                   Container image "docker.sre.com/ingress-nginx/controller:v1.8.1" already present on machine
  Normal   Created      3m52s                  kubelet                   Created container controller
  Normal   Started      3m52s                  kubelet                   Started container controller
  Normal   RELOAD       3m51s                  nginx-ingress-controller  NGINX reload triggered due to a change in configuration
[root@dong-k8s-90 ingress-nginx-controller]# kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.8.1
Annotations:              <none>
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.97.230.39
IPs:                      10.97.230.39
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30882/TCP
Endpoints:                10.244.158.227:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31057/TCP
Endpoints:                10.244.158.227:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

The following packet captures were taken while a request was hanging.

The client initiates a curl request:

[root@dong-k8s-90 ingress-nginx-controller]# curl 10.244.32.32 -v
* About to connect() to 10.244.32.32 port 80 (#0)
* Trying 10.244.32.32...
* Connected to 10.244.32.32 (10.244.32.32) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.244.32.32
> Accept: */*
>

The request stays stuck in this state and never returns.

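P.S. The pod was restarted between captures, so its IP address changed (10.244.32.32 instead of 10.244.158.227 above), it now runs on dong-k8s-93, and the captured details differ from the describe output.

Packets captured on the client side (node dong-k8s-90, interface tunl0):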

[root@dong-k8s-90 ingress-nginx-controller]# tcpdump -nn -n -i tunl0 host 10.244.32.32 and port 80 -e -v
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
17:30:45.367189 ip: (tos 0x0, ttl 64, id 2003, offset 0, flags [DF], proto TCP (6), length 60)
     10.244.137.192.19066 > 10.244.32.32.80: Flags [S], cksum 0xbff6 (incorrect -> 0x7284), seq 1217195127, win 64800, options [mss 1440,sackOK,TS val 2772693908 ecr 0,nop,wscale 7], length 0
17:30:45.367699 ip: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
     10.244.32.32.80 > 10.244.137.192.19066: Flags [S.], cksum 0xa821 (correct), seq 402895697, ack 1217195128, win 64260, options [mss 1440,sackOK,TS val 78445676 ecr 2772693908,nop,wscale 7], length 0
17:30:45.367810 ip: (tos 0x0, ttl 64, id 2004, offset 0, flags [DF], proto TCP (6), length 52)
     10.244.137.192.19066 > 10.244.32.32.80: Flags [.], cksum 0xbfee (incorrect -> 0xcfe2), ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 0
17:30:45.367949 ip: (tos 0x0, ttl 64, id 2005, offset 0, flags [DF], proto TCP (6), length 128)
     10.244.137.192.19066 > 10.244.32.32.80: Flags [P.], cksum 0xc03a (incorrect -> 0x806e), seq 1:77, ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 76: HTTP, length: 76
         GET / HTTP/1.1
         User-Agent: curl/7.29.0
         Host: 10.244.32.32
         Accept: */*

17:30:45.368698 ip: (tos 0x0, ttl 63, id 33244, offset 0, flags [DF], proto TCP (6), length 52)
     10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xcf9a (correct), ack 77, win 502, options [nop,nop,TS val 78445677 ecr 2772693909], length 0
17:30:55.449188 ip: (tos 0x0, ttl 64, id 2006, offset 0, flags [DF], proto TCP (6), length 52)
     10.244.137.192.19066 > 10.244.32.32.80: Flags [F.], cksum 0xbfee (incorrect -> 0xa833), seq 77, ack 1, win 507, options [nop,nop,TS val 2772703990 ecr 78445677], length 0
17:30:55.490585 ip: (tos 0x0, ttl 63, id 33245, offset 0, flags [DF], proto TCP (6), length 52)
     10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0x80ae (correct), ack 78, win 502, options [nop,nop,TS val 78455799 ecr 2772703990], length 0
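The "incorrect" checksums appear only on packets sent by the host doing the capture (the client here, the pod in the in-container capture), while the receiving side reports the same packets as correct. That pattern is typical of transmit checksum offload: tcpdump sees the packet before the checksum is finished, and the field still holds only the IPv4 pseudo-header sum. The Python sketch below checks that reading against the captured values; the helper names are hypothetical, and the SYN bytes are rebuilt by hand from the fields tcpdump printed. If it holds, the checksum warnings are a capture artifact and most likely unrelated to the hang.

```python
import ipaddress
import struct


def fold_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071), folded to 16 bits but not inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header for the TCP checksum: src, dst, zero, protocol 6, TCP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 6, tcp_len))


def offload_placeholder(src: str, dst: str, tcp_len: int) -> int:
    """Pseudo-header-only sum: the value left in the field when offload finishes the checksum."""
    return fold_sum(pseudo_header(src, dst, tcp_len))


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Full TCP checksum; the segment's own checksum field must be zero."""
    return ~fold_sum(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF


CLIENT, POD = "10.244.137.192", "10.244.32.32"

# SYN from the capture: 19066 -> 80, seq 1217195127, win 64800,
# options [mss 1440,sackOK,TS val 2772693908 ecr 0,nop,wscale 7].
syn = struct.pack("!HHIIBBHHH", 19066, 80, 1217195127, 0, 0xA0, 0x02, 64800, 0, 0)
syn += b"\x02\x04\x05\xa0"                              # MSS 1440
syn += b"\x04\x02"                                      # SACK permitted
syn += b"\x08\x0a" + struct.pack("!II", 2772693908, 0)  # timestamps
syn += b"\x01\x03\x03\x07"                              # NOP, window scale 7

print(hex(offload_placeholder(CLIENT, POD, 40)))   # SYN / SYN-ACK (IP length 60): 0xbff6
print(hex(offload_placeholder(CLIENT, POD, 32)))   # ACK / FIN (IP length 52): 0xbfee
print(hex(offload_placeholder(CLIENT, POD, 108)))  # the GET (IP length 128): 0xc03a
print(hex(tcp_checksum(CLIENT, POD, syn)))         # full SYN checksum: 0x7284
```

All four values match the captures: 0xbff6, 0xbfee and 0xc03a are the "incorrect" values tcpdump printed for the 60-, 52- and 128-byte packets in both directions, and 0x7284 is the value it reported as correct for the SYN.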

Packets captured inside the ingress-nginx-controller container's network namespace (node dong-k8s-93):

[root@dong-k8s-93 ~]# ps -ef|grep nginx
101 15699 15227 0 16:51 ? 00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --election-id=ingress-nginx-leader --controller-class=k8s.io/ingress-nginx --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
101 15833 15699 0 16:51 ? 00:00:03 /nginx-ingress-controller --election-id=ingress-nginx-leader --controller-class=k8s.io/ingress-nginx --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
101 16546 15833 0 16:51 ? 00:00:00 nginx: master process /usr/bin/nginx -c /etc/nginx/nginx.conf
[root@dong-k8s-93 ~]# nsenter -n -t 15699
[root@dong-k8s-93 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1480
         inet 10.244.32.32 netmask 255.255.255.255 broadcast 0.0.0.0
         inet6 fe80::4c68:83ff:fe5d:687e prefixlen 64 scopeid 0x20<link>
         ether 4e:68:83:5d:68:7e txqueuelen 1000 (Ethernet)
         RX packets 12056 bytes 3037628 (2.8 MiB)
         RX errors 0 dropped 0 overruns 0 frame 0
         TX packets 10075 bytes 1263907 (1.2 MiB)
         TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
         inet 127.0.0.1 netmask 255.0.0.0
         inet6 ::1 prefixlen 128 scopeid 0x10<host>
         loop txqueuelen 1000 (Local Loopback)
         RX packets 15365 bytes 1243138 (1.1 MiB)
         RX errors 0 dropped 0 overruns 0 frame 0
         TX packets 15365 bytes 1243138 (1.1 MiB)
         TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@dong-k8s-93 ~]# tcpdump -nn -n port 80 -e -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:30:52.367684 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 2003, offset 0, flags [DF], proto TCP (6), length 60)
    10.244.137.192.19066 > 10.244.32.32.80: Flags [S], cksum 0x7284 (correct), seq 1217195127, win 64800, options [mss 1440,sackOK,TS val 2772693908 ecr 0,nop,wscale 7], length 0
17:30:52.367761 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.244.32.32.80 > 10.244.137.192.19066: Flags [S.], cksum 0xbff6 (incorrect -> 0xa821), seq 402895697, ack 1217195128, win 64260, options [mss 1440,sackOK,TS val 78445676 ecr 2772693908,nop,wscale 7], length 0
17:30:52.368114 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 63, id 2004, offset 0, flags [DF], proto TCP (6), length 52)
    10.244.137.192.19066 > 10.244.32.32.80: Flags [.], cksum 0xcfe2 (correct), ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 0
17:30:52.368615 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 63, id 2005, offset 0, flags [DF], proto TCP (6), length 128)
    10.244.137.192.19066 > 10.244.32.32.80: Flags [P.], cksum 0x806e (correct), seq 1:77, ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 76: HTTP, length: 76
        GET / HTTP/1.1
        User-Agent: curl/7.29.0
        Host: 10.244.32.32
        Accept: */*
17:30:52.368641 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 33244, offset 0, flags [DF], proto TCP (6), length 52)
    10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xbfee (incorrect -> 0xcf9a), ack 77, win 502, options [nop,nop,TS val 78445677 ecr 2772693909], length 0
17:31:02.449630 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 63, id 2006, offset 0, flags [DF], proto TCP (6), length 52)
    10.244.137.192.19066 > 10.244.32.32.80: Flags [F.], cksum 0xa833 (correct), seq 77, ack 1, win 507, options [nop,nop,TS val 2772703990 ecr 78445677], length 0
17:31:02.490541 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 33245, offset 0, flags [DF], proto TCP (6), length 52)
    10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xbfee (incorrect -> 0x80ae), ack 78, win 502, options [nop,nop,TS val 78455799 ecr 2772703990], length 0
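Read together, the two captures suggest the request is not being lost on the network path. Every packet appears in both captures, the pod's kernel ACKs the 76-byte GET (ack 77) within a millisecond, and then the pod sends nothing until it ACKs the client's FIN about 10 s later; within the captured window no response payload leaves port 80. That points at the nginx process rather than at Calico or the IPIP tunnel. The two node clocks also differ by about 7 s (17:30:45 on dong-k8s-90 versus 17:30:52 on dong-k8s-93 for the same SYN), so timestamps need aligning before comparison. A minimal Python sketch of both checks follows, with the packet fields transcribed by hand from the captures; the function names are hypothetical.

```python
from datetime import datetime

# (timestamp, IP ID, sender, tcpdump flags, TCP payload bytes), transcribed from the
# captures. "C" = client 10.244.137.192:19066, "S" = controller pod 10.244.32.32:80.
CLIENT_SIDE = [  # tunl0 on dong-k8s-90
    ("17:30:45.367189", 2003, "C", "S", 0),
    ("17:30:45.367699", 0, "S", "S.", 0),
    ("17:30:45.367810", 2004, "C", ".", 0),
    ("17:30:45.367949", 2005, "C", "P.", 76),
    ("17:30:45.368698", 33244, "S", ".", 0),
    ("17:30:55.449188", 2006, "C", "F.", 0),
    ("17:30:55.490585", 33245, "S", ".", 0),
]
POD_SIDE = [  # eth0 inside the pod's network namespace on dong-k8s-93
    ("17:30:52.367684", 2003, "C", "S", 0),
    ("17:30:52.367761", 0, "S", "S.", 0),
    ("17:30:52.368114", 2004, "C", ".", 0),
    ("17:30:52.368615", 2005, "C", "P.", 76),
    ("17:30:52.368641", 33244, "S", ".", 0),
    ("17:31:02.449630", 2006, "C", "F.", 0),
    ("17:31:02.490541", 33245, "S", ".", 0),
]


def seconds(hms: str) -> float:
    """Seconds since midnight for a tcpdump HH:MM:SS.ffffff timestamp."""
    t = datetime.strptime(hms, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def clock_offsets(a, b):
    """Time in capture b minus time in capture a, for packets present in both.

    Packets are matched on (sender, IP ID, flags), which is unique in these captures.
    """
    in_b = {(s, ip_id, f): seconds(t) for t, ip_id, s, f, _ in b}
    return [in_b[(s, ip_id, f)] - seconds(t)
            for t, ip_id, s, f, _ in a if (s, ip_id, f) in in_b]


def stall_after_request(capture):
    """Return (did the server send any payload?, seconds between the server's ACK
    of the request and the next packet in either direction)."""
    req = next(i for i, p in enumerate(capture) if p[2] == "C" and p[4] > 0)
    ack = next(i for i in range(req + 1, len(capture)) if capture[i][2] == "S")
    server_sent_data = any(p[2] == "S" and p[4] > 0 for p in capture)
    return server_sent_data, seconds(capture[ack + 1][0]) - seconds(capture[ack][0])


for name, cap in (("client side", CLIENT_SIDE), ("pod side", POD_SIDE)):
    sent, gap = stall_after_request(cap)
    print(f"{name}: server sent data: {sent}, silent for {gap:.2f} s")
offsets = clock_offsets(CLIENT_SIDE, POD_SIDE)
print(f"{len(offsets)} packets matched, clock offset {min(offsets):.3f}..{max(offsets):.3f} s")
```

Both views report "server sent data: False, silent for 10.08 s", and all 7 packets match with an offset between 7.000 and 7.001 s.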

The client hangs like this very frequently. Please help me find out what is causing the problem.

Metadata


Assignees

No one assigned

    Labels

    lifecycle/frozen, needs-kind, needs-priority, needs-triage

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
