Description
Problem:
After deploying the latest ingress-nginx-controller, requests to port 80 or 443 on the controller pod's IP address always hang. Even running curl 127.0.0.1 from inside the ingress-nginx-controller container hangs. Please help me find out what the problem is.
Requests to all other services work normally, and so does the health check port 10254 of the ingress-nginx-controller itself.
Environment information:
kubernetes version: 1.27.4
OS: CentOS Linux release 7.9.2009 (Core)
Linux kernel: Linux dong-k8s-90 4.20.13-1.el7.elrepo.x86_64 #1 SMP Wed Feb 27 10:02:05 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
runtime: containerd://1.7.2
Install tools:
[root@dong-k8s-90 ingress-nginx-controller]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:14:49Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
[root@dong-k8s-90 ingress-nginx-controller]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dong-k8s-90 Ready control-plane 15d v1.27.4 10.206.60.90 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
dong-k8s-91 Ready control-plane 15d v1.27.4 10.206.60.91 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
dong-k8s-92 Ready control-plane 15d v1.27.4 10.206.60.92 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
dong-k8s-93 Ready <none> 15d v1.27.4 10.206.60.93 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
dong-k8s-95 Ready <none> 15d v1.27.4 10.206.60.95 <none> CentOS Linux 7 (Core) 4.20.13-1.el7.elrepo.x86_64 containerd://1.7.2
CNI: calico-3.26.1 in IPIP mode, deployed with the manifest https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
How was the ingress-nginx-controller installed:
ingress-nginx-controller version: v1.8.1, deployed with the manifest https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/baremetal/deploy.yaml
Current State of the controller:
[root@dong-k8s-90 ingress-nginx-controller]# kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.8.1
Annotations: <none>
Controller: k8s.io/ingress-nginx
Events: <none>
[root@dong-k8s-90 ingress-nginx-controller]# kubectl -n ingress-nginx describe po ingress-nginx-controller-7898b9666d-7zwg6
Name: ingress-nginx-controller-7898b9666d-7zwg6
Namespace: ingress-nginx
Priority: 0
Service Account: ingress-nginx
Node: dong-k8s-95/10.206.60.95
Start Time: Sun, 06 Aug 2023 13:19:51 +0800
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.8.1
pod-template-hash=7898b9666d
Annotations: cni.projectcalico.org/containerID: 298f9ee44d0a3ff61f7fad9ef8cdd1983a52c1b3b70780a5f7d27a1a6ecd7af4
cni.projectcalico.org/podIP: 10.244.158.227/32
cni.projectcalico.org/podIPs: 10.244.158.227/32
Status: Running
IP: 10.244.158.227
IPs:
IP: 10.244.158.227
Controlled By: ReplicaSet/ingress-nginx-controller-7898b9666d
Containers:
controller:
Container ID: containerd://09e4e4a164020e089e5fbd144b8d20493a545894b36f980c6c4b9311eb3c04fb
Image: docker.sre.com/ingress-nginx/controller:v1.8.1
Image ID: docker.sre.com/ingress-nginx/controller@sha256:bd54c330f73b17d0bf19f3ec3832b285d43a4c9fa5fe15f5a7accd3de706b438
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--election-id=ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--v=4
State: Running
Started: Sun, 06 Aug 2023 13:19:54 +0800
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-7898b9666d-7zwg6 (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqwfp (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-fqwfp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m5s default-scheduler Successfully assigned ingress-nginx/ingress-nginx-controller-7898b9666d-7zwg6 to dong-k8s-95
Warning FailedMount 3m54s (x2 over 3m55s) kubelet MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
Normal Pulled 3m52s kubelet Container image "docker.sre.com/ingress-nginx/controller:v1.8.1" already present on machine
Normal Created 3m52s kubelet Created container controller
Normal Started 3m52s kubelet Started container controller
Normal RELOAD 3m51s nginx-ingress-controller NGINX reload triggered due to a change in configuration
[root@dong-k8s-90 ingress-nginx-controller]# kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name: ingress-nginx-controller
Namespace: ingress-nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.8.1
Annotations: <none>
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.97.230.39
IPs: 10.97.230.39
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 30882/TCP
Endpoints: 10.244.158.227:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31057/TCP
Endpoints: 10.244.158.227:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Packet captures taken while the problem occurs:
The client sends a curl request:
[root@dong-k8s-90 ingress-nginx-controller]# curl 10.244.32.32 -v
* About to connect() to 10.244.32.32 port 80 (#0)
*   Trying 10.244.32.32...
* Connected to 10.244.32.32 (10.244.32.32) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.244.32.32
> Accept: */*
>
The request hangs in this state and never returns.
P.S. The pod was restarted before these captures, so the pod IP shown here (10.244.32.32) differs from the one in the describe output above (10.244.158.227).
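To reproduce the hang without blocking the terminal, the request can be given a hard time limit. The following is a minimal sketch, not part of the original report: `classify_curl_exit` and `probe` are hypothetical helper names, and the 5-second default limit is an arbitrary assumption. It relies only on curl's `--max-time` option and its documented exit codes (7 for a failed connect, 28 for a timeout).

```shell
#!/usr/bin/env bash

# Map a curl exit status to a short label.
# 0, 7 and 28 are documented curl exit codes; anything else is reported as-is.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connect-failed" ;;
    28) echo "timeout" ;;
    *)  echo "other($1)" ;;
  esac
}

# Probe a URL with a hard time limit (seconds, default 5) so that a hung
# request cannot block the shell. Hypothetical helper for illustration.
probe() {
  local url="$1" limit="${2:-5}" rc=0
  curl --silent --output /dev/null --max-time "$limit" "$url" || rc=$?
  echo "$url: $(classify_curl_exit "$rc")"
}

# Usage with the addresses from this report:
#   probe http://10.244.32.32/                # pod IP, port 80
#   probe http://10.244.32.32:10254/healthz   # health check port
```

If the behaviour matches the description above, the first probe would be expected to report a timeout while the health check probe returns normally.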
Packets captured on the client side (tunl0):
[root@dong-k8s-90 ingress-nginx-controller]# tcpdump -nn -n -i tunl0 host 10.244.32.32 and port 80 -e -v
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
17:30:45.367189 ip: (tos 0x0, ttl 64, id 2003, offset 0, flags [DF], proto TCP (6), length 60)
10.244.137.192.19066 > 10.244.32.32.80: Flags [S], cksum 0xbff6 (incorrect -> 0x7284), seq 1217195127, win 64800, options [mss 1440,sackOK,TS val 2772693908 ecr 0,nop,wscale 7], length 0
17:30:45.367699 ip: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.244.32.32.80 > 10.244.137.192.19066: Flags [S.], cksum 0xa821 (correct), seq 402895697, ack 1217195128, win 64260, options [mss 1440,sackOK,TS val 78445676 ecr 2772693908,nop,wscale 7], length 0
17:30:45.367810 ip: (tos 0x0, ttl 64, id 2004, offset 0, flags [DF], proto TCP (6), length 52)
10.244.137.192.19066 > 10.244.32.32.80: Flags [.], cksum 0xbfee (incorrect -> 0xcfe2), ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 0
17:30:45.367949 ip: (tos 0x0, ttl 64, id 2005, offset 0, flags [DF], proto TCP (6), length 128)
10.244.137.192.19066 > 10.244.32.32.80: Flags [P.], cksum 0xc03a (incorrect -> 0x806e), seq 1:77, ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 76: HTTP, length: 76
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: 10.244.32.32
Accept: */*
17:30:45.368698 ip: (tos 0x0, ttl 63, id 33244, offset 0, flags [DF], proto TCP (6), length 52)
10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xcf9a (correct), ack 77, win 502, options [nop,nop,TS val 78445677 ecr 2772693909], length 0
17:30:55.449188 ip: (tos 0x0, ttl 64, id 2006, offset 0, flags [DF], proto TCP (6), length 52)
10.244.137.192.19066 > 10.244.32.32.80: Flags [F.], cksum 0xbfee (incorrect -> 0xa833), seq 77, ack 1, win 507, options [nop,nop,TS val 2772703990 ecr 78445677], length 0
17:30:55.490585 ip: (tos 0x0, ttl 63, id 33245, offset 0, flags [DF], proto TCP (6), length 52)
10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0x80ae (correct), ack 78, win 502, options [nop,nop,TS val 78455799 ecr 2772703990], length 0
Packets captured inside the ingress-nginx-controller container's network namespace:
[root@dong-k8s-93 ~]# ps -ef|grep nginx
101 15699 15227 0 16:51 ? 00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --election-id=ingress-nginx-leader --controller-class=k8s.io/ingress-nginx --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
101 15833 15699 0 16:51 ? 00:00:03 /nginx-ingress-controller --election-id=ingress-nginx-leader --controller-class=k8s.io/ingress-nginx --ingress-class=nginx --configmap=ingress-nginx/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key
101 16546 15833 0 16:51 ? 00:00:00 nginx: master process /usr/bin/nginx -c /etc/nginx/nginx.conf
[root@dong-k8s-93 ~]# nsenter -n -t 15699
[root@dong-k8s-93 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1480
inet 10.244.32.32 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::4c68:83ff:fe5d:687e prefixlen 64 scopeid 0x20<link>
ether 4e:68:83:5d:68:7e txqueuelen 1000 (Ethernet)
RX packets 12056 bytes 3037628 (2.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10075 bytes 1263907 (1.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 15365 bytes 1243138 (1.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 15365 bytes 1243138 (1.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@dong-k8s-93 ~]# tcpdump -nn -n port 80 -e -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:30:52.367684 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 2003, offset 0, flags [DF], proto TCP (6), length 60)
10.244.137.192.19066 > 10.244.32.32.80: Flags [S], cksum 0x7284 (correct), seq 1217195127, win 64800, options [mss 1440,sackOK,TS val 2772693908 ecr 0,nop,wscale 7], length 0
17:30:52.367761 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.244.32.32.80 > 10.244.137.192.19066: Flags [S.], cksum 0xbff6 (incorrect -> 0xa821), seq 402895697, ack 1217195128, win 64260, options [mss 1440,sackOK,TS val 78445676 ecr 2772693908,nop,wscale 7], length 0
17:30:52.368114 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 63, id 2004, offset 0, flags [DF], proto TCP (6), length 52)
10.244.137.192.19066 > 10.244.32.32.80: Flags [.], cksum 0xcfe2 (correct), ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 0
17:30:52.368615 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 63, id 2005, offset 0, flags [DF], proto TCP (6), length 128)
10.244.137.192.19066 > 10.244.32.32.80: Flags [P.], cksum 0x806e (correct), seq 1:77, ack 1, win 507, options [nop,nop,TS val 2772693909 ecr 78445676], length 76: HTTP, length: 76
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: 10.244.32.32
Accept: */*
17:30:52.368641 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 33244, offset 0, flags [DF], proto TCP (6), length 52)
10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xbfee (incorrect -> 0xcf9a), ack 77, win 502, options [nop,nop,TS val 78445677 ecr 2772693909], length 0
17:31:02.449630 ee:ee:ee:ee:ee:ee > 4e:68:83:5d:68:7e, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 63, id 2006, offset 0, flags [DF], proto TCP (6), length 52)
10.244.137.192.19066 > 10.244.32.32.80: Flags [F.], cksum 0xa833 (correct), seq 77, ack 1, win 507, options [nop,nop,TS val 2772703990 ecr 78445677], length 0
17:31:02.490541 4e:68:83:5d:68:7e > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 33245, offset 0, flags [DF], proto TCP (6), length 52)
10.244.32.32.80 > 10.244.137.192.19066: Flags [.], cksum 0xbfee (incorrect -> 0x80ae), ack 78, win 502, options [nop,nop,TS val 78455799 ecr 2772703990], length 0
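In both captures the TCP handshake completes and the controller ACKs the GET request, but it never sends an HTTP response; about 10 seconds later the client closes the connection with a FIN. As a result, the client hangs for as long as it keeps waiting. This happens very frequently. Please help me find out what is causing the problem.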
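One way to check a capture like the ones above mechanically is to sum the TCP payload bytes sent by each endpoint. The following is a rough sketch under assumptions, not part of the original report: `payload_bytes_from` is a hypothetical helper, and it assumes the text format printed by `tcpdump -nn -v`, where each packet's second line starts with `src > dst:` and ends with `, length N` for the TCP payload size.

```shell
#!/usr/bin/env bash

# Sum the TCP payload bytes sent by one endpoint in `tcpdump -nn -v` text
# output read from stdin. $1 is the sender exactly as tcpdump prints it,
# i.e. address.port, for example 10.244.32.32.80.
# Hypothetical helper; only lines whose first field equals $1 are counted.
payload_bytes_from() {
  awk -v src="$1" '
    # Match the per-packet TCP line "src > dst: Flags ..., length N".
    # ", length " is 9 characters, so the number starts at RSTART + 9.
    $1 == src && $2 == ">" && match($0, /, length [0-9]+/) {
      total += substr($0, RSTART + 9, RLENGTH - 9)
    }
    END { print total + 0 }
  '
}

# Usage (capture.txt is a hypothetical file holding saved tcpdump output):
#   payload_bytes_from 10.244.32.32.80      < capture.txt   # server side
#   payload_bytes_from 10.244.137.192.19066 < capture.txt   # client side
```

For the captures in this report, the client side carries the 76-byte GET request while every packet from 10.244.32.32.80 has length 0, which is consistent with nginx accepting the connection but never writing a response.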