This repository was archived by the owner on Sep 7, 2022. It is now read-only.
Liveness probes for random pods fail on a node with vSphere volume attach/detach ops #466
Open
Description
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I am running a StatefulSet with vSphere volumes. To exercise volume attach/detach handling, I am killing pods at random with:
while true; do sleep 30; kubectl delete pods -n website vsphere-volume-test-$(( ( RANDOM % 10 ) )); done
It turns out that liveness probes for other pods on the same node where the vSphere volume attach/detach is happening start failing for a few seconds (see the sketch and kubelet log details below).
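For context, this is a minimal sketch of the kind of StatefulSet I am testing with, assuming dynamic provisioning through a vSphere-backed storage class; the names, image, and storage class below are illustrative, not my exact manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vsphere-volume-test
  namespace: website
spec:
  serviceName: vsphere-volume-test
  replicas: 10
  selector:
    matchLabels:
      app: vsphere-volume-test
  template:
    metadata:
      labels:
        app: vsphere-volume-test
    spec:
      containers:
      - name: app
        image: busybox            # illustrative workload
        command: ["sh", "-c", "sleep 3600"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: vsphere-standard   # illustrative; any vSphere-backed class
      resources:
        requests:
          storage: 1Gi

Kubelet log details from around one pod deletion: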
Mar 13 12:21:21 kube-node-04.example.com kubelet[28167]: I0313 12:21:21.939043 28209 kubelet.go:1836] SyncLoop (ADD, "api"): "vsphere-volume-test-3_website(0a598d73-26b9-11e8-889e-005056b254fe)"
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056782 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056823 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056847 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: E0313 12:21:22.056925 28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:21:22.556884873 +0000 UTC m=+64019.496140699 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.157168 28209 reconciler.go:262] operationExecutor.MountVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.157220 28209 reconciler.go:262] operationExecutor.MountVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.217688 28209 operation_generator.go:522] MountVolume.SetUp succeeded for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.217756 28209 operation_generator.go:522] MountVolume.SetUp succeeded for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.762404 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: E0313 12:22:23.762528 28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:24.76247914 +0000 UTC m=+64081.701734997 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.766675 28209 kubelet.go:1778] skipping pod synchronization - [container runtime is down]
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.888171 28209 prober.go:111] Liveness probe for "tomcat-0_website(19a2f785-2624-11e8-889e-005056b254fe):tomcat" failed (failure): Get http://10.234.78.30:8080/ping: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.984230 28209 server.go:796] GET /pods/?verbose=True: (1.10746ms) 200 [[python-requests/2.11.1] 10.1.165.77:46992]
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.359311 28209 kubelet_node_status.go:443] Using node IP: "10.1.165.77"
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361139 28209 kubelet_node_status.go:431] Recording NodeHasSufficientMemory event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361158 28209 kubelet_node_status.go:431] Recording NodeHasNoDiskPressure event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361168 28209 kubelet_node_status.go:431] Recording NodeReady event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.810129 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:25 kube-node-04.example.com kubelet[28167]: E0313 12:22:25.368773 28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:27.368739522 +0000 UTC m=+64084.307995359 (durationBeforeRetry 2s). Error: "Volume not attached according to node status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:27 kube-node-04.example.com kubelet[28167]: I0313 12:22:27.417988 28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:27 kube-node-04.example.com kubelet[28167]: E0313 12:22:27.421373 28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:31.421337984 +0000 UTC m=+64088.360593823 (durationBeforeRetry 4s). Error: "Volume not attached according to node status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.781505 28209 kubelet.go:1852] SyncLoop (DELETE, "api"): "vsphere-volume-test-3_website(0a598d73-26b9-11e8-889e-005056b254fe)"
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921438 28209 reconciler.go:191] operationExecutor.UnmountVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921476 28209 reconciler.go:191] operationExecutor.UnmountVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921781 28209 operation_generator.go:633] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf" (OuterVolumeSpecName: "resolv-conf") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe"). InnerVolumeSpecName "resolv-conf". PluginName "kubernetes.io/configmap", VolumeGidValue ""
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.933188 28209 operation_generator.go:633] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd" (OuterVolumeSpecName: "default-token-t4pbd") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe"). InnerVolumeSpecName "default-token-t4pbd". PluginName "kubernetes.io/secret", VolumeGidValue ""
Mar 13 12:22:29 kube-node-04.example.com kubelet[28167]: I0313 12:22:29.021747 28209 reconciler.go:297] Volume detached for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") on node "kube-node-04.example.com" DevicePath ""
Mar 13 12:22:29 kube-node-04.example.com kubelet[28167]: I0313 12:22:29.021776 28209 reconciler.go:297] Volume detached for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") on node "kube-node-04.example.com" DevicePath ""
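For reference, the probe that fails in the log above (tomcat-0, an unrelated pod on the same node) is a plain HTTP GET against /ping on port 8080, and the log shows the client timeout being hit. Its spec is roughly of this shape; the timing values here are illustrative guesses, not copied from the real manifest:

livenessProbe:
  httpGet:
    path: /ping
    port: 8080
  initialDelaySeconds: 30   # illustrative
  periodSeconds: 10         # illustrative
  timeoutSeconds: 1         # illustrative; the request is canceled on Client.Timeout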
What you expected to happen:
No pods on the node should be affected except the one being killed.
Environment:
- Kubernetes version (use kubectl version): 1.9.2
- Cloud provider or hardware configuration: vSphere
- OS (e.g. from /etc/os-release): CentOS