
Liveness probes for random pods fail on a node with vSphere volume attach/detach ops #466

mqasimsarfraz commented Mar 13, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
I am running a StatefulSet with vSphere volumes. To test for volume regressions, I am killing pods at random with:

while true; do sleep 30; kubectl delete pods -n website vsphere-volume-test-$(( ( RANDOM % 10 ) )); done
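
For reference, the StatefulSet is along the lines of the sketch below (illustrative only; the image, probe settings, storage class name, and volume size are placeholders, not the exact manifest I am running — the real set has 10 replicas in the "website" namespace, matching the vsphere-volume-test-0..9 pods deleted by the loop above):

```sh
# Illustrative sketch only -- image, probe, storage class and size are placeholders.
kubectl apply -n website -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vsphere-volume-test
spec:
  serviceName: vsphere-volume-test
  replicas: 10
  selector:
    matchLabels:
      app: vsphere-volume-test
  template:
    metadata:
      labels:
        app: vsphere-volume-test
    spec:
      containers:
      - name: test
        image: nginx               # placeholder image
        livenessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 10
          timeoutSeconds: 1
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: vsphere-storage-class   # placeholder name; backed by the vSphere volume provisioner
      resources:
        requests:
          storage: 1Gi
EOF
```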

It turns out that liveness probes for other pods on the same node, where the vSphere volume detach is happening, start failing for a few seconds. Details:

Mar 13 12:21:21 kube-node-04.example.com kubelet[28167]: I0313 12:21:21.939043   28209 kubelet.go:1836] SyncLoop (ADD, "api"): "vsphere-volume-test-3_website(0a598d73-26b9-11e8-889e-005056b254fe)"
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056782   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056823   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.056847   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: E0313 12:21:22.056925   28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:21:22.556884873 +0000 UTC m=+64019.496140699 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.157168   28209 reconciler.go:262] operationExecutor.MountVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.157220   28209 reconciler.go:262] operationExecutor.MountVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.217688   28209 operation_generator.go:522] MountVolume.SetUp succeeded for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:21:22 kube-node-04.example.com kubelet[28167]: I0313 12:21:22.217756   28209 operation_generator.go:522] MountVolume.SetUp succeeded for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.762404   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: E0313 12:22:23.762528   28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:24.76247914 +0000 UTC m=+64081.701734997 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.766675   28209 kubelet.go:1778] skipping pod synchronization - [container runtime is down]
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.888171   28209 prober.go:111] Liveness probe for "tomcat-0_website(19a2f785-2624-11e8-889e-005056b254fe):tomcat" failed (failure): Get http://10.234.78.30:8080/ping: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Mar 13 12:22:23 kube-node-04.example.com kubelet[28167]: I0313 12:22:23.984230   28209 server.go:796] GET /pods/?verbose=True: (1.10746ms) 200 [[python-requests/2.11.1] 10.1.165.77:46992]
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.359311   28209 kubelet_node_status.go:443] Using node IP: "10.1.165.77"
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361139   28209 kubelet_node_status.go:431] Recording NodeHasSufficientMemory event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361158   28209 kubelet_node_status.go:431] Recording NodeHasNoDiskPressure event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.361168   28209 kubelet_node_status.go:431] Recording NodeReady event message for node kube-node-04.example.com
Mar 13 12:22:24 kube-node-04.example.com kubelet[28167]: I0313 12:22:24.810129   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:25 kube-node-04.example.com kubelet[28167]: E0313 12:22:25.368773   28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:27.368739522 +0000 UTC m=+64084.307995359 (durationBeforeRetry 2s). Error: "Volume not attached according to node status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:27 kube-node-04.example.com kubelet[28167]: I0313 12:22:27.417988   28209 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-e5b485bb-2698-11e8-889e-005056b254fe" (UniqueName: "kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk") pod "vsphere-volume-test-3" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:27 kube-node-04.example.com kubelet[28167]: E0313 12:22:27.421373   28209 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\"" failed. No retries permitted until 2018-03-13 12:22:31.421337984 +0000 UTC m=+64088.360593823 (durationBeforeRetry 4s). Error: "Volume not attached according to node status for volume \"pvc-e5b485bb-2698-11e8-889e-005056b254fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[My-Store-NFS01] kubevols/kubernetes-dynamic-pvc-e5b485bb-2698-11e8-889e-005056b254fe.vmdk\") pod \"vsphere-volume-test-3\" (UID: \"0a598d73-26b9-11e8-889e-005056b254fe\") "
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.781505   28209 kubelet.go:1852] SyncLoop (DELETE, "api"): "vsphere-volume-test-3_website(0a598d73-26b9-11e8-889e-005056b254fe)"
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921438   28209 reconciler.go:191] operationExecutor.UnmountVolume started for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921476   28209 reconciler.go:191] operationExecutor.UnmountVolume started for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe")
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.921781   28209 operation_generator.go:633] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf" (OuterVolumeSpecName: "resolv-conf") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe"). InnerVolumeSpecName "resolv-conf". PluginName "kubernetes.io/configmap", VolumeGidValue ""
Mar 13 12:22:28 kube-node-04.example.com kubelet[28167]: I0313 12:22:28.933188   28209 operation_generator.go:633] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd" (OuterVolumeSpecName: "default-token-t4pbd") pod "0a598d73-26b9-11e8-889e-005056b254fe" (UID: "0a598d73-26b9-11e8-889e-005056b254fe"). InnerVolumeSpecName "default-token-t4pbd". PluginName "kubernetes.io/secret", VolumeGidValue ""
Mar 13 12:22:29 kube-node-04.example.com kubelet[28167]: I0313 12:22:29.021747   28209 reconciler.go:297] Volume detached for volume "default-token-t4pbd" (UniqueName: "kubernetes.io/secret/0a598d73-26b9-11e8-889e-005056b254fe-default-token-t4pbd") on node "kube-node-04.example.com" DevicePath ""
Mar 13 12:22:29 kube-node-04.example.com kubelet[28167]: I0313 12:22:29.021776   28209 reconciler.go:297] Volume detached for volume "resolv-conf" (UniqueName: "kubernetes.io/configmap/0a598d73-26b9-11e8-889e-005056b254fe-resolv-conf") on node "kube-node-04.example.com" DevicePath ""
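
The probe failures line up with the "VolumesInUse" / "not attached according to node status" errors above. The probe events and the node's reported volume state can be observed with something like the following while the delete loop runs (just a sketch of how to watch the issue; the node and namespace names are the ones from this report):

```sh
# Watch liveness-probe failures (Unhealthy events) in the namespace.
kubectl get events -n website --field-selector reason=Unhealthy --watch

# Inspect which volumes the node currently reports as in use / attached;
# these are the lists the attach/detach errors above refer to.
kubectl get node kube-node-04.example.com -o jsonpath='{.status.volumesInUse}'
kubectl get node kube-node-04.example.com -o jsonpath='{.status.volumesAttached}'
```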

What you expected to happen:
No pods should be affected except the one being deleted on a node.

Environment:

  • Kubernetes version (use kubectl version):
    1.9.2
  • Cloud provider or hardware configuration:
    vSphere
  • OS (e.g. from /etc/os-release):
    CentOS