Description
We had a pod end up as "Completed" in production after it was rescheduled due to spikes in our resource usage (caused by careless overloading from other deployments).
The effect is that Kubernetes won't remove the pod and create a new one, as I expect it would have if the pod had ended up as "Error".
The describe pod output below shows what we believe was the cause: the pod failed to mount its volume (which was being moved between gcloud instances).
> kubectl -n kafka describe pod kafka-1
Name:           kafka-1
Namespace:      kafka
Node:           gke-eu-west-2-default-pool-29350486-ndvt/10.132.0.7
Start Time:     Mon, 26 Dec 2016 02:00:59 +0100
Labels:         app=kafka
Status:         Running
IP:
Controllers:    StatefulSet/kafka
Containers:
  broker:
    Container ID:       docker://649e62ca52cb4f2f0fd8b26dbb83777e6a3f99bc63247b27131d47a536103e6f
    Image:              solsson/kafka-persistent:0.10.1@sha256:110f9e866acd4fb9e059b45884c34a210b2f40d6e2f8afe98ded616f43b599f9
    Image ID:           docker-pullable://solsson/kafka-persistent@sha256:110f9e866acd4fb9e059b45884c34a210b2f40d6e2f8afe98ded616f43b599f9
    Port:               9092/TCP
    Command:
      sh
      -c
      ./bin/kafka-server-start.sh config/server.properties --override broker.id=$(hostname | awk -F'-' '{print $2}')
    State:              Terminated
      Reason:           Completed
      Exit Code:        0
      Started:          Thu, 29 Dec 2016 16:44:10 +0100
      Finished:         Thu, 05 Jan 2017 09:25:33 +0100
    Ready:              False
    Restart Count:      1
    Volume Mounts:
      /opt/kafka/data from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-x85zh (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-kafka-1
    ReadOnly:   false
  default-token-x85zh:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-x85zh
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                                 SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                                 -------------  -------  ------       -------
  4h         28s       119    {kubelet gke-eu-west-2-default-pool-29350486-ndvt}                  Warning  FailedMount  Unable to mount volumes for pod "kafka-1_kafka(c37db470-cb06-11e6-882c-42010a84014e)": timeout expired waiting for volumes to attach/mount for pod "kafka-1"/"kafka". list of unattached/unmounted volumes=[datadir]
  4h         28s       119    {kubelet gke-eu-west-2-default-pool-29350486-ndvt}                  Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "kafka-1"/"kafka". list of unattached/unmounted volumes=[datadir]
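
A sketch of how one might confirm the terminal container state and force recreation in the meantime (assuming the StatefulSet controller will schedule a replacement once the pod is deleted; commands are for illustration only):

# Inspect the terminated container state (expect reason "Completed", exit code 0)
> kubectl -n kafka get pod kafka-1 -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'

# Manual workaround: delete the stuck pod so the StatefulSet controller recreates it
> kubectl -n kafka delete pod kafka-1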