pod ends up in Completed state when it fails to mount its volume #15

Open
@atamon


We had a pod end up as "Completed" in production after it was rescheduled due to spikes in our resource usage (caused by careless overloading from other deployments).

The effect is that Kubernetes won't remove the pod and create a new one, as it would if the pod had ended up as "Error", which is the state I would have expected.

The describe pod output below shows what we believe was the cause: the pod failed to mount its volume (which was being moved between gcloud instances).

> kubectl -n kafka describe pod kafka-1
Name:		kafka-1
Namespace:	kafka
Node:		gke-eu-west-2-default-pool-29350486-ndvt/10.132.0.7
Start Time:	Mon, 26 Dec 2016 02:00:59 +0100
Labels:		app=kafka
Status:		Running
IP:
Controllers:	StatefulSet/kafka
Containers:
  broker:
    Container ID:	docker://649e62ca52cb4f2f0fd8b26dbb83777e6a3f99bc63247b27131d47a536103e6f
    Image:		solsson/kafka-persistent:0.10.1@sha256:110f9e866acd4fb9e059b45884c34a210b2f40d6e2f8afe98ded616f43b599f9
    Image ID:		docker-pullable://solsson/kafka-persistent@sha256:110f9e866acd4fb9e059b45884c34a210b2f40d6e2f8afe98ded616f43b599f9
    Port:		9092/TCP
    Command:
      sh
      -c
      ./bin/kafka-server-start.sh config/server.properties --override broker.id=$(hostname | awk -F'-' '{print $2}')
    State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Thu, 29 Dec 2016 16:44:10 +0100
      Finished:		Thu, 05 Jan 2017 09:25:33 +0100
    Ready:		False
    Restart Count:	1
    Volume Mounts:
      /opt/kafka/data from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-x85zh (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  datadir:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	datadir-kafka-1
    ReadOnly:	false
  default-token-x85zh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-x85zh
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  4h		28s		119	{kubelet gke-eu-west-2-default-pool-29350486-ndvt}			Warning		FailedMount	Unable to mount volumes for pod "kafka-1_kafka(c37db470-cb06-11e6-882c-42010a84014e)": timeout expired waiting for volumes to attach/mount for pod "kafka-1"/"kafka". list of unattached/unmounted volumes=[datadir]
  4h		28s		119	{kubelet gke-eu-west-2-default-pool-29350486-ndvt}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "kafka-1"/"kafka". list of unattached/unmounted volumes=[datadir]
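
For anyone who wants to spot this state automatically, here is a minimal sketch in Python. It assumes the pod has been dumped as JSON (for example with kubectl -n kafka get pod kafka-1 -o json) and only checks for a container that terminated with reason "Completed" and exit code 0, as in the output above. The function name and the trimmed sample document are hypothetical and written by hand for illustration; they are not real cluster output.

```python
import json


def completed_containers(pod):
    """Return the names of containers that terminated with reason
    "Completed" and exit code 0 (hypothetical helper, not a Kubernetes API)."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("state", {}).get("terminated")
        if (
            terminated
            and terminated.get("reason") == "Completed"
            and terminated.get("exitCode") == 0
        ):
            names.append(status["name"])
    return names


# Hand-written, heavily trimmed sample that mirrors the fields shown in the
# describe output above. Assumption: a real pod JSON document carries many
# more fields, which this sketch ignores.
sample = json.loads("""
{
  "metadata": {"name": "kafka-1", "namespace": "kafka"},
  "status": {
    "containerStatuses": [
      {
        "name": "broker",
        "ready": false,
        "restartCount": 1,
        "state": {"terminated": {"reason": "Completed", "exitCode": 0}}
      }
    ]
  }
}
""")

print(completed_containers(sample))  # ['broker']
```

A non-empty result for a pod that belongs to a long-running service such as this broker would be a hint that it is stuck in the way described here, not that this is the only possible cause.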
