
Pods using EBS-backed PVC sometimes get stuck. #38301

Closed
Description

@exarkun

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

No.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

unmount
umount
Error checking if mountpoint

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT.

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:22:15Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:16:27Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 8 (jessie)
  • Kernel (e.g. uname -a): Linux ip-172-20-84-61 4.4.26-k8s #1 SMP Fri Oct 21 05:21:13 UTC 2016 x86_64 GNU/Linux
  • Install tools: kops Version git-e1a9aad
  • Others:

What happened:

I created a storageclass and a new PVC referencing it:

apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1beta1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: '{"kind":"StorageClass","apiVersion":"storage.k8s.io/v1beta1","metadata":{"name":"normal","creationTimestamp":null},"provisioner":"kubernetes.io/aws-ebs","parameters":{"type":"gp2"}}'
    creationTimestamp: 2016-12-07T15:31:12Z
    name: normal
    resourceVersion: "2739806"
    selfLink: /apis/storage.k8s.io/v1beta1/storageclasses/normal
    uid: 2f28bfcc-bc92-11e6-b3c8-12e507f54388
  parameters:
    type: gp2
  provisioner: kubernetes.io/aws-ebs
kind: List
metadata: {}

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"kind":"PersistentVolumeClaim","apiVersion":"v1","metadata":{"name":"infrastructure-foolscap-logs-pvc","creationTimestamp":null,"labels":{"app":"s4","component":"Infrastructure","provider":"LeastAuthority"},"annotations":{"volume.beta.kubernetes.io/storage-class":"normal"}},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10G"}}},"status":{}}'
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-class: normal
  creationTimestamp: 2016-12-07T15:10:30Z
  labels:
    app: s4
    component: Infrastructure
    provider: LeastAuthority
  name: infrastructure-foolscap-logs-pvc
  namespace: staging
  resourceVersion: "2739819"
  selfLink: /api/v1/namespaces/staging/persistentvolumeclaims/infrastructure-foolscap-logs-pvc
  uid: 4b3c2fb4-bc8f-11e6-b3c8-12e507f54388
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10G
  volumeName: pvc-4b3c2fb4-bc8f-11e6-b3c8-12e507f54388
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

I then updated my deployment to include a volume using this PVC, and updated the deployment's template spec so that one of the containers mounts this volume. Then I deployed it with kubectl apply -f .... I made some tweaks and repeated this operation a few times. Behavior was as expected (EBS-backed PV created, pod started, container had the PV mounted in it, data persisted across deployment updates).
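For reference, the relevant deployment fragment looked roughly like this (a sketch, not the exact manifest: the container name, image, and mount path are placeholders, while the volume name matches the error below and the claim name matches the PVC above):

apiVersion: extensions/v1beta1   # Deployment API group in Kubernetes 1.4
kind: Deployment
metadata:
  name: s4-infrastructure
  namespace: staging
spec:
  template:
    metadata:
      labels:
        app: s4
    spec:
      containers:
      - name: log-gatherer                  # placeholder container name
        image: example/log-gatherer:latest  # placeholder image
        volumeMounts:
        - name: log-gatherer-data
          mountPath: /data/log-gatherer     # placeholder mount path
      volumes:
      - name: log-gatherer-data
        persistentVolumeClaim:
          claimName: infrastructure-foolscap-logs-pvc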

On the last deployment update (in which I changed the image used by some of the containers), the new pod failed to come up. The web UI reported:

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "s4-infrastructure-3171603516-8zj8k"/"staging". list of unattached/unmounted volumes=[log-gatherer-data]
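More detail on this failure can be gathered from the pod's events and from the kubelet's log on the node (a sketch, assuming the kubelet runs as a systemd unit, as it does on a kops-provisioned node):

# Events recorded for the stuck pod; attach/mount timeouts show up here
kubectl --namespace staging describe pod s4-infrastructure-3171603516-8zj8k

# On the node: the kubelet's view of the mount/unmount failures
sudo journalctl -u kubelet | grep -iE 'mount|attach'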

What you expected to happen:

I expected a new pod to be created, its containers to start, and the container using the log-gatherer-data volume to have the data it had before the deployment update.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

There are many mount/unmount errors in the kubelet journalctl log, attached.

logs.txt

The EBS volume backing the PVC is indeed attached to the node (see the verification sketch after the mount listing).
The mount state is:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=479670,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,relatime,size=771468k,mode=755)
/dev/xvda1 on / type ext4 (rw,relatime,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
/dev/xvdc on /mnt type ext3 (rw,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/7cce5087-ab69-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-3mbvh type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/7d411a93-ab69-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-3mbvh type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/ae9bb673-ac0b-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-fp0o5 type tmpfs (rw,relatime)
/dev/xvdba on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-0f1ca7d3ab1426833 type ext4 (rw,relatime,data=ordered)
/dev/xvdba on /var/lib/kubelet/pods/ae9bb673-ac0b-11e6-b3c8-12e507f54388/volumes/kubernetes.io~aws-ebs/leastauthority-tweaks-kube-registry-pv type ext4 (rw,relatime,data=ordered)
/dev/xvdbc on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-0e80ac26be3edd63f type ext4 (rw,relatime,data=ordered)
/dev/xvdbb on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-01b01d11a6b17e2de type ext4 (rw,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/755e6718-ac11-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-zwvk5 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/d81c5474-bbfb-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/web-secrets type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/d81c5474-bbfb-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-zwvk5 type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/d81c5474-bbfb-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/flapp-secrets type tmpfs (rw,relatime)
/dev/xvdbc on /var/lib/kubelet/pods/d81c5474-bbfb-11e6-b3c8-12e507f54388/volumes/kubernetes.io~aws-ebs/infrastructure-web-pv type ext4 (rw,relatime,data=ordered)
/dev/xvdbb on /var/lib/kubelet/pods/d81c5474-bbfb-11e6-b3c8-12e507f54388/volumes/kubernetes.io~aws-ebs/infrastructure-flapp-pv type ext4 (rw,relatime,data=ordered)
/dev/xvdbd on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1b/vol-04e25da2c73877960 type ext4 (rw,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/4f92171a-bc98-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-36roi type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/f654cc46-bc9a-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/flapp-secrets type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/f654cc46-bc9a-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/web-secrets type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/f654cc46-bc9a-11e6-b3c8-12e507f54388/volumes/kubernetes.io~secret/default-token-36roi type tmpfs (rw,relatime)
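That the volume really is attached can be confirmed from the node and from the AWS API (a sketch; the volume ID below is a placeholder for the one backing the stuck PV):

# On the node: attached EBS volumes appear as /dev/xvd* block devices
lsblk

# From anywhere with AWS credentials: the volume's attachment state
# (vol-0123456789abcdef0 is a placeholder volume ID)
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
  --query 'Volumes[0].Attachments[0].[InstanceId,Device,State]'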

The directory referenced by the stat error in the logs is empty:

admin@ip-172-20-84-61:~$ sudo ls -al /var/lib/kubelet/pods/938a5bbf-bc95-11e6-b3c8-12e507f54388/volumes/kubernetes.io~aws-ebs/
total 8
drwxr-x--- 2 root root 4096 Dec  7 15:57 .
drwxr-x--- 5 root root 4096 Dec  7 15:55 ..
admin@ip-172-20-84-61:~$ 
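The missing per-PV entry presumably also explains the stat error: the kubelet's mountpoint check stats the volume directory, and nothing was ever created under kubernetes.io~aws-ebs/ for this pod. A manual approximation of the check (a sketch, assuming the mount path would use the PV name, as the other aws-ebs mounts above do):

# Fails with ENOENT: the per-PV directory was never created
sudo stat /var/lib/kubelet/pods/938a5bbf-bc95-11e6-b3c8-12e507f54388/volumes/kubernetes.io~aws-ebs/pvc-4b3c2fb4-bc8f-11e6-b3c8-12e507f54388

# For comparison, a healthy mount shows a device number different from its parent's
sudo stat --format '%d %n' /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-0e80ac26be3edd63f /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts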
