We run a fleet of pods in AKS as a CI/CD system (GitHub Actions runners). Each pod has a 32 GiB Azure Disk (Standard SSD) attached via a PVC and mounted at /var/lib/docker, so that the layers of Docker-image-based GitHub Actions are cached (we use DinD and start the Docker daemon from a script when the pod starts). The pods run as a StatefulSet, with the PVs attached through volumeClaimTemplates.
We are using sysbox CE version 0.4.1 to run DinD in these pods. The pods are created and destroyed on demand, and the disks are detached and reattached as that happens. Since we use a StatefulSet, each pod gets the same unique disk every time.
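For context, a minimal sketch of the kind of startup script mentioned above (the flags, readiness check, and marker line are illustrative assumptions, not our exact entrypoint):

```shell
#!/bin/sh
# Hypothetical pod entrypoint sketch: start the Docker daemon inside the
# sysbox container so that /var/lib/docker (the PVC-backed mount) caches
# image layers across jobs.
set -e

DOCKER_DATA_ROOT=/var/lib/docker   # mount point from volumeClaimTemplates

# Start dockerd in the background only if it exists in this image.
if command -v dockerd >/dev/null 2>&1; then
    dockerd --data-root "$DOCKER_DATA_ROOT" &
    # Wait briefly for the daemon socket before handing off to the runner.
    for i in 1 2 3 4 5; do
        [ -S /var/run/docker.sock ] && break
        sleep 1
    done
fi

echo "docker-entrypoint: ready"
```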
When the pods are created for the first time and the disks are empty, no issues are observed. After a few detach/reattach cycles, once the disks contain data, the Docker daemon is unable to start: when it tries to chmod /var/lib/docker, it fails with the error value too large for defined data type (EOVERFLOW):
Upon closer inspection of the filesystem in the pod, it turns out that sysbox is mounting shiftfs on /var/lib/docker, which, to our knowledge, should not happen:
Looking inside /var/lib/docker, the directories are owned by nobody:nogroup:
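Both observations can be reproduced from inside the pod with generic diagnostics (this is a sketch, not sysbox-specific tooling; nobody:nogroup corresponds to the overflow uid/gid 65534 that appears when ownership cannot be id-mapped):

```shell
# Path of the PVC-backed mount, as in our container spec.
TARGET=/var/lib/docker

# 1. Filesystem type at the mount point: we expect the disk's own
#    filesystem (e.g. ext4) here, not shiftfs.
grep " $TARGET " /proc/self/mounts || echo "no separate mount on $TARGET"

# 2. Numeric owners: 65534 65534 (nobody:nogroup) means the ids were
#    not shifted back to the container's view.
[ -d "$TARGET" ] && ls -ln "$TARGET" || echo "$TARGET does not exist here"
```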
Our volumeClaimTemplate is just:
```yaml
volumeClaimTemplates:
  - metadata:
      name: docker-cache
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "default"
      resources:
        requests:
          storage: 32Gi
```
which we refer to in our container spec:
```yaml
volumeMounts:
  - name: docker-cache
    mountPath: /var/lib/docker
```
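Putting the two fragments together, the relevant part of the StatefulSet looks roughly like this (surrounding fields and the container name are illustrative; the claim name is what links the volumeMount to the per-pod PVC):

```yaml
apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      containers:
        - name: runner            # illustrative container name
          volumeMounts:
            - name: docker-cache  # must match the claim template name
              mountPath: /var/lib/docker
  volumeClaimTemplates:
    - metadata:
        name: docker-cache
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "default"
        resources:
          requests:
            storage: 32Gi
```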


