device name change due to azure disk host cache setting #60344
Labels: kind/bug, sig/azure
andyzhangx changed the title from "fix device name change due to azure disk host cache setting" to "device name change due to azure disk host cache setting" on Feb 25, 2018.
k8s-github-robot pushed a commit that referenced this issue on Feb 25, 2018:
Automatic merge from submit-queue (batch tested with PRs 60346, 60135, 60289, 59643, 52640). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

fix device name change issue for azure disk

**What this PR does / why we need it**: fixes the device name change issue for Azure disks, caused by the default host cache setting changing from `None` to `ReadWrite` in v1.7, while the default host cache setting in the Azure portal is `None`.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60344, #57444
Also fixes the following issues: Azure/acs-engine#1918, Azure/AKS#201

**Special notes for your reviewer**:
From v1.7 the default host cache setting changed from `None` to `ReadWrite`. This leads to device name changes after attaching multiple disks to an Azure VM, and finally leaves the disks inaccessible from the pod. For example: a StatefulSet with 8 replicas (each with an Azure disk) on one node will always fail; according to my observation, adding the 6th data disk always makes a device name change, and some pods cannot access their data disk after that. I have verified this fix on v1.8.4.

Without this PR on one node (device name changes):
```
azureuser@k8s-agentpool2-40588258-0:~$ tree /dev/disk/azure
...
└── scsi1
    ├── lun0 -> ../../../sdk
    ├── lun1 -> ../../../sdj
    ├── lun2 -> ../../../sde
    ├── lun3 -> ../../../sdf
    ├── lun4 -> ../../../sdg
    ├── lun5 -> ../../../sdh
    └── lun6 -> ../../../sdi
```

With this PR on one node (no device name change):
```
azureuser@k8s-agentpool2-40588258-1:~$ tree /dev/disk/azure
...
└── scsi1
    ├── lun0 -> ../../../sdc
    ├── lun1 -> ../../../sdd
    ├── lun2 -> ../../../sde
    ├── lun3 -> ../../../sdf
    ├── lun5 -> ../../../sdh
    └── lun6 -> ../../../sdi
```

In the following, `myvm-0` and `myvm-1` are crashing due to the device name change; after the controller manager replacement, the `myvm2-x` pods work well.
```
Every 2.0s: kubectl get po                        Sat Feb 24 04:16:26 2018

NAME      READY     STATUS             RESTARTS   AGE
myvm-0    0/1       CrashLoopBackOff   13         41m
myvm-1    0/1       CrashLoopBackOff   11         38m
myvm-2    1/1       Running            0          35m
myvm-3    1/1       Running            0          33m
myvm-4    1/1       Running            0          31m
myvm-5    1/1       Running            0          29m
myvm-6    1/1       Running            0          26m
myvm2-0   1/1       Running            0          17m
myvm2-1   1/1       Running            0          14m
myvm2-2   1/1       Running            0          12m
myvm2-3   1/1       Running            0          10m
myvm2-4   1/1       Running            0          8m
myvm2-5   1/1       Running            0          5m
myvm2-6   1/1       Running            0          3m
```

**Release note**:
```
fix device name change issue for azure disk
```

/assign @karataliu
/sig azure
@feiskyer could you mark it for the v1.10 milestone?
@brendandburns @khenidak @rootfs @jdumars FYI
Since it's a critical bug, I will cherry-pick this fix to v1.7-v1.9; note that v1.6 does not have this issue since its default cachingmode is `None`.
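Not part of the original PR, but for clusters that cannot pick up the fix right away, a minimal workaround sketch is to pin the caching mode explicitly in the StorageClass used for dynamic provisioning. The class name `azure-disk-no-cache` and the `storageaccounttype` value below are illustrative placeholders, and whether the `cachingmode` parameter is honored depends on the provisioner version in use:

```yaml
# Workaround sketch (assumption, not part of the PR): explicitly pin the
# Azure data disk host caching mode so the provisioner default does not apply.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-no-cache        # hypothetical name
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS # illustrative SKU
  kind: Managed
  cachingmode: None                # avoid the ReadWrite default discussed above
```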
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
From v1.7, the default host cache setting changed from `None` to `ReadWrite`. This leads to device name changes after attaching multiple disks to an Azure VM, and finally leaves the disk inaccessible from the pod. The caching mode can also be set explicitly, as in the sketch below.
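For disks referenced statically (without dynamic provisioning), the setting in question surfaces as the `cachingMode` field of the `azureDisk` volume source. A minimal sketch, assuming an existing managed disk; the pod name, disk name, and disk URI are placeholders:

```yaml
# Sketch only: pod mounting an existing Azure managed disk with an explicit
# caching mode, instead of relying on the default that changed in v1.7.
apiVersion: v1
kind: Pod
metadata:
  name: azure-disk-cache-demo      # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    azureDisk:
      kind: Managed
      diskName: mydisk             # placeholder
      diskURI: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/mydisk  # placeholder
      cachingMode: None
```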
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
A StatefulSet with 8 replicas (each with an Azure disk) scheduled onto one node will always fail due to the device name change; a sketch of such a StatefulSet follows.
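A minimal sketch of such a StatefulSet (not the reporter's original manifest, which is not included here): names, image, and sizes are illustrative, the StorageClass is assumed to be backed by `kubernetes.io/azure-disk`, and on clusters older than v1.9 the `apiVersion` may need to be `apps/v1beta1` or `apps/v1beta2`. To hit the problem, the pods also need to land on the same node, e.g. via a nodeSelector.

```yaml
# Reproduction sketch: 8 replicas, each claiming its own Azure disk, so that
# enough attach operations happen on a single node to trigger the rename.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myvm                       # matches the pod names shown above
spec:
  serviceName: myvm
  replicas: 8
  selector:
    matchLabels:
      app: myvm
  template:
    metadata:
      labels:
        app: myvm
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "while true; do date >> /mnt/data/out; sleep 10; done"]
        volumeMounts:
        - name: data
          mountPath: /mnt/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: default    # assumed azure-disk backed class
      resources:
        requests:
          storage: 1Gi
```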
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.7 - v1.10
- Kernel (e.g. `uname -a`):

/sig azure
/assign