Closed
Description
Problem
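Provisioning of GCE PDs with CMEK enabled sometimes fails with a "disk already exists with same name" error. The events below show a retry after a timeout being rejected because the existing disk reports a versioned KMS key name (ending in /cryptoKeyVersions/8) while the StorageClass parameter has no version suffix: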
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 14s (x2 over 15s) pd.csi.storage.gke.io_gke-cluster-1-default-pool-4cede575-43h6_de91f0bc-68b9-451d-826a-43e526adc6a1 failed to provision volume with StorageClass "csi-gce-pd-cmek": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal ExternalProvisioning 8s (x3 over 16s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator
Normal Provisioning 4s (x5 over 16s) pd.csi.storage.gke.io_gke-cluster-1-default-pool-4cede575-43h6_de91f0bc-68b9-451d-826a-43e526adc6a1 External provisioner is provisioning volume for claim "default/pvc-demo"
Warning ProvisioningFailed 4s (x3 over 14s) pd.csi.storage.gke.io_gke-cluster-1-default-pool-4cede575-43h6_de91f0bc-68b9-451d-826a-43e526adc6a1 failed to provision volume with StorageClass "csi-gce-pd-cmek": rpc error: code = AlreadyExists desc = CreateVolume disk already exists with same name and is incompatible: actual disk KMS key name projects/test-project/locations/us-central1/keyRings/TestKeyRing/cryptoKeys/test-key/cryptoKeyVersions/8 did not match expected param projects/test-project/locations/us-central1/keyRings/TestKeyRing/cryptoKeys/test-key
Repro Steps
- Deploy the GCE PD CSI Driver with the `csi-provisioner` sidecar parameter `--timeout=1s`.
  - This will make it easier to simulate a timeout.
- Create a StorageClass enabling CMEK encryption:
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: csi-gce-pd-cmek
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: pd.csi.storage.gke.io
    parameters:
      type: pd-standard
      disk-encryption-kms-key: projects/test-project/locations/us-central1/keyRings/TestKeyRing/cryptoKeys/test-key
- Provision a PVC using the StorageClass above.
- It may take multiple tries to hit the timeout (but I was able to hit it on my first try once I reduced the timeout to 1s).
Proposed Fixes
There are two fixes for this:
- Increase the timeout for the `external-provisioner` sidecar. This won't fix the issue, but it will reduce the likelihood of it happening.
  - "Increase PD CSI sidecar operation timeout" (#542) already made this change and will be part of a future PD CSI release.
- Make sure the GCE PD CSI Driver `CreateVolume` call does not fail for CMEK if the operation is retried.
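
One possible shape for the second fix, as a minimal sketch rather than the driver's actual code: when checking whether an existing disk is compatible with the request, compare KMS key names with any trailing `/cryptoKeyVersions/<n>` segment removed. The helper names `stripKeyVersion` and `kmsKeyEqual` are hypothetical and do not come from the driver; the key names are the ones from the error message above.

```go
package main

import (
	"fmt"
	"strings"
)

// cryptoKeyVersionsSegment separates a KMS key name from its version, as in
// ".../cryptoKeys/test-key/cryptoKeyVersions/8" from the event log above.
const cryptoKeyVersionsSegment = "/cryptoKeyVersions/"

// stripKeyVersion (hypothetical helper) returns name without a trailing
// "/cryptoKeyVersions/<version>" suffix. Names without that segment are
// returned unchanged.
func stripKeyVersion(name string) string {
	if i := strings.LastIndex(name, cryptoKeyVersionsSegment); i >= 0 {
		return name[:i]
	}
	return name
}

// kmsKeyEqual (hypothetical helper) reports whether the key name reported by
// an existing disk and the key name from the StorageClass parameter refer to
// the same key, ignoring the key version on either side.
func kmsKeyEqual(actual, expected string) bool {
	return stripKeyVersion(actual) == stripKeyVersion(expected)
}

func main() {
	actual := "projects/test-project/locations/us-central1/keyRings/TestKeyRing/cryptoKeys/test-key/cryptoKeyVersions/8"
	expected := "projects/test-project/locations/us-central1/keyRings/TestKeyRing/cryptoKeys/test-key"

	// A plain string comparison rejects the retried request.
	fmt.Println(actual == expected) // prints "false"

	// A version-insensitive comparison accepts it.
	fmt.Println(kmsKeyEqual(actual, expected)) // prints "true"
}
```

Stripping the version from both sides is an assumption made for simplicity. It also means that a StorageClass pinned to a specific key version would match a disk encrypted with a different version of the same key, so a real fix may want to strip the version only when the expected parameter omits it.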
/assign
m1n9o