Summary:
We have a setup in which external-resizer is used with a storage provider that only supports offline expansion (e.g., it only supports PluginCapability_VolumeExpansion_OFFLINE). We deployed a job that uses a PVC provisioned by the storage provider. While the job pod is running, we resize the PVC by modifying spec.resources.requests.storage. As expected, the PVC cannot be resized while the pod is running. However, after the job pod completes, the PVC still does not get resized: external-resizer does not send the resize gRPC call to the storage provider. The PVC stays stuck in this state until we manually delete the job pod.
Steps to reproduce:
- Deploy external-resizer together with a storage provider (we use Longhorn).
- Don't set the --handle-volume-inuse-error flag for external-resizer. This means that, by default, external-resizer handles volume-in-use errors in the resizer controller, link
- Deploy a job that uses a PVC as below. The job creates a pod that sleeps for 2 minutes and then completes.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-job-pvc
      namespace: default
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn
      resources:
        requests:
          storage: 1Gi
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test-job
      namespace: default
    spec:
      backoffLimit: 1
      template:
        metadata:
          name: test-job
        spec:
          containers:
            - name: test-job
              image: ubuntu:latest
              imagePullPolicy: IfNotPresent
              securityContext:
                privileged: true
              command: ["/bin/sh"]
              args: ["-c", "echo 'sleep for 120s then exit'; sleep 120"]
              volumeMounts:
                - mountPath: /data
                  name: vol
          restartPolicy: OnFailure
          volumes:
            - name: vol
              persistentVolumeClaim:
                claimName: test-job-pvc

- While the job pod is running, try to expand the PVC by editing spec.resources.requests.storage.
- Observe that the resize fails.
- Wait for the job pod to complete.
- Observe that the PVC stays stuck in its current state forever. It does not get resized because external-resizer does not attempt the expansion gRPC call to the storage provider.
Expected Behavior:
Once the job pod has completed, the PVC is no longer considered to be in use. Therefore, external-resizer should attempt the expansion gRPC call to the storage provider.
Proposal:
We dug into the source code and found that:
- This check prevents external-resizer from retrying if the PVC has had InUseErrors before AND it is in the ctrl.usedPVCs map.
- The problem is that the PVC is never removed from the ctrl.usedPVCs map when a pod moves to the completed phase. The PVC is only removed when the pod is deleted, link
- We think that the logic over here should be changed to handle the case where the pod becomes completed, i.e.:

    func (ctrl *resizeController) updatePod(oldObj, newObj interface{}) {
        pod := parsePod(newObj)
        if pod == nil {
            return
        }

        // A terminated pod no longer uses its PVCs, so stop tracking it.
        if isPodTerminated(pod) {
            ctrl.usedPVCs.removePod(pod)
        } else {
            ctrl.usedPVCs.addPod(pod)
        }
    }
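To make the proposal concrete, here is a minimal, self-contained sketch of the intended behavior. It is not the external-resizer code: `Pod` and `PodPhase` are simplified stand-ins for the `k8s.io/api/core/v1` types, `inUsePVCStore` is a hypothetical model of `ctrl.usedPVCs`, `skipResize` is a hypothetical model of the retry check described above, and the pod name `test-job-abcde` is made up. We also assume that "terminated" means the pod phase is `Succeeded` or `Failed`.

```go
package main

import "fmt"

// PodPhase and Pod are simplified stand-ins for the k8s.io/api/core/v1 types.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

type Pod struct {
	Namespace string
	Name      string
	Phase     PodPhase
	PVCNames  []string // claim names referenced by the pod's volumes
}

// inUsePVCStore is a hypothetical model of ctrl.usedPVCs:
// PVC key ("namespace/name") -> set of pod keys that use the PVC.
type inUsePVCStore struct {
	pods map[string]map[string]struct{}
}

func newInUsePVCStore() *inUsePVCStore {
	return &inUsePVCStore{pods: map[string]map[string]struct{}{}}
}

func (s *inUsePVCStore) addPod(p *Pod) {
	for _, claim := range p.PVCNames {
		key := p.Namespace + "/" + claim
		if s.pods[key] == nil {
			s.pods[key] = map[string]struct{}{}
		}
		s.pods[key][p.Namespace+"/"+p.Name] = struct{}{}
	}
}

func (s *inUsePVCStore) removePod(p *Pod) {
	for _, claim := range p.PVCNames {
		key := p.Namespace + "/" + claim
		delete(s.pods[key], p.Namespace+"/"+p.Name) // no-op if the pod is not tracked
		if len(s.pods[key]) == 0 {
			delete(s.pods, key)
		}
	}
}

func (s *inUsePVCStore) hasInUse(pvcKey string) bool {
	return len(s.pods[pvcKey]) > 0
}

// isPodTerminated assumes that Succeeded and Failed are the terminal phases.
func isPodTerminated(p *Pod) bool {
	return p.Phase == PodSucceeded || p.Phase == PodFailed
}

// updatePod mirrors the proposed handler: terminated pods are dropped
// from the store instead of being re-added on every update.
func (s *inUsePVCStore) updatePod(p *Pod) {
	if isPodTerminated(p) {
		s.removePod(p)
	} else {
		s.addPod(p)
	}
}

// skipResize models the retry check: the resize is skipped only when the PVC
// has hit an in-use error before AND is still used by some pod.
func skipResize(s *inUsePVCStore, pvcKey string, hadInUseError bool) bool {
	return hadInUseError && s.hasInUse(pvcKey)
}

func main() {
	store := newInUsePVCStore()
	pod := &Pod{Namespace: "default", Name: "test-job-abcde",
		Phase: PodRunning, PVCNames: []string{"test-job-pvc"}}

	store.updatePod(pod)
	fmt.Println("skip while running:", skipResize(store, "default/test-job-pvc", true))

	pod.Phase = PodSucceeded
	store.updatePod(pod)
	fmt.Println("skip after completion:", skipResize(store, "default/test-job-pvc", true))
}
```

With this handling, the completed job pod no longer pins the PVC in the in-use set, so the next sync would be free to retry the expansion without anyone deleting the pod.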
Env:
- external-resizer v1.2.0
- Longhorn v1.2.2