Hi Joel, thanks for opening this issue. The team is looking into it. In the meantime, can you please provide any debug logs you might have so that we can better understand the issue? Thanks!
/kind bug
What happened?
We have the EFS CSI driver installed with dynamic provisioning and a reclaim policy of Delete.
A flood of requests came in, which caused the CSI controller to eventually fail its health check and get restarted. After restart, the controller kept trying to provision APs for new PVCs, but they kept failing with `AccessPointAlreadyExists`. The controller kept retrying and it kept failing. Eventually the PVCs were deleted, and the APs were leaked. Most likely, the APs were created but never recorded as being provisioned in K8s, thus causing the controller on restart to keep trying to recreate them.
What you expected to happen?
Upon restart, the controller should recognize that the Access Point was already created and "adopt" it. Alternatively, the controller should recognize that this isn't a retriable error and not retry. Finally, when the PVC is deleted, the AP should be deleted according to the reclaim policy. This shouldn't require enabling `reuseAccessPoint`.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?:
This code here (aws-efs-csi-driver/pkg/cloud/cloud.go, lines 190 to 195 in fe845cc) doesn't check to see if the error code is `AccessPointAlreadyExists`, in which case it should return `ErrAlreadyExists`, and thus the code path in aws-efs-csi-driver/pkg/driver/controller.go (lines 346 to 348 in fe845cc)
Environment
Kubernetes version (use `kubectl version`): 1.29
Please also attach debug logs to help us better diagnose