Description
Describe the feature you'd like to have
Provide a way to configure a timeout for Ceph GET API calls so that a command does not get stuck when there is a problem between the Ceph cluster and the CSI driver (degraded cluster health, slow ops, or a brief network connectivity problem).
What is the value to the end user? (why is it a priority?)
Currently, if Ceph does not respond to a CSI call, cephcsi starts returning an "operation already exists" error, and it keeps doing so even after the Ceph cluster has recovered. The only way to recover the CSI driver is to restart the CSI pods, which is not an acceptable solution in most production clusters.
How would the end user gain value from having this feature?
It avoids restarting the CSI pods in production clusters when a GET API call is stuck. The ask is to add a timeout only to GET API calls, not to any other operations, because timing out other operations could leave stale resources in the cluster.
That said, we need to consider all the corner cases carefully before making this change.