Refactor connection with retries and backoff #78
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: vladimirvivien
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
It's not obvious to me at all how this will handle the situation that the CSI driver hasn't created the This probably needs a unit test. |
46b9612 to e62ded2
@pohly thanks for taking a look. Yeah, I just added/fixed the test. If that is not enough I can add more. I did look into the gRPC code to understand how gRPC connections behave. When block is enabled and DialContext is used, the code will wait |
/hold |
Let's move this PR to https://github.com/kubernetes-csi/node-driver-registrar since this repo is deprecated. |
🛂 ⛔️
This repo has been closed, and no new changes will be accepted. Please move your content to one of these repos. Thank you, |
Here's a renewed effort by @darkowlzz to get the gRPC code enhanced: kubernetes-csi/csi-lib-utils#8 |
This PR attempts to fix #76
It retries connecting to the driver (within the same session) several times before giving up. The logic is as follows:
- Use `apimachinery/wait.ExponentialBackoff`
- Run the condition up to 5 times: attempt to connect until `--connection-timeout` is reached
- Return the successful connection, or the error if one was generated
The `--connection-timeout` flag controls how long a connection request lasts before the code gives up and tries again. Because this will retry right away, 5 times in total, `--connection-timeout` should be set to a sensible value, something like 5 to 10 seconds. For instance, a 5-second timeout will produce the following total attempt time:

`total connection attempt time ≈ 5 sec * 5 attempts * backoffFactor`

If no connection is created within that time period, the code will stop.