-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds: Fix XDS control plane client retry timer backoff duration when connection closes after results are received #11766
Conversation
It appears this is using the connection-style backoff, which does indeed start the timer at the start of the previous attempt. https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md#proposed-backoff-algorithm |
Nit: include |
And in this case the title should also say "xds client" in some fashion. The title is not clear as-is. |
…he delay calculation logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -446,6 +446,7 @@ private void handleRpcStreamClosed(Status status) { | |||
// Reset the backoff sequence if had received a response, or backoff sequence | |||
// has never been initialized. | |||
retryBackoffPolicy = backoffPolicyProvider.get(); | |||
stopwatch.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question for my understanding, StopWatch itself is not a thread-safe class and requires external synchronization, but in the control plane client it is thread safe because we run it in the synchronization context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We rely heavily on synchronization context.
The backoffPolicy was calculating time since last attempt, but if the stream had been successfully connected and received responses it should delay from the current time (i.e. reset the stopwatch). Changed test so that stopwatch wasn't always 0, which masked the problem.