Turn off grpc.WaitForReady #6834

Closed
jsha opened this issue Apr 19, 2023 · 0 comments · Fixed by #6850

jsha commented Apr 19, 2023

Right now we use grpc.WaitForReady(true) in all our RPCs:

WaitForReady configures the action to take when an RPC is attempted on broken connections or unreachable servers. If waitForReady is false and the connection is in the TRANSIENT_FAILURE state, the RPC will fail immediately. Otherwise, the RPC client will block the call until a connection is available (or the call is canceled or times out) and will retry the call if it fails due to a transient error. gRPC will not retry if data was written to the wire unless the server indicates it did not process the data. Please refer to https://github.com/grpc/grpc/blob/master/doc/wait-for-ready.md.

By default, RPCs don't "wait for ready".

When we restart all instances of a service simultaneously, this can bridge the gap, improving availability by returning slow answers rather than erroring because all backends are down (so long as the backends come back before the deadline). However, it also makes some errors harder to diagnose, returning timeouts instead of the underlying error that caused a connection to be bad. Since we now always do graceful rolling restarts, we should remove our setting for this and return to the default, WaitForReady(false).

@jsha jsha added this to the Sprint 2023-04-18 milestone Apr 19, 2023
@jsha jsha self-assigned this Apr 19, 2023
@jsha jsha closed this as completed in #6850 May 9, 2023
jsha added a commit that referenced this issue May 9, 2023
Currently we set WaitForReady(true), which causes gRPC requests to not
fail immediately if no backends are available, but instead wait until
the timeout in case a backend does become available. The downside is
that this behavior masks true connection errors. We'd like to turn it
off.

Fixes #6834