fix consistent high timeout err rate caused by prepareStatement #892
A more detailed description of the problem can be found in this post.
conn.go
@@ -713,6 +713,9 @@ func (c *Conn) prepareStatement(ctx context.Context, stmt string, tracer Tracer)
if err != nil {
flight.err = err
flight.wg.Done()
if err.Error() == "context deadline exceeded" {
Instead of comparing the error string, we should compare against the context package's sentinel error, context.DeadlineExceeded, but I think in this case we can just always remove the value on errors.
@Zariel Fixed
Great, thanks!
lol, this has been plaguing us for months and I just spent the last day digging into this same bug. I was literally writing up an issue report to describe my findings and see what people thought a good fix would be... and I went to grab some links from the latest commit to the code which causes the problem, and noticed that it just got fixed. Thanks! :D
…he#892) * fix timeout bug * remove cache no matter which err returns
Summary: Upgrading to a version which includes the fix for apache/cassandra-gocql-driver#892, which we have been hitting quite often in production.
Resolves T1359093
Reviewers: min, #peloton
Reviewed By: min, #peloton
Subscribers: jenkins
Maniphest Tasks: T1359093
Differential Revision: https://code.uberinternal.com/D1337813
Currently gocql caches the timeout error caused by network instability when preparing a statement.
As a result, if gocql fails to prepare a statement on a connection the first time because of a network issue, every subsequent call to that statement on that connection returns "context deadline exceeded".
This can produce a consistently high error rate that can only be fixed by restarting the host.
We have been suffering from this in production for a while.