-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcd-tester: add retry logic on retriving lease info #6788
Conversation
e72e57c
to
e13f1b9
Compare
I suspect the EOF issue has nothing to do with our internal server code but it is a issue with the http library in which we are using. I think the simplest way to get around it is through retrying on client side. since connection errors are expected and client should take care of those. |
If it's errors from a disrupted connection, why not issue the RPC calls with FailFast=false? |
@heyitsanthony
|
@fanminshi ok, that's a server-side issue. There needs to be an actual error code returned by the server that can be detected here instead of dropping the error on the floor. |
@heyitsanthony I dig into the lease code a bit. I think error is thrown from this line. |
The syndrome is similar to that of http://stackoverflow.com/questions/17714494/golang-http-request-results-in-eof-errors-when-making-multiple-requests-successi. However, according to the post, that issue is patched in the go version 1.6. So we might experiencing something a bit different. |
plog.Debugf("hasLeaseExpired %v resp %v error (%v)", leaseID, resp, err) | ||
if rpctypes.Error(err) == rpctypes.ErrLeaseNotFound { | ||
return true, nil | ||
for ctx.Err() == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this for condition seems strange. how about checking the ctx.Err at the end of the loop instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also i think we only need to retry on the keepalive path, not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 I wonder which keepalive path are you talking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry. my mistake. i was confused.
accdb3b
to
c0a7c00
Compare
@@ -54,14 +54,14 @@ func (hc *hashChecker) checkRevAndHashes() (err error) { | |||
for i := 0; i < retries; i++ { | |||
revs, hashes, err = hc.hrg.getRevisionHash() | |||
if err != nil { | |||
plog.Warningf("retry %i. failed to retrieve revison and hash (%v)", i, err) | |||
plog.Warningf("retry %d. failed to retrieve revison and hash (%v)", i, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix this in another pr? thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. will do
return true, nil | ||
// keep retrying until lease's state is known or ctx is being canceled | ||
for ctx.Err() == nil { | ||
resp, err := ls.getLeaseByID(ctx, leaseID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error is not expected, right? probably we should print out warning here if there is an error other than lease not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, there shouldn't be error the first place. a warning makes sense.
Let's clean this up. Then lgtm |
c0a7c00
to
4c0c2bf
Compare
// keep retrying until lease's state is known or ctx is being canceled | ||
for ctx.Err() == nil { | ||
resp, err := ls.getLeaseByID(ctx, leaseID) | ||
plog.Warningf("hasLeaseExpired %v resp %v error (%v)", leaseID, resp, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this line after 270. we do not want to print it out if:
- there is no error
- the error is not found
getting lease and keys info through raw rpcs rarely experience error such as EOF. This is considered as a failure and causes tester to clean up. however, they are just transient problem with temporary connection issue which should not be considered as a testing failure. so we add retry logic in case of transient failure. FIX etcd-io#6754
4c0c2bf
to
649fe7f
Compare
All fixed. PTAL @xiang90 |
@fanminshi LGTM. Thanks a lot! |
getting lease and keys info through raw rpcs rarely experience error such as EOF. This is considered as a failure and causes tester to clean up.
however, they are just transient problem with temporary connection issue which should not br considered as a testing failure. so we add retry logic in case of transient failure.
FIX #6754