-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcdserver: translate EOF to ErrNoLeader for renew, timetolive #6791
Conversation
b1506fe
to
8e41ab9
Compare
lgtm |
if err != nil { | ||
// follower cannot reach leader at the moment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove todo at line 185?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Merging in greens. Thanks.
LGTM |
if err != nil { | ||
// follower cannot reach leader at the moment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message we got in the tester is (rpc error: code = 2 desc = Post http://10.240.0.3:2380/leases/internal: EOF)
. is Post http://10.240.0.3:2380/leases/internal: EOF
return by http call equals to either io.EOF or io.ErrUnexpectedEOF?
// possibly from network errors | ||
// translate to ErrNoLeader, expecting clients to retry | ||
if err == io.EOF || err == io.ErrUnexpectedEOF { | ||
err = rpctypes.ErrNoLeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we return grpc error ErrGRPCNoLeader
instead of client side error rpctypes.ErrNoLeader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to convert the internal error to grpc error for lease in the API layer.
https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L57
@xiang90 @fanminshi PTAL. I added another commit to match ErrNoLeader in So https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L57 would return func (ls *LeaseServer) LeaseTimeToLive(ctx context.Context, rr *pb.LeaseTimeToLiveRequest) (*pb.LeaseTimeToLiveResponse, error) {
resp, err := ls.le.LeaseTimeToLive(ctx, rr)
if err != nil {
return nil, togRPCError(err)
} And now So tester would have to implement retrial logic to error |
@@ -49,7 +49,7 @@ func togRPCError(err error) error { | |||
case etcdserver.ErrTooManyRequests: | |||
return rpctypes.ErrTooManyRequests | |||
|
|||
case etcdserver.ErrNoLeader: | |||
case etcdserver.ErrNoLeader, rpctypes.ErrNoLeader: // ErrNoLeader from leasehttp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really goofy; maybe lessor.LeaseTimeToLive
should still return io.EOF and v3api.LeaseTimeToLive will translate it to etcdserver.ErrNoLeader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... seems bad. Now API layer needs to know the details of the lease implementation. It now start to understand io related errors. I feel we should still put the error convert back to the HTTP handler itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @heyitsanthony @gyuho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The api layer already calls out to a package called leasehttp
, so it already has some idea of the lease implementation. I think it's OK for the API layer to know the http request may return an EOF instead of having the leasehttp package make claims about etcd leadership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does it call leasehttp.X in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, OK, it's hidden through lessor, but it's still a bald error type. Putting the rpctypes error in the case still seems like it's violating some assumed design principle since it sticks out as different. Likewise, leasehttp
has to import another package to get at the error . Perhaps what belongs here is a new error like lease.ErrLessorUnavailable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/coreos/etcd/blob/master/etcdserver/v3_server.go#L325
I think this is the place to convert the error, right? We do not need the ErrLessorUnavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that looks like the right place to do the EOF check / conversion.
Just fixed to handle EOF directly from |
@@ -86,7 +86,7 @@ func (ls *LeaseServer) LeaseKeepAlive(stream pb.Lease_LeaseKeepAliveServer) erro | |||
} | |||
|
|||
if err != nil { | |||
return err | |||
return handleLeaseEOF(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also check EOF for other lease raw RPCs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other RPCs should be handled by gRPC, I think. These are only needed for RPCs with raw HTTP requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keepAlive uses leaseRenew. i think we might want to catch EOF at that function.
https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L82
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this PR catches it here https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L89
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! i didn't recognize from the PR.
LGTM |
// lease renew, timetolive requests to followers are forwarded to leader, | ||
// and follower might not be able to reach leader from network errors with EOFs. | ||
// Then translate that into ErrNoLeader, expect the client to retry its request. | ||
func handleLeaseEOF(err error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really not lease specific error. handleLeaseEOF is confusing i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Just fixed.
1620693
to
c2f007a
Compare
@xiang90 @heyitsanthony All fixed. PTAL. Thanks! |
// lease renew, timetolive requests to followers are forwarded to leader, | ||
// and follower might not be able to reach leader from network errors with EOFs. | ||
// Then translate that into ErrNoLeader, expect the client to retry its request. | ||
func handleEOF(err error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// convertEOFToNoLeader convert ... to ... since ...
covertEOFErrortoNoLeader
LGTM after fixing the func name and its comments. |
Address etcd-io#6754. In case there are network errors or unexpected EOF errors in TimeToLive http requests to leader, we translate that into ErrNoLeader, and expects the client to retry its request.
Fixed. Merging in greens. Thanks all. |
Address #6754.
In case there are network errors or unexpected EOF errors
in TimeToLive http requests to leader, we translate that into
ErrNoLeader, and expects the client to retry its request.
/cc @xiang90 @heyitsanthony @fanminshi