Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: translate EOF to ErrNoLeader for renew, timetolive #6791

Merged
merged 1 commit into from
Nov 3, 2016

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Nov 2, 2016

Address #6754.

In case there are network errors or unexpected EOF errors
in TimeToLive http requests to leader, we translate that into
ErrNoLeader, and expects the client to retry its request.

/cc @xiang90 @heyitsanthony @fanminshi

@gyuho gyuho force-pushed the grpc-leader branch 4 times, most recently from b1506fe to 8e41ab9 Compare November 2, 2016 21:52
@heyitsanthony
Copy link
Contributor

lgtm

if err != nil {
// follower cannot reach leader at the moment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove todo at line 185?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Merging in greens. Thanks.

@xiang90
Copy link
Contributor

xiang90 commented Nov 2, 2016

LGTM

if err != nil {
// follower cannot reach leader at the moment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message we got in the tester is (rpc error: code = 2 desc = Post http://10.240.0.3:2380/leases/internal: EOF). is Post http://10.240.0.3:2380/leases/internal: EOF return by http call equals to either io.EOF or io.ErrUnexpectedEOF?

// possibly from network errors
// translate to ErrNoLeader, expecting clients to retry
if err == io.EOF || err == io.ErrUnexpectedEOF {
err = rpctypes.ErrNoLeader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return grpc error ErrGRPCNoLeader instead of client side error rpctypes.ErrNoLeader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to convert the internal error to grpc error for lease in the API layer.
https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L57

@gyuho
Copy link
Contributor Author

gyuho commented Nov 2, 2016

@xiang90 @fanminshi PTAL.

I added another commit to match ErrNoLeader in togRPCError

So https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L57 would return rpctypes.ErrNoLeader

func (ls *LeaseServer) LeaseTimeToLive(ctx context.Context, rr *pb.LeaseTimeToLiveRequest) (*pb.LeaseTimeToLiveResponse, error) {
    resp, err := ls.le.LeaseTimeToLive(ctx, rr)
    if err != nil {
        return nil, togRPCError(err)
    }

And now togRPCError would translate that into rpctypes.ErrGRPCNoLeader.

So tester would have to implement retrial logic to error rpctypes.ErrGRPCNoLeader.

@@ -49,7 +49,7 @@ func togRPCError(err error) error {
case etcdserver.ErrTooManyRequests:
return rpctypes.ErrTooManyRequests

case etcdserver.ErrNoLeader:
case etcdserver.ErrNoLeader, rpctypes.ErrNoLeader: // ErrNoLeader from leasehttp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really goofy; maybe lessor.LeaseTimeToLive should still return io.EOF and v3api.LeaseTimeToLive will translate it to etcdserver.ErrNoLeader?

Copy link
Contributor

@xiang90 xiang90 Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... seems bad. Now API layer needs to know the details of the lease implementation. It now start to understand io related errors. I feel we should still put the error convert back to the HTTP handler itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The api layer already calls out to a package called leasehttp, so it already has some idea of the lease implementation. I think it's OK for the API layer to know the http request may return an EOF instead of having the leasehttp package make claims about etcd leadership.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, OK, it's hidden through lessor, but it's still a bald error type. Putting the rpctypes error in the case still seems like it's violating some assumed design principle since it sticks out as different. Likewise, leasehttp has to import another package to get at the error . Perhaps what belongs here is a new error like lease.ErrLessorUnavailable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heyitsanthony

https://github.com/coreos/etcd/blob/master/etcdserver/v3_server.go#L325

I think this is the place to convert the error, right? We do not need the ErrLessorUnavailable.

Copy link
Contributor

@heyitsanthony heyitsanthony Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that looks like the right place to do the EOF check / conversion.

@gyuho gyuho changed the title lease/leasehttp: translate EOF to ErrNoLeader for clients v3rpc: translate EOF to ErrNoLeader for renew, timetolive Nov 2, 2016
@gyuho
Copy link
Contributor Author

gyuho commented Nov 2, 2016

Just fixed to handle EOF directly from v3rpc as Anthony suggested. It's cleaner now. Thanks!

@@ -86,7 +86,7 @@ func (ls *LeaseServer) LeaseKeepAlive(stream pb.Lease_LeaseKeepAliveServer) erro
}

if err != nil {
return err
return handleLeaseEOF(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also check EOF for other lease raw RPCs?

Copy link
Contributor Author

@gyuho gyuho Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other RPCs should be handled by gRPC, I think. These are only needed for RPCs with raw HTTP requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keepAlive uses leaseRenew. i think we might want to catch EOF at that function.
https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/lease.go#L82

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! i didn't recognize from the PR.

@fanminshi
Copy link
Member

LGTM

// lease renew, timetolive requests to followers are forwarded to leader,
// and follower might not be able to reach leader from network errors with EOFs.
// Then translate that into ErrNoLeader, expect the client to retry its request.
func handleLeaseEOF(err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really not lease specific error. handleLeaseEOF is confusing i think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Just fixed.

@gyuho gyuho force-pushed the grpc-leader branch 2 times, most recently from 1620693 to c2f007a Compare November 3, 2016 04:37
@gyuho gyuho changed the title v3rpc: translate EOF to ErrNoLeader for renew, timetolive etcdserver: translate EOF to ErrNoLeader for renew, timetolive Nov 3, 2016
@gyuho
Copy link
Contributor Author

gyuho commented Nov 3, 2016

@xiang90 @heyitsanthony All fixed. PTAL. Thanks!

// lease renew, timetolive requests to followers are forwarded to leader,
// and follower might not be able to reach leader from network errors with EOFs.
// Then translate that into ErrNoLeader, expect the client to retry its request.
func handleEOF(err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// convertEOFToNoLeader convert ... to ... since ...
covertEOFErrortoNoLeader

@xiang90
Copy link
Contributor

xiang90 commented Nov 3, 2016

LGTM after fixing the func name and its comments.

Address etcd-io#6754.

In case there are network errors or unexpected EOF errors
in TimeToLive http requests to leader, we translate that into
ErrNoLeader, and expects the client to retry its request.
@gyuho
Copy link
Contributor Author

gyuho commented Nov 3, 2016

Fixed. Merging in greens. Thanks all.

@gyuho gyuho merged commit bbc1cda into etcd-io:master Nov 3, 2016
@gyuho gyuho deleted the grpc-leader branch November 3, 2016 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants