Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: not reuse connections for peer transport #3796

Merged
merged 3 commits into from
Nov 4, 2015

Conversation

yichengq
Copy link
Contributor

@yichengq yichengq commented Nov 3, 2015

etcd uses timeout listener, and times out the accepted connections
if there is no activity. If we use idle connections in
transport, the connection may have timed out, and return error when
doing any request. So we do not reuse connections.

This fixes the problem that etcd fail to get version of peers.

fixes #3723

@xiang90
Copy link
Contributor

xiang90 commented Nov 3, 2015

Why does it work previously? The peer listener has always been time out one, no?

@yichengq
Copy link
Contributor Author

yichengq commented Nov 3, 2015

Why does it work previously?

The peer transporter was a timeout transport, which sets maxIdleConnections as -1 for itself.

The peer listener has always been time out one, no?

yes

@xiang90
Copy link
Contributor

xiang90 commented Nov 3, 2015

then why do not create a timeout transport instead? also can you please fix all the transport that works with peer url?

@yichengq
Copy link
Contributor Author

yichengq commented Nov 3, 2015

then why do not create a timeout transport instead?

it doesn't need read timeout and write timeout in connection because it is not a stream. I could create a timeout transport with no read/write timeout though.

@xiang90
Copy link
Contributor

xiang90 commented Nov 3, 2015

@yichengq

it doesn't need read timeout and write timeout in connection because it is not a stream.

Why only stream can have read/write timeout?

@yichengq
Copy link
Contributor Author

yichengq commented Nov 3, 2015

Why only stream can have read/write timeout?

The read/write timeout has the assumption that following reads/writes will happen soon.
And it is general to use request timeout or ResponseHeaderTimeout instead of read/write timeout for HTTP request IMO.

@xiang90
Copy link
Contributor

xiang90 commented Nov 3, 2015

The read/write timeout has the assumption that following reads/writes will happen soon.
And it is general to use request timeout or ResponseHeaderTimeout instead of read/write timeout for HTTP request IMO.

I do not think you are solving the root problem. The root reason is that the underlying listener for peer http is a timeout listener. It expects the transport to send/recv fast and never reuse the connection. We built the other pair of transport for it already. We really should just use it instead of creating something new with some new reasons.

Please use the correct pairs of transport/listener. Thanks.

This pairs with remote timeout listeners.

etcd uses timeout listener, and times out the accepted connections
if there is no activity. So the idle connections may time out easily.
Becaus timeout transport doesn't reuse connections, it prevents using
timeouted connection.

This fixes the problem that etcd fail to get version of peers.
@yichengq
Copy link
Contributor Author

yichengq commented Nov 3, 2015

We built the other pair of transport for it already.

ok. They are used as a pair.

So I changes to use timeoutTransport, and fixes other transport that has the same problem. PTAL.

@yichengq
Copy link
Contributor Author

yichengq commented Nov 3, 2015

Per xiang's suggestion, we want to avoid the same bug in the future.

New commit moves the code to create listener and transport for peer communication
at the same place, and use explicit functions to build them. This prevents
possible development errors in the future.

@@ -97,6 +99,28 @@ type Transporter interface {
Stop()
}

// NewPeerListener returns a listener for raft message transfer between peers.
// It uses timeout listener to identify broken streams promptly.
func NewPeerListener(u url.URL, tlsInfo transport.TLSInfo) (net.Listener, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this NewRaftListener?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@yichengq yichengq force-pushed the fix-get-version branch 2 times, most recently from 46b251d to 62d8aef Compare November 4, 2015 18:06
@yichengq
Copy link
Contributor Author

yichengq commented Nov 4, 2015

I changed the function name to NewHTTPTransport because it returns a HTTP transport to do request targeting remote rafthttp listener.

We thought about the idea that makes the new function named NewTransport, and changes the naming of existing Transport struct. But that is a super time-consuming one, so I would like to give the new function a different name.

PTAL.

@yichengq
Copy link
Contributor Author

yichengq commented Nov 4, 2015

@xiang90 points out that NewHTTPTransport still may bring confusing considering rafthttp has Transport struct already. I am targetting to return RoundTripper instead.

It uses roundTripper instead of Transport because roundTripper is
sufficient for its requirements.
@yichengq
Copy link
Contributor Author

yichengq commented Nov 4, 2015

@xiang90 Changed it to NewRoundTripper and this eliminates confusion of duplicate Transport word. PTAL.

This moves the code to create listener and roundTripper for raft communication
to the same place, and use explicit functions to build them. This prevents
possible development errors in the future.
@yichengq
Copy link
Contributor Author

yichengq commented Nov 4, 2015

Based on discussion OOL, moved new functions into util.go and add comments about 0 read/write timeout.

@xiang90
Copy link
Contributor

xiang90 commented Nov 4, 2015

LGTM

yichengq added a commit that referenced this pull request Nov 4, 2015
etcdserver: not reuse connections for peer transport
@yichengq yichengq merged commit 3d15526 into etcd-io:master Nov 4, 2015
@yichengq yichengq deleted the fix-get-version branch November 4, 2015 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

fail to get version of peers in master branch
2 participants