Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close transport on shutdown #92

Merged
merged 3 commits into from
Mar 17, 2016

Conversation

abligh
Copy link
Contributor

@abligh abligh commented Mar 17, 2016

(based on branch reset-heartbeat-on-requestvote which is in an earlier PR)

When a peer shuts down, close the transport. This needs to be done in the future (i.e. when all the routines have ended), else the remove peer does not propagate correctly - this was an issue with pull request #72.

Unfortunately we cannot simply add a Close() to the Transport interface as we would break back compatibility. Therefore add a type assertion to ensure that the interface supports Close before closing it.

Add a Close() method to InmemTransport so this actually gets tested. InmemTransport has nothing much to close (though it does call DisconnectAll), but other transports may do.

Travis integration is currently failing due to failure to build
on go 1.2. Building on tip seems to work fine. Remove 1.2
(I don't know why it's failing, but 1.2 is prehistoric), and
add 1.4, 1.5, 1.6.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
@abligh abligh force-pushed the close-transport-on-shutdown branch from bbf9947 to d9a0fd2 Compare March 17, 2016 17:26
abligh added 2 commits March 17, 2016 17:26
Heartbeat timers are randomized. This means that (all other things
being equal) they may elapse and cause a node to enter candidate state
at any time. The heartbeat timer is currently reset when a candidate
receives a valid AppendEntries RPC call.

This causes an issue as a node that has just voted for a candidate
may immediately declare itself a candidate. For instance in a new
3 node cluster, Node A declares itself a candidate, Node A votes
for itself, and Node B votes for node A too. Shortly afterwards
(but before an AppendEntries heartbeat is sent by Node A), Node B's
heartbeat timer expires, and Node B starts an election.

This causes an issue within the test suite, characterised by about
1%-2% of test runs failing. In particular A is detected as the leader,
and B is subsequently detected as the leader. Issue hashicorp#90 describes this
in more detail.

The raft specification (section 5.2 of the paper, section 3.4 of the thesis),
says:

  "A server remains in follower state as long as it receives valid
  RPCs from a leader OR CANDIDATE" (my emphasis).

It is thus intended that a node in receipt of a 'valid' RequestVote
RPC from a candidate should not declare itself a candidate (but
should remain in follower state) for at least the Heartbeat interval.

It isn't entirely clear what a 'valid' RequestVote RPC means, but
certainly one that is voted on is valid, and this is the smallest
change.

This one line change thus resets the heartbeat timer if a vote
is cast in favour, and thus removes the 1%-2% of tests failing for this
reason.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
When a peer shuts down, close the transport. This needs to be done in
the future (i.e. when all the routines have ended), else the remove
peer does not propagate correctly - this was an issue with pull
request hashicorp#72.

Unfortunately we cannot simply add a Close() to the Transport
interface as we would break back compatibility. Therefore
add a type assertion to ensure that the interface supports
Close before closing it.

Add a Close() method to InmemTransport so this actually gets
tested. InmemTransport has nothing much to close (though it
does call DisconnectAll), but other transports may do.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
@abligh
Copy link
Contributor Author

abligh commented Mar 17, 2016

Rebased onto #93 in the hope that fixes travis tests.

armon added a commit that referenced this pull request Mar 17, 2016
@armon armon merged commit 3359516 into hashicorp:master Mar 17, 2016
@armon
Copy link
Member

armon commented Mar 17, 2016

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants