Very slow recovery from partitions

Under certain conditions, a follower's recovery after a partition can be very slow. What we see is that the follower rejects an AppendEntries call because of a term mismatch, and the leader's replication loop treats this rejection as a failure and applies the back-off timer to the loop. As a result, the next request to the follower, with the next lower index, waits up to 10 seconds, which makes recovery for the follower extremely slow.

The relevant log entries show:
[lp_0 is the follower]
lp_0:18:11:27.965237 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3
lp_0:18:11:38.208667 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3
lp_0:18:11:48.450560 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3

[lp_2 is the leader]
lp_2:18:11:27.965322 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 374)
lp_2:18:11:38.209203 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 373)
lp_2:18:11:48.450627 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 372)

You can see here that each attempt waits ~10 seconds.

One potential fix would be for (*Raft).replicateTo() to reset failures to zero in this case, e.g. in https://github.com/hashicorp/raft/blob/master/replication.go line 184, change s.failures++ to s.failures = 0.

Looking at the other ways appendEntries can return in this state, none of them appear to need the back-off on the next call.

A more complex fix would be for the AppendEntriesResponse to get a new field to indicate if the retry delay can be skipped.
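As a rough sketch of this second option, the follower could set a flag when it rejects a request purely because of a log mismatch, and the leader could skip the delay when it sees that flag. The `response` type, the `SkipRetryDelay` field, and both helper functions are hypothetical names invented for this example; they are not part of the existing AppendEntriesResponse.

```go
package main

import "fmt"

// response is a simplified, hypothetical stand-in for an AppendEntries reply.
type response struct {
	Term           uint64
	Success        bool
	SkipRetryDelay bool // hypothetical new field: leader may retry immediately
}

// checkPrevLog models the follower side: if the term of its entry at the
// previous index differs from the one the leader sent, it rejects the
// request and marks the rejection as safe to retry without delay.
func checkPrevLog(currentTerm, localPrevTerm, remotePrevTerm uint64) response {
	if localPrevTerm != remotePrevTerm {
		return response{Term: currentTerm, Success: false, SkipRetryDelay: true}
	}
	return response{Term: currentTerm, Success: true}
}

// nextFailures models the leader side: the failure counter that drives
// the back-off grows only for rejections that do not carry the flag.
func nextFailures(failures uint64, resp response) uint64 {
	if resp.Success || resp.SkipRetryDelay {
		return 0
	}
	return failures + 1
}

func main() {
	// Mirrors the log excerpt: ours: 1, remote: 3.
	resp := checkPrevLog(3, 1, 3)
	fmt.Println(resp.SkipRetryDelay, nextFailures(11, resp))
}
```

Compared with resetting the counter unconditionally, this keeps the decision with the follower, which knows why it rejected the request.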



Very slow recovery from partitions #73
