Very slow recovery from partitions #73

Closed
@superfell

Description

Under certain conditions, recovery of a follower after a partition can be very slow. The follower rejects an AppendEntries call because of a term mismatch, and the leader's replication loop treats that rejection as a failure and applies the back-off timer to the loop. The next request to the follower, for the next lower index, then waits up to 10 seconds, which makes recovery of the follower extremely slow.

The relevant log entries show
[lp_0 is the follower]
lp_0:18:11:27.965237 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3
lp_0:18:11:38.208667 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3
lp_0:18:11:48.450560 [WARN] raft: Previous log term mis-match: ours: 1 remote: 3

[lp_2 is the leader]
lp_2:18:11:27.965322 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 374)
lp_2:18:11:38.209203 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 373)
lp_2:18:11:48.450627 [WARN] raft: AppendEntries to lp_0 rejected, sending older logs (next: 372)

You can see here that each attempt waits ~10 seconds before the leader tries the next lower index.
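
The gaps between the log lines above are about 10.24 s, which is what a capped exponential back-off would produce once the failure count has reached its ceiling. As a rough sketch only (the constants `baseWait` and `maxScale` and the function `cappedBackoff` are assumptions chosen for illustration, not copied from the library), the arithmetic could look like this:

```go
package main

import "time"

// Hypothetical values, picked so that the capped wait lands near the
// ~10 second gap seen in the logs. They are assumptions, not library facts.
const (
	baseWait = 10 * time.Millisecond
	maxScale = 12
)

// cappedBackoff doubles base once for every round beyond the second,
// and stops growing once round reaches limit.
func cappedBackoff(base time.Duration, round, limit uint64) time.Duration {
	power := round
	if power > limit {
		power = limit
	}
	for power > 2 {
		base *= 2
		power--
	}
	return base
}
```

Under these assumptions, any failure count of 12 or more gives 10 ms doubled ten times, i.e. 10.24 s per retry, so a follower that is hundreds of entries behind would need a long time to catch up one index at a time.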

One potential fix would be for (*Raft).replicateTo() to reset failures to zero in this case, e.g. in https://github.com/hashicorp/raft/blob/master/replication.go line 184, change s.failures++ to s.failures = 0

Looking at the other ways appendEntries can return in this state, none of them appear to need the back-off on the next call.
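
To illustrate the difference the one-line change would make, here is a minimal model of the leader walking back one index per rejection. It is a sketch under stated assumptions: `walkBack`, `incrementOnReject`, `resetOnReject` and `cappedWait` are hypothetical names, and the loop is assumed to be already at its maximum back-off, as in the logs above.

```go
package main

import "time"

// cappedWait is an assumed per-retry delay once the back-off has hit its
// ceiling (roughly what the logs show).
const cappedWait = 10 * time.Second

// rejectPolicy returns the new failure count after a follower rejects
// an AppendEntries call.
type rejectPolicy func(failures uint64) uint64

// incrementOnReject models the current behaviour: a rejection counts as a failure.
func incrementOnReject(failures uint64) uint64 { return failures + 1 }

// resetOnReject models the proposed change: a rejection clears the failure count.
func resetOnReject(uint64) uint64 { return 0 }

// walkBack returns the total time the leader spends sleeping while the
// follower rejects `rejections` consecutive requests.
func walkBack(rejections int, failures uint64, policy rejectPolicy) time.Duration {
	var total time.Duration
	for i := 0; i < rejections; i++ {
		failures = policy(failures)
		if failures > 0 {
			// Simplification: every non-zero count waits the capped delay.
			total += cappedWait
		}
	}
	return total
}
```

In this model, three rejections cost about 30 seconds with `incrementOnReject` and nothing with `resetOnReject`, which matches the intent of changing `s.failures++` to `s.failures = 0`.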

A more complex fix would be for the AppendEntriesResponse to get a new field to indicate if the retry delay can be skipped.
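
A sketch of that second option, assuming the follower sets a flag when it rejects a request only because of a log mismatch. The type `appendResponse`, its `SkipRetryDelay` field, the helper `nextFailures` and the error `errUnreachable` are all hypothetical, used here only to show the shape of the idea:

```go
package main

import "errors"

// errUnreachable stands in for any transport-level failure (hypothetical).
var errUnreachable = errors.New("follower unreachable")

// appendResponse is a simplified, hypothetical stand-in for the response
// to an AppendEntries call.
type appendResponse struct {
	Success bool
	// SkipRetryDelay is the proposed new field: the follower sets it when
	// the leader may retry immediately with an older index.
	SkipRetryDelay bool
}

// nextFailures returns the failure count the leader carries into the next
// iteration of its replication loop.
func nextFailures(failures uint64, resp appendResponse, rpcErr error) uint64 {
	switch {
	case rpcErr != nil:
		// The call itself failed, so backing off is still appropriate.
		return failures + 1
	case resp.Success, resp.SkipRetryDelay:
		// Either progress was made or the follower asked for a prompt retry.
		return 0
	default:
		return failures + 1
	}
}
```

The advantage over always resetting the counter is that the follower decides which rejections are safe to retry immediately, at the cost of a change to the wire format.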
