Description
Epic: #2592
Product: Tarantool
Since: 2.10.0
Dev issue: tarantool/tarantool#6654
Root document: https://www.tarantool.io/en/doc/latest/book/replication/repl_leader_elect/#leader-election-process
SME: @sergepetrenko
Details
[input from dev issue]
When a server gets partitioned from the majority of the cluster, it starts incrementing its term every election_timeout, trying to win elections. Once such a server reunites with the cluster, it unintentionally disrupts the current working leader: the leader steps down on seeing a greater term number. This makes the whole cluster go through at least one round of elections, rendering it read-only for a couple of seconds.
Diego Ongaro's thesis covers this issue and suggests adding a Pre-Vote stage to Raft.
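To illustrate the problem and the effect of a Pre-Vote stage, here is a minimal Python sketch of the general Raft idea. It is not Tarantool's implementation: the function names (`has_quorum`, `term_after_timeouts`, `leader_steps_down`) are hypothetical, and the model assumes a node only passes Pre-Vote when it can reach a majority of the cluster.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """True if the node plus the peers it can reach form a majority."""
    return reachable_peers + 1 > cluster_size // 2


def term_after_timeouts(term: int, timeouts: int, reachable_peers: int,
                        cluster_size: int, pre_vote: bool) -> int:
    """Term of a node after `timeouts` election timeouts have elapsed.

    Simplified model (an assumption, not Tarantool code):
    - without Pre-Vote the node bumps its term on every timeout;
    - with Pre-Vote it bumps the term only if a majority is reachable.
    """
    quorum = has_quorum(reachable_peers, cluster_size)
    for _ in range(timeouts):
        if pre_vote and not quorum:
            continue  # Pre-Vote fails: no real election, term unchanged
        term += 1
        if quorum:
            break  # assume the node wins the election and stops retrying
    return term


def leader_steps_down(leader_term: int, incoming_term: int) -> bool:
    """A Raft leader steps down when it sees a greater term."""
    return incoming_term > leader_term


# A node isolated from a 3-node cluster for 10 election timeouts.
leader_term = 5
without_pre_vote = term_after_timeouts(5, 10, 0, 3, pre_vote=False)
with_pre_vote = term_after_timeouts(5, 10, 0, 3, pre_vote=True)

print(without_pre_vote, leader_steps_down(leader_term, without_pre_vote))
print(with_pre_vote, leader_steps_down(leader_term, with_pre_vote))
```

In this model, the isolated node without Pre-Vote returns with an inflated term and forces the leader to step down, while with Pre-Vote its term stays unchanged and the leader is not disrupted.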
TBD: details of the implementation of the Pre-Vote stage in Tarantool
ToDo
- Update the description of the leader election process starting from this sentence: "So if there are no heartbeats for a period set by the replication_timeout option, a new election starts." The process will be different with the Pre-Vote stage.