[Raft] Documentation updates related to the Pre-vote stage implementation #2933

Merged
merged 6 commits into from
Jun 16, 2022
20 changes: 15 additions & 5 deletions doc/book/replication/repl_leader_elect.rst
@@ -72,10 +72,20 @@ the new leader finalises them automatically.

All the non-leader nodes are called *followers*. The nodes that start a new
election round are called *candidates*. The elected leader sends heartbeats to
the non-leader nodes to let them know it is alive. So if there are no heartbeats
for a period set by the :ref:`replication_timeout <cfg_replication-replication_timeout>`
option, a new election starts. Terms and votes are persisted by
each instance in order to preserve certain Raft guarantees.
the non-leader nodes to let them know it is alive.

If there are no heartbeats for the period of :ref:`replication_timeout <cfg_replication-replication_timeout>` * 4,
a non-leader node starts a new election, provided that the following conditions are met:

* The node has a quorum of connections to other cluster members.
* None of these cluster members can see the leader node.

.. note::

A cluster member considers the leader node to be alive if the member has received a heartbeat from the leader at least once during the period of ``replication_timeout * 4``,
and there are no replication errors (that is, the connection has not been broken by a timeout or an error).
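
The conditions above can be sketched as a small decision function. This is only an illustration, not Tarantool's implementation: the function name and all parameter names are hypothetical, and it assumes that the node counts itself towards the quorum, which the text does not state.

```python
def should_start_election(leader_idle, replication_timeout,
                          connected_peers, quorum, peers_seeing_leader):
    """Return True if a non-leader node may start a new election round.

    All names here are hypothetical and chosen for illustration only.
    """
    # The leader is considered dead after replication_timeout * 4
    # seconds without heartbeats.
    leader_death_timeout = replication_timeout * 4
    if leader_idle < leader_death_timeout:
        return False
    # Assumption: the node counts itself towards the quorum.
    if connected_peers + 1 < quorum:
        return False
    # None of the connected cluster members may still see the leader.
    return peers_seeing_leader == 0


# Three-node cluster, quorum of 2, leader silent for 5 seconds.
print(should_start_election(5.0, 1.0, connected_peers=1, quorum=2,
                            peers_seeing_leader=0))
```

With a default ``replication_timeout`` of ``1``, the node waits at least 4 seconds without heartbeats before it even checks the other two conditions.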

Terms and votes are persisted by each instance to preserve certain Raft guarantees.

During the election, the nodes prefer to vote for those that have the
newest data. So as if an old leader managed to send something before its death
@@ -122,7 +132,7 @@ Configuration
Heartbeats sent by an active leader have a timeout after which a new election
starts. Heartbeats are sent once per <replication_timeout> seconds.
Default value is ``1``. The leader is considered dead if it hasn't sent any
heartbeats for the period of ``<replication_timeout> * 4``.
heartbeats for the period of ``replication_timeout * 4``.
* ``replication_synchro_quorum`` -- reuse of the :ref:`replication_synchro_quorum <cfg_replication-replication_synchro_quorum>`
option for the purpose of configuring the election quorum. The default value is ``1``,
meaning that each node becomes a leader immediately after voting for itself.
9 changes: 7 additions & 2 deletions doc/dev_guide/internals/box_protocol.rst
@@ -141,6 +141,8 @@ The IPROTO constants that appear within requests or responses that we will descr
IPROTO_RAFT_VOTE=0x01
IPROTO_RAFT_STATE=0x02
IPROTO_RAFT_VCLOCK=0x03
IPROTO_RAFT_LEADER_ID=0x04
IPROTO_RAFT_IS_LEADER_SEEN=0x05
IPROTO_VERSION=0x54
IPROTO_FEATURES=0x55
IPROTO_TIMEOUT=0x56
@@ -1603,8 +1605,11 @@ In other words, there should be a full-mesh connection between the nodes.
msgpack({
IPROTO_RAFT_TERM: :samp:`{{MP_UINT unsigned integer}}`, # RAFT term of the instance
IPROTO_RAFT_VOTE: :samp:`{{MP_UINT unsigned integer}}`, # Instance vote in the current term (if any).
IPROTO_RAFT_STATE: :samp:`{{MP_UINT unsigned integer}}`, # Instance state; one of the three numbers: 1 -- follower, 2 -- candidate, 3 -- leader.
IPROTO_RAFT_VCLOCK: :samp:`{{MP_ARRAY {{MP_INT SRV_ID, MP_INT SRV_LSN}, {MP_INT SRV_ID, MP_INT SRV_LSN}, ...}}}` # Current vclock of the instance. Presents only on the instances in the "candidate" state (IPROTO_RAFT_STATE == 2).
IPROTO_RAFT_STATE: :samp:`{{MP_UINT unsigned integer}}`, # Instance state. Possible values: 1 -- follower, 2 -- candidate, 3 -- leader.
IPROTO_RAFT_VCLOCK: :samp:`{{MP_ARRAY {{MP_INT SRV_ID, MP_INT SRV_LSN}, {MP_INT SRV_ID, MP_INT SRV_LSN}, ...}}}`, # Current vclock of the instance. Present only on instances in the "candidate" state (IPROTO_RAFT_STATE == 2).
IPROTO_RAFT_LEADER_ID: :samp:`{{MP_UINT unsigned integer}}`, # Current leader node ID as seen by the node that issues the request. Since version :doc:`2.10.0 </release/2.10.0>`.
IPROTO_RAFT_IS_LEADER_SEEN: :samp:`{{MP_BOOL boolean}}` # Shows whether the node has a direct connection to the leader node. Since version :doc:`2.10.0 </release/2.10.0>`.

})
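
As a rough illustration of the body layout above, the following sketch assembles the map as a plain Python dictionary before any MsgPack encoding. The helper name ``build_raft_body`` is hypothetical, the value ``0x00`` for ``IPROTO_RAFT_TERM`` is an assumption because it is not listed in the excerpt, and the vclock is modelled as a list of ``[id, lsn]`` pairs following the description above.

```python
# Key constants as listed in the protocol description.
IPROTO_RAFT_TERM = 0x00  # assumption: not shown in the excerpt above
IPROTO_RAFT_VOTE = 0x01
IPROTO_RAFT_STATE = 0x02
IPROTO_RAFT_VCLOCK = 0x03
IPROTO_RAFT_LEADER_ID = 0x04
IPROTO_RAFT_IS_LEADER_SEEN = 0x05

# Instance states: 1 -- follower, 2 -- candidate, 3 -- leader.
STATE_FOLLOWER, STATE_CANDIDATE, STATE_LEADER = 1, 2, 3


def build_raft_body(term, vote, state, vclock=(), leader_id=0,
                    is_leader_seen=False):
    """Build the Raft body as a dict (hypothetical helper, no encoding)."""
    body = {
        IPROTO_RAFT_TERM: term,
        IPROTO_RAFT_VOTE: vote,
        IPROTO_RAFT_STATE: state,
    }
    # The vclock is present only for instances in the candidate state.
    if state == STATE_CANDIDATE:
        body[IPROTO_RAFT_VCLOCK] = [list(pair) for pair in vclock]
    body[IPROTO_RAFT_LEADER_ID] = leader_id
    body[IPROTO_RAFT_IS_LEADER_SEEN] = bool(is_leader_seen)
    return body


candidate = build_raft_body(term=3, vote=2, state=STATE_CANDIDATE,
                            vclock=[(1, 10), (2, 7)])
print(sorted(candidate))
```

Passing the resulting dictionary to a MsgPack encoder would be the next step; that step is left out here to keep the sketch free of third-party dependencies.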

.. _box_protocol-illustration:
38 changes: 21 additions & 17 deletions doc/reference/reference_lua/box_info/election.rst
@@ -10,9 +10,25 @@ box.info.election

Since version :doc:`2.6.1 </release/2.6.1>`.
Show the current state of a replica set node with regard to
:ref:`leader election <repl_leader_elect>`, namely,
election state (mode), election term, vote in the current term,
and the leader ID of the current term.
:ref:`leader election <repl_leader_elect>`.

The following information is provided:

* ``state`` -- election state (mode) of the node. Possible values are ``leader``, ``follower``, or ``candidate``.
For more details, refer to the description of the :ref:`leader election process <repl_leader_elect_process>`.
When election is enabled, the node is writable only in the ``leader`` state.

* ``term`` -- current election term.

* ``vote`` -- ID of the node that the current node voted for in the current term. If the value is ``0``, it means the node hasn't voted in the current term yet.

* ``leader`` -- leader node ID in the current term. If the value is ``0``, it means the node doesn't know which node is the leader in the current term.

* ``leader_idle`` -- time in seconds since the last interaction with the known leader. Since version :doc:`2.10.0 </release/2.10.0>`.

.. note::

IDs in the ``box.info.election`` output are the replica IDs visible in the ``box.info.id`` output on each node and in the ``_cluster`` space.

**Example:**

@@ -21,20 +37,8 @@ box.info.election
tarantool> box.info.election
---
- state: follower
term: 2
vote: 0
leader: 0
term: 1
leader_idle: 0.45112800000061
...
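
To show how the fields can be read together, here is a minimal sketch that interprets a ``box.info.election`` result already converted to a Python dictionary. The function ``summarize_election`` and its output keys are hypothetical, and the ``writable`` flag assumes that election is enabled on the node.

```python
def summarize_election(info):
    """Interpret box.info.election fields (hypothetical helper)."""
    return {
        # Assumption: election is enabled, so only a leader is writable.
        "writable": info["state"] == "leader",
        # vote == 0 means the node hasn't voted in the current term yet.
        "has_voted": info["vote"] != 0,
        # leader == 0 means the node doesn't know the current leader.
        "knows_leader": info["leader"] != 0,
    }


# Values taken from the example output above.
info = {
    "state": "follower",
    "term": 2,
    "vote": 0,
    "leader": 0,
    "leader_idle": 0.45112800000061,
}
print(summarize_election(info))
```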

IDs in the ``box.info.election`` output are the replica IDs visible in
the ``box.info.id`` output on each node and in the ``_cluster`` space.

State can be ``leader``, ``follower``, or ``candidate``. For more details,
refer to description of the
:ref:`leader election process <repl_leader_elect_process>`. When election is
enabled, the node is writable only in the ``leader`` state.

``vote`` equals ``0`` means the node didn't vote in the current term.

``leader`` equals ``0`` means the node doesn't know who a leader in
the current term is.