Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#148 Replaced Master and Slave with Primary and Replica where applicable #177

Merged
merged 2 commits into from
Oct 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion commands/bitop.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,4 +56,4 @@ Care should be taken when running it against long input strings.

For real-time metrics and statistics involving large inputs a good approach is
to use a replica (with replica-read-only option enabled) where the bit-wise
operations are performed to avoid blocking the master instance.
operations are performed to avoid blocking the primary instance.
2 changes: 1 addition & 1 deletion commands/client-list.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ c: connection to be closed after writing entire reply
d: a watched keys has been modified - EXEC will fail
e: the client is excluded from the client eviction mechanism
i: the client is waiting for a VM I/O (deprecated)
M: the client is a master
M: the client is a primary
madolson marked this conversation as resolved.
Show resolved Hide resolved
N: no specific flag set
O: the client is a client in MONITOR mode
P: the client is a Pub/Sub subscriber
Expand Down
8 changes: 4 additions & 4 deletions commands/client-pause.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,12 @@ For the `WRITE` mode, some commands have special behavior:
This command is useful as it makes able to switch clients from a Valkey instance to another one in a controlled way. For example during an instance upgrade the system administrator could do the following:

* Pause the clients using `CLIENT PAUSE`
* Wait a few seconds to make sure the replicas processed the latest replication stream from the master.
* Turn one of the replicas into a master.
* Reconfigure clients to connect with the new master.
* Wait a few seconds to make sure the replicas processed the latest replication stream from the primary.
* Turn one of the replicas into a primary.
* Reconfigure clients to connect with the new primary.

The recommended mode for client pause is `WRITE`. This mode will stop all replication traffic, can be
aborted with the `CLIENT UNPAUSE` command, and allows reconfiguring the old master without risking accepting writes after the
aborted with the `CLIENT UNPAUSE` command, and allows reconfiguring the old primary without risking accepting writes after the
failover. This is also the mode used during cluster failover.

This command also prevents keys to be evicted or expired during the time clients are paused.
Expand Down
2 changes: 1 addition & 1 deletion commands/cluster-addslots.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ are already assigned:
This command only works in cluster mode and is useful in the following
Valkey Cluster operations:

1. To create a new `cluster ADDSLOTS` is used in order to initially setup master nodes splitting the available hash slots among them.
1. To create a new `cluster ADDSLOTS` is used in order to initially setup primary nodes splitting the available hash slots among them.
2. In order to fix a broken cluster where certain slots are unassigned.

## Information about slots propagation and warnings
Expand Down
2 changes: 1 addition & 1 deletion commands/cluster-addslotsrange.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,5 +19,5 @@ The same operation can be completed with the following `CLUSTER ADDSLOTSRANGE` c

This command only works in cluster mode and is useful in the following Valkey Cluster operations:

1. To create a new cluster, `CLUSTER ADDSLOTSRANGE` is used to initially set up master nodes splitting the available hash slots among them.
1. To create a new cluster, `CLUSTER ADDSLOTSRANGE` is used to initially set up primary nodes splitting the available hash slots among them.
2. In order to fix a broken cluster where certain slots are unassigned.
4 changes: 2 additions & 2 deletions commands/cluster-count-failure-reports.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
The command returns the number of *failure reports* for the specified node.
Failure reports are the way Valkey Cluster uses in order to promote a
`PFAIL` state, that means a node is not reachable, to a `FAIL` state,
that means that the majority of masters in the cluster agreed within
that means that the majority of primaries in the cluster agreed within
a window of time that the node is not reachable.

A few more details:
Expand All @@ -10,7 +10,7 @@ A few more details:
* Nodes in `PFAIL` state are provided in gossip sections of heartbeat packets.
* Every time a node processes gossip packets from other nodes, it creates (and refreshes the TTL if needed) **failure reports**, remembering that a given node said another given node is in `PFAIL` condition.
* Each failure report has a time to live of two times the *node timeout* time.
* If at a given time a node has another node flagged with `PFAIL`, and at the same time collected the majority of other master nodes *failure reports* about this node (including itself if it is a master), then it elevates the failure state of the node from `PFAIL` to `FAIL`, and broadcasts a message forcing all the nodes that can be reached to flag the node as `FAIL`.
* If at a given time a node has another node flagged with `PFAIL`, and at the same time collected the majority of other primary nodes *failure reports* about this node (including itself if it is a primary), then it elevates the failure state of the node from `PFAIL` to `FAIL`, and broadcasts a message forcing all the nodes that can be reached to flag the node as `FAIL`.

This command returns the number of failure reports for the current node which are currently not expired (so received within two times the *node timeout* time). The count does not include what the node we are asking this count believes about the node ID we pass as argument, the count *only* includes the failure reports the node received from other nodes.

Expand Down
4 changes: 2 additions & 2 deletions commands/cluster-delslots.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
In Valkey Cluster, each node keeps track of which master is serving
In Valkey Cluster, each node keeps track of which primary is serving
a particular hash slot.

The `CLUSTER DELSLOTS` command asks a particular Valkey Cluster node to
forget which master is serving the hash slots specified as arguments.
forget which primary is serving the hash slots specified as arguments.

In the context of a node that has received a `CLUSTER DELSLOTS` command and
has consequently removed the associations for the passed hash slots,
Expand Down
42 changes: 21 additions & 21 deletions commands/cluster-failover.md
Original file line number Diff line number Diff line change
@@ -1,48 +1,48 @@
This command, that can only be sent to a Valkey Cluster replica node, forces
the replica to start a manual failover of its master instance.
the replica to start a manual failover of its primary instance.

A manual failover is a special kind of failover that is usually executed when
there are no actual failures, but we wish to swap the current master with one
there are no actual failures, but we wish to swap the current primary with one
of its replicas (which is the node we send the command to), in a safe way,
without any window for data loss. It works in the following way:

1. The replica tells the master to stop processing queries from clients.
2. The master replies to the replica with the current *replication offset*.
3. The replica waits for the replication offset to match on its side, to make sure it processed all the data from the master before it continues.
4. The replica starts a failover, obtains a new configuration epoch from the majority of the masters, and broadcasts the new configuration.
5. The old master receives the configuration update: unblocks its clients and starts replying with redirection messages so that they'll continue the chat with the new master.
1. The replica tells the primary to stop processing queries from clients.
2. The primary replies to the replica with the current *replication offset*.
3. The replica waits for the replication offset to match on its side, to make sure it processed all the data from the primary before it continues.
4. The replica starts a failover, obtains a new configuration epoch from the majority of the primaries, and broadcasts the new configuration.
5. The old primary receives the configuration update: unblocks its clients and starts replying with redirection messages so that they'll continue the chat with the new primary.

This way clients are moved away from the old master to the new master
atomically and only when the replica that is turning into the new master
has processed all of the replication stream from the old master.
This way clients are moved away from the old primary to the new primary
atomically and only when the replica that is turning into the new primary
has processed all of the replication stream from the old primary.

## FORCE option: manual failover when the master is down
## FORCE option: manual failover when the primary is down

The command behavior can be modified by two options: **FORCE** and **TAKEOVER**.

If the **FORCE** option is given, the replica does not perform any handshake
with the master, that may be not reachable, but instead just starts a
with the primary, that may be not reachable, but instead just starts a
failover ASAP starting from point 4. This is useful when we want to start
a manual failover while the master is no longer reachable.
a manual failover while the primary is no longer reachable.

However using **FORCE** we still need the majority of masters to be available
However using **FORCE** we still need the majority of primaries to be available
in order to authorize the failover and generate a new configuration epoch
for the replica that is going to become master.
for the replica that is going to become primary.

## TAKEOVER option: manual failover without cluster consensus

There are situations where this is not enough, and we want a replica to failover
without any agreement with the rest of the cluster. A real world use case
for this is to mass promote replicas in a different data center to masters
in order to perform a data center switch, while all the masters are down
for this is to mass promote replicas in a different data center to primaries
in order to perform a data center switch, while all the primaries are down
or partitioned away.

The **TAKEOVER** option implies everything **FORCE** implies, but also does
not uses any cluster authorization in order to failover. A replica receiving
`CLUSTER FAILOVER TAKEOVER` will instead:

1. Generate a new `configEpoch` unilaterally, just taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
2. Assign itself all the hash slots of its master, and propagate the new configuration to every node which is reachable ASAP, and eventually to every other node.
2. Assign itself all the hash slots of its primary, and propagate the new configuration to every node which is reachable ASAP, and eventually to every other node.

Note that **TAKEOVER violates the last-failover-wins principle** of Valkey Cluster, since the configuration epoch generated by the replica violates the normal generation of configuration epochs in several ways:

Expand All @@ -56,8 +56,8 @@ Because of this the **TAKEOVER** option should be used with care.
* `CLUSTER FAILOVER`, unless the **TAKEOVER** option is specified, does not execute a failover synchronously.
It only *schedules* a manual failover, bypassing the failure detection stage.
* An `OK` reply is no guarantee that the failover will succeed.
* A replica can only be promoted to a master if it is known as a replica by a majority of the masters in the cluster.
If the replica is a new node that has just been added to the cluster (for example after upgrading it), it may not yet be known to all the masters in the cluster.
To check that the masters are aware of a new replica, you can send `CLUSTER NODES` or `CLUSTER REPLICAS` to each of the master nodes and check that it appears as a replica, before sending `CLUSTER FAILOVER` to the replica.
* A replica can only be promoted to a primary if it is known as a replica by a majority of the primaries in the cluster.
If the replica is a new node that has just been added to the cluster (for example after upgrading it), it may not yet be known to all the primaries in the cluster.
To check that the primaries are aware of a new replica, you can send `CLUSTER NODES` or `CLUSTER REPLICAS` to each of the primary nodes and check that it appears as a replica, before sending `CLUSTER FAILOVER` to the replica.
* To check that the failover has actually happened you can use `ROLE`, `INFO REPLICATION` (which indicates "role:master" after successful failover), or `CLUSTER NODES` to verify that the state of the cluster has changed sometime after the command was sent.
* To check if the failover has failed, check the replica's log for "Manual failover timed out", which is logged if the replica has given up after a few seconds.
4 changes: 2 additions & 2 deletions commands/cluster-forget.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ node receiving the command.
Because when a given node is part of the cluster, all the other nodes
participating in the cluster knows about it, in order for a node to be
completely removed from a cluster, the `CLUSTER FORGET` command must be
sent to all the remaining nodes, regardless of the fact they are masters
sent to all the remaining nodes, regardless of the fact they are primaries
or replicas.

However the command cannot simply drop the node from the internal node
Expand Down Expand Up @@ -49,5 +49,5 @@ we want to remove a node.
The command does not succeed and returns an error in the following cases:

1. The specified node ID is not found in the nodes table.
2. The node receiving the command is a replica, and the specified node ID identifies its current master.
2. The node receiving the command is a replica, and the specified node ID identifies its current primary.
3. The node ID identifies the same node we are sending the command to.
6 changes: 3 additions & 3 deletions commands/cluster-info.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,13 @@ cluster_stats_messages_received:1483968
total_cluster_links_buffer_limit_exceeded:0
```

* `cluster_state`: State is `ok` if the node is able to receive queries. `fail` if there is at least one hash slot which is unbound (no node associated), in error state (node serving it is flagged with FAIL flag), or if the majority of masters can't be reached by this node.
* `cluster_state`: State is `ok` if the node is able to receive queries. `fail` if there is at least one hash slot which is unbound (no node associated), in error state (node serving it is flagged with FAIL flag), or if the majority of primaries can't be reached by this node.
* `cluster_slots_assigned`: Number of slots which are associated to some node (not unbound). This number should be 16384 for the node to work properly, which means that each hash slot should be mapped to a node.
* `cluster_slots_ok`: Number of hash slots mapping to a node not in `FAIL` or `PFAIL` state.
* `cluster_slots_pfail`: Number of hash slots mapping to a node in `PFAIL` state. Note that those hash slots still work correctly, as long as the `PFAIL` state is not promoted to `FAIL` by the failure detection algorithm. `PFAIL` only means that we are currently not able to talk with the node, but may be just a transient error.
* `cluster_slots_fail`: Number of hash slots mapping to a node in `FAIL` state. If this number is not zero the node is not able to serve queries unless `cluster-require-full-coverage` is set to `no` in the configuration.
* `cluster_known_nodes`: The total number of known nodes in the cluster, including nodes in `HANDSHAKE` state that may not currently be proper members of the cluster.
* `cluster_size`: The number of master nodes serving at least one hash slot in the cluster.
* `cluster_size`: The number of primary nodes serving at least one hash slot in the cluster.
* `cluster_current_epoch`: The local `Current Epoch` variable. This is used in order to create unique increasing version numbers during fail overs.
* `cluster_my_epoch`: The `Config Epoch` of the node we are talking with. This is the current configuration version assigned to this node.
* `cluster_stats_messages_sent`: Number of messages sent via the cluster node-to-node binary bus.
Expand All @@ -38,7 +38,7 @@ Here are the explanation of these fields:
* `cluster_stats_messages_meet_sent` and `cluster_stats_messages_meet_received`: Handshake message sent to a new node, either through gossip or `CLUSTER MEET`.
* `cluster_stats_messages_fail_sent` and `cluster_stats_messages_fail_received`: Mark node xxx as failing.
* `cluster_stats_messages_publish_sent` and `cluster_stats_messages_publish_received`: Pub/Sub Publish propagation, see [Pubsub](../topics/pubsub.md#pubsub).
* `cluster_stats_messages_auth-req_sent` and `cluster_stats_messages_auth-req_received`: Replica initiated leader election to replace its master.
* `cluster_stats_messages_auth-req_sent` and `cluster_stats_messages_auth-req_received`: Replica initiated leader election to replace its primary.
* `cluster_stats_messages_auth-ack_sent` and `cluster_stats_messages_auth-ack_received`: Message indicating a vote during leader election.
* `cluster_stats_messages_update_sent` and `cluster_stats_messages_update_received`: Another node slots configuration.
* `cluster_stats_messages_mfstart_sent` and `cluster_stats_messages_mfstart_received`: Pause clients for manual failover.
Expand Down
8 changes: 4 additions & 4 deletions commands/cluster-replicas.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
The command provides a list of replica nodes replicating from the specified
master node. The list is provided in the same format used by `CLUSTER NODES` (please refer to its documentation for the specification of the format).
primary node. The list is provided in the same format used by `CLUSTER NODES` (please refer to its documentation for the specification of the format).

The command will fail if the specified node is not known or if it is not
a master according to the node table of the node receiving the command.
a primary according to the node table of the node receiving the command.

Note that if a replica is added, moved, or removed from a given master node,
Note that if a replica is added, moved, or removed from a given primary node,
and we ask `CLUSTER REPLICAS` to a node that has not yet received the
configuration update, it may show stale information. However eventually
(in a matter of seconds if there are no network partitions) all the nodes
will agree about the set of nodes associated with a given master.
will agree about the set of nodes associated with a given primary.
14 changes: 7 additions & 7 deletions commands/cluster-replicate.md
Original file line number Diff line number Diff line change
@@ -1,22 +1,22 @@
The command reconfigures a node as a replica of the specified master.
If the node receiving the command is an *empty master*, as a side effect
of the command, the node role is changed from master to replica.
The command reconfigures a node as a replica of the specified primary.
If the node receiving the command is an *empty primary*, as a side effect
of the command, the node role is changed from primary to replica.

Once a node is turned into the replica of another master node, there is no need
Once a node is turned into the replica of another primary node, there is no need
to inform the other cluster nodes about the change: heartbeat packets exchanged
between nodes will propagate the new configuration automatically.

A replica will always accept the command, assuming that:

1. The specified node ID exists in its nodes table.
2. The specified node ID does not identify the instance we are sending the command to.
3. The specified node ID is a master.
3. The specified node ID is a primary.

If the node receiving the command is not already a replica, but is a master,
If the node receiving the command is not already a replica, but is a primary,
the command will only succeed, and the node will be converted into a replica,
only if the following additional conditions are met:

1. The node is not serving any hash slots.
2. The node is empty, no keys are stored at all in the key space.

If the command succeeds the new replica will immediately try to contact its master in order to replicate from it.
If the command succeeds the new replica will immediately try to contact its primary in order to replicate from it.
Loading