11 changes: 7 additions & 4 deletions source/retryable-reads/retryable-reads.md
Original file line number Diff line number Diff line change
@@ -207,8 +207,8 @@ capture this original retryable error. Drivers should then proceed with selectin

###### 3a. Selecting the server for retry

In a sharded cluster, the server on which the operation failed MUST be provided to the server selection mechanism as a
deprioritized server.
The server address on which the operation failed MUST be provided to the server selection mechanism as a member of the
deprioritized server address list.

If the driver cannot select a server for a retry attempt or the newly selected server does not support retryable reads,
retrying is not possible and drivers MUST raise the previous retryable error. In both cases, the caller is able to infer
@@ -284,6 +284,7 @@ function executeRetryableRead(command, session) {
Exception previousError = null;
retrying = false;
Server previousServer = null;
deprioritizedServers = [];
while true {
if (previousError != null) {
retrying = true;
@@ -292,9 +293,9 @@ function executeRetryableRead(command, session) {
if (previousServer == null) {
server = selectServer();
} else {
// If a previous attempt was made, deprioritize the previous server
// If a previous attempt was made, deprioritize the previous server address
// where the command failed.
deprioritizedServers = [ previousServer ];
deprioritizedServers.push(previousServer.address);
server = selectServer(deprioritizedServers);
}
} catch (ServerSelectionException exception) {
@@ -547,6 +548,8 @@ any customers experiencing degraded performance can simply disable `retryableRea

## Changelog

- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.

- 2024-04-30: Migrated from reStructuredText to Markdown.

- 2023-12-05: Add that any server information associated with retryable exceptions MUST reflect the originating server,
13 changes: 8 additions & 5 deletions source/retryable-writes/retryable-writes.md
Original file line number Diff line number Diff line change
@@ -317,8 +317,8 @@ Drivers MUST then retry the operation as many times as necessary until any one o

- CSOT is not enabled and one retry was attempted.

For each retry attempt, drivers MUST select a writable server. In a sharded cluster, the server on which the operation
failed MUST be provided to the server selection mechanism as a deprioritized server.
For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be
provided to the server selection mechanism as a member of the deprioritized server address list.

If the driver cannot select a server for a retry attempt or the selected server does not support retryable writes,
retrying is not possible and drivers MUST raise the retryable error from the previous attempt. In both cases, the caller
@@ -377,6 +377,7 @@ function executeRetryableWrite(command, session) {

Exception previousError = null;
retrying = false;
deprioritizedServers = [];
while true {
try {
return executeCommand(server, retryableCommand);
@@ -418,13 +419,13 @@ function executeRetryableWrite(command, session) {
}

/*
* We try to select server that is not the one that failed by passing the
* failed server as a deprioritized server.
* We try to select a server that has not already failed by adding the failed
* server's address to the list of deprioritized servers passed to selectServer.
* If we cannot select a writable server, do not proceed with retrying and
* throw the previous error. The caller can then infer that an attempt was
* made and failed. */
try {
deprioritizedServers = [ server ];
deprioritizedServers.push(server.address);
server = selectServer("writable", deprioritizedServers);
} catch (Exception ignoredError) {
throw previousError;
@@ -680,6 +681,8 @@ retryWrites is not true would be inconsistent with the server and potentially co

## Changelog

- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.

- 2024-05-08: Add guidance for client-level `bulkWrite()` retryability.

- 2024-05-02: Migrated from reStructuredText to Markdown.
12 changes: 8 additions & 4 deletions source/server-selection/server-selection-tests.md
Original file line number Diff line number Diff line change
@@ -40,15 +40,19 @@ The following test cases can be found in YAML form in the "tests" directory. Eac
representing a set of servers, a ReadPreference document, and sets of servers returned at various stages of the server
selection process. These sets are described below. Note that it is not required to test for correctness at every step.

| Test Case | Description |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `suitable_servers` | the set of servers matching all server selection logic. |
| `in_latency_window` | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
| Test Case | Description |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `suitable_servers` | the set of servers matching all server selection logic. |
| `in_latency_window` | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
| `deprioritized_servers` | the set of servers that are deprioritized and must only be selected if no other suitable server exists. |

Drivers implementing server selection MUST test that their implementations correctly return **one** of the servers in
`in_latency_window`. Drivers SHOULD test against the full set of servers in `in_latency_window` and against
`suitable_servers` if possible.

For tests containing `deprioritized_servers`, drivers MUST pass the given list of deprioritized servers to each server
selection call.
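
A test runner for these cases might look roughly like the Python sketch below. It is only an illustration under stated assumptions: `check_case` and `naive_selector` are hypothetical names, the test case is written as a plain dict rather than loaded from YAML, and the runner converts each entry of `deprioritized_servers` to its address before calling the selector.

```python
from typing import Any, Callable, Dict, List

Server = Dict[str, Any]
Selector = Callable[[List[Server], Dict[str, Any], List[str]], List[Server]]


def check_case(case: Dict[str, Any], find_suitable: Selector) -> None:
    """Run one test case against a driver-specific selector (hypothetical API)."""
    servers = case["topology_description"]["servers"]
    # Assumption: the selector takes addresses, so map each listed server to its address.
    deprioritized = [s["address"] for s in case.get("deprioritized_servers", [])]
    actual = find_suitable(servers, case["read_preference"], deprioritized)
    expected = case["suitable_servers"]
    assert sorted(s["address"] for s in actual) == sorted(s["address"] for s in expected)


def naive_selector(servers: List[Server], read_preference: Dict[str, Any],
                   deprioritized: List[str]) -> List[Server]:
    # Stand-in selector for the demo: ignores the read preference entirely.
    return [s for s in servers if s["address"] not in deprioritized]


server_b = {"address": "b:27017", "avg_rtt_ms": 5, "type": "RSSecondary"}
server_c = {"address": "c:27017", "avg_rtt_ms": 100, "type": "RSSecondary"}
nearest_case = {
    "topology_description": {"type": "ReplicaSetNoPrimary", "servers": [server_b, server_c]},
    "operation": "read",
    "read_preference": {"mode": "Nearest"},
    "deprioritized_servers": [server_b],
    "suitable_servers": [server_c],
    "in_latency_window": [server_c],
}
check_case(nearest_case, naive_selector)
```

A real runner would also compare `in_latency_window` and would call the driver's own selection routine in place of `naive_selector`.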

### Topology Type Single

- The single server is always selected.
33 changes: 25 additions & 8 deletions source/server-selection/server-selection.md
Original file line number Diff line number Diff line change
@@ -708,9 +708,10 @@ For multi-threaded clients, the server selection algorithm is as follows:
["Server selection started" message](#server-selection-started-message).
2. If the topology wire version is invalid, raise an error and log a
["Server selection failed" message](#server-selection-failed-message).
3. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
3. Find suitable servers as follows:
- Filter out any deprioritized server addresses.
- Find suitable servers from the filtered list by topology type and operation type.
- If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
4. Filter the suitable servers by calling the optional, application-provided server selector.
5. If there are any suitable servers, filter them according to
[Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
@@ -756,9 +757,10 @@ Therefore, for single-threaded clients, the server selection algorithm is as fol
longer stale)
5. If the topology wire version is invalid, raise an error and log a
["Server selection failed" message](#server-selection-failed-message).
6. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
6. Find suitable servers as follows:
- Filter out any deprioritized server addresses.
- Find suitable servers from the filtered list by topology type and operation type.
- If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
7. Filter the suitable servers by calling the optional, application-provided server selector.
8. If there are any suitable servers, filter them according to
[Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
@@ -846,10 +848,12 @@ details on each step, and
[why is maxStalenessSeconds applied before tag_sets?](#why-is-maxstalenessseconds-applied-before-tag_sets).)

If `mode` is 'secondaryPreferred', attempt the selection algorithm with `mode` 'secondary' and the user's
`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary.
`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary. Note that if all secondaries are
deprioritized, the primary MUST be selected if it is available.

If `mode` is 'primaryPreferred', select the primary if it is known, otherwise attempt the selection algorithm with
`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`.
`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`. Note that if the primary is deprioritized, a
secondary MUST be selected if one is available.
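
The interaction between the preferred modes and deprioritization can be sketched in Python as follows. This is a simplified illustration, not a driver implementation: `by_mode` and `find_suitable` are hypothetical helpers, servers are plain dicts, and `maxStalenessSeconds`, `tag_sets`, and the latency window are deliberately left out.

```python
from typing import Any, Dict, List, Set

Server = Dict[str, Any]


def by_mode(servers: List[Server], mode: str) -> List[Server]:
    """Pick candidates for a read preference mode (staleness and tags ignored)."""
    primaries = [s for s in servers if s["type"] == "RSPrimary"]
    secondaries = [s for s in servers if s["type"] == "RSSecondary"]
    if mode == "primary":
        return primaries
    if mode == "secondary":
        return secondaries
    if mode == "primaryPreferred":
        return primaries or secondaries
    if mode == "secondaryPreferred":
        return secondaries or primaries
    if mode == "nearest":
        return primaries + secondaries
    raise ValueError(f"unknown mode: {mode}")


def find_suitable(servers: List[Server], mode: str, deprioritized: Set[str]) -> List[Server]:
    # Filter out deprioritized addresses first, then apply the mode.
    filtered = [s for s in servers if s["address"] not in deprioritized]
    # If nothing is suitable, repeat the mode step without the filter.
    return by_mode(filtered, mode) or by_mode(servers, mode)


a = {"address": "a:27017", "type": "RSPrimary"}
b = {"address": "b:27017", "type": "RSSecondary"}
c = {"address": "c:27017", "type": "RSSecondary"}
topology = [a, b, c]

all_secondaries_down = find_suitable(topology, "secondaryPreferred", {"b:27017", "c:27017"})
primary_down = find_suitable(topology, "primaryPreferred", {"a:27017"})
only_primary_allowed = find_suitable(topology, "primary", {"a:27017"})
```

Here `all_secondaries_down` contains only the primary, `primary_down` contains the two secondaries, and `only_primary_allowed` falls back to the deprioritized primary because mode 'primary' has no other suitable server.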

For all read preferences modes except 'primary', clients MUST set the `SecondaryOk` wire protocol flag (OP_QUERY) or
`$readPreference` global command argument (OP_MSG) to ensure that any suitable server can handle the request. If the
@@ -1605,6 +1609,16 @@ filter it out because it is too stale, and be left with no eligible servers.
The user's intent in specifying two tag sets was to fall back to the second set if needed, so we filter by
maxStalenessSeconds first, then tag_sets, and select Node 2.

### Why does server deprioritization use only server addresses and not ServerDescription objects?

A server's address is the minimal identifying attribute that stays constant across topology changes. Drivers create
new ServerDescription objects on each topology change, and because ServerDescription equality compares multiple
attributes, a deprioritized server could compare unequal to its earlier description after a change and therefore
incorrectly be considered suitable for a retry operation.

By using addresses, we ensure that once a server is marked as deprioritized by an operation, it cannot be used again for
a retry on that operation unless there are no other suitable servers.
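
The difference between the two approaches can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: `ServerDescription` is a simplified stand-in with only three fields, and `select_candidates` is a hypothetical helper that reduces suitability to a caller-supplied predicate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class ServerDescription:
    # Hypothetical, simplified stand-in for a driver's ServerDescription.
    address: str
    server_type: str
    avg_rtt_ms: float


def select_candidates(
    servers: Iterable[ServerDescription],
    is_suitable: Callable[[ServerDescription], bool],
    deprioritized_addresses: Set[str],
) -> List[ServerDescription]:
    servers = list(servers)
    # First pass: drop deprioritized addresses before checking suitability.
    filtered = [s for s in servers if s.address not in deprioritized_addresses]
    suitable = [s for s in filtered if is_suitable(s)]
    if suitable:
        return suitable
    # Fallback: nothing suitable remained, so consider every server again.
    return [s for s in servers if is_suitable(s)]


# The failed attempt saw this description of b.
old_b = ServerDescription("b:27017", "RSSecondary", 5)
# A later topology update produced a fresh description with a different RTT.
new_b = ServerDescription("b:27017", "RSSecondary", 7)
c = ServerDescription("c:27017", "RSSecondary", 100)

deprioritized = {old_b.address}
result = select_candidates(
    [new_b, c], lambda s: s.server_type == "RSSecondary", deprioritized
)
```

In this sketch `old_b != new_b` because the dataclass compares every field, so a list of deprioritized objects would no longer match the refreshed description. Matching on the address still excludes `b:27017`, and `result` contains only `c`.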

## References

- [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.md) specification
@@ -1614,6 +1628,9 @@ maxStalenessSeconds first, then tag_sets, and select Node 2.

## Changelog

- 2025-12-08: Require server deprioritization for all topology types and clarify the order of server candidate
filtering.

- 2015-06-26: Updated single-threaded selection logic with "stale" and serverSelectionTryOnce.

- 2015-08-10: Updated single-threaded selection logic to ensure a scan always happens at least once under


Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
topology_description:
type: ReplicaSetNoPrimary
servers:
- &1
address: b:27017
avg_rtt_ms: 5
type: RSSecondary
tags:
data_center: nyc
- &2
address: c:27017
avg_rtt_ms: 100
type: RSSecondary
tags:
data_center: nyc
operation: read
read_preference:
mode: Nearest
tag_sets:
- data_center: nyc
deprioritized_servers:
- *1
suitable_servers:
- *2
in_latency_window:
- *2


Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
topology_description:
type: ReplicaSetNoPrimary
servers:
- &1
address: b:27017
avg_rtt_ms: 5
type: RSSecondary
tags:
data_center: nyc
- &2
address: c:27017
avg_rtt_ms: 100
type: RSSecondary
tags:
data_center: nyc
operation: read
read_preference:
mode: Primary
deprioritized_servers:
- *1
suitable_servers: []
in_latency_window: []
