mongodb · NoahStapp · Dec 8, 2025
@@ -207,8 +207,8 @@ capture this original retryable error. Drivers should then proceed with selectin
 
 ###### 3a. Selecting the server for retry
 
-In a sharded cluster, the server on which the operation failed MUST be provided to the server selection mechanism as a
-deprioritized server.
+The server address on which the operation failed MUST be provided to the server selection mechanism as a member of the
+deprioritized server address list.
 
 If the driver cannot select a server for a retry attempt or the newly selected server does not support retryable reads,
 retrying is not possible and drivers MUST raise the previous retryable error. In both cases, the caller is able to infer
@@ -284,6 +284,7 @@ function executeRetryableRead(command, session) {
   Exception previousError = null;
   retrying = false;
   Server previousServer = null;
+  deprioritizedServers = [];
   while true {
     if (previousError != null) {
       retrying = true;
@@ -292,9 +293,9 @@ function executeRetryableRead(command, session) {
       if (previousServer == null) {
         server = selectServer();
       } else {
-        // If a previous attempt was made, deprioritize the previous server
+        // If a previous attempt was made, deprioritize the previous server address
         // where the command failed.
-        deprioritizedServers = [ previousServer ];
+        deprioritizedServers.push(previousServer.address);
         server = selectServer(deprioritizedServers);
       }
     } catch (ServerSelectionException exception) {
@@ -547,6 +548,8 @@ any customers experiencing degraded performance can simply disable `retryableRea
 
 ## Changelog
 
+- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
+
 - 2024-04-30: Migrated from reStructuredText to Markdown.
 
 - 2023-12-05: Add that any server information associated with retryable exceptions MUST reflect the originating server,

@@ -317,8 +317,8 @@ Drivers MUST then retry the operation as many times as necessary until any one o
 
 - CSOT is not enabled and one retry was attempted.
 
-For each retry attempt, drivers MUST select a writable server. In a sharded cluster, the server on which the operation
-failed MUST be provided to the server selection mechanism as a deprioritized server.
+For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be
+provided to the server selection mechanism as a member of the deprioritized server address list.
 
 If the driver cannot select a server for a retry attempt or the selected server does not support retryable writes,
 retrying is not possible and drivers MUST raise the retryable error from the previous attempt. In both cases, the caller
@@ -377,6 +377,7 @@ function executeRetryableWrite(command, session) {
 
   Exception previousError = null;
   retrying = false;
+  deprioritizedServers = [];
   while true {
     try {
       return executeCommand(server, retryableCommand);
@@ -418,13 +419,13 @@ function executeRetryableWrite(command, session) {
     }
 
     /*
-     * We try to select server that is not the one that failed by passing the
-     * failed server as a deprioritized server.
+     * We try to select a server that has not already failed by adding the
+     * failed server's address to the list of deprioritized server addresses
+     * passed to selectServer.
      * If we cannot select a writable server, do not proceed with retrying and
      * throw the previous error. The caller can then infer that an attempt was
      * made and failed. */
     try {
-      deprioritizedServers = [ server ];
+      deprioritizedServers.push(server.address);
       server = selectServer("writable", deprioritizedServers);
     } catch (Exception ignoredError) {
       throw previousError;
@@ -680,6 +681,8 @@ retryWrites is not true would be inconsistent with the server and potentially co
 
 ## Changelog
 
+- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
+
 - 2024-05-08: Add guidance for client-level `bulkWrite()` retryability.
 
 - 2024-05-02: Migrated from reStructuredText to Markdown.
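
The retry pseudocode changed in the two specs above now accumulates addresses across attempts instead of replacing the list each time. A minimal Python sketch of that accumulation follows. It is not taken from any driver: `execute_retryable`, `select_server`, `execute`, `max_attempts`, and both exception classes are hypothetical names used only for illustration, and the attempt limit stands in for the spec's CSOT and single-retry rules.

```python
from dataclasses import dataclass


class RetryableError(Exception):
    """Hypothetical stand-in for a driver's retryable error."""


class ServerSelectionError(Exception):
    """Hypothetical stand-in for a server selection failure."""


@dataclass
class Server:
    address: str


def execute_retryable(select_server, execute, max_attempts=2):
    # One list for the whole operation: every failed address stays
    # deprioritized for all later attempts, not just the next one.
    deprioritized_addresses = []
    previous_error = None
    for _ in range(max_attempts):
        try:
            server = select_server(deprioritized_addresses)
        except ServerSelectionError:
            if previous_error is None:
                raise
            # Selection failed on a retry: surface the earlier retryable error.
            raise previous_error
        try:
            return execute(server)
        except RetryableError as error:
            previous_error = error
            # Record the address, not the server object, so the entry
            # survives topology changes.
            deprioritized_addresses.append(server.address)
    raise previous_error
```

With three attempts against servers that fail in turn, the selector would see an empty list, then one address, then two, which is the behavioral difference from the old `deprioritizedServers = [ server ]` assignment.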

@@ -40,15 +40,19 @@ The following test cases can be found in YAML form in the "tests" directory. Eac
 representing a set of servers, a ReadPreference document, and sets of servers returned at various stages of the server
 selection process. These sets are described below. Note that it is not required to test for correctness at every step.
 
-| Test Case           | Description                                                                                                                                        |
-| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `suitable_servers`  | the set of servers matching all server selection logic.                                                                                            |
-| `in_latency_window` | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
+| Test Case               | Description                                                                                                                                        |
+| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `suitable_servers`      | the set of servers matching all server selection logic.                                                                                            |
+| `in_latency_window`     | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
+| `deprioritized_servers` | the set of servers that are deprioritized and must only be selected if no other suitable server exists.                                            |
 
 Drivers implementing server selection MUST test that their implementations correctly return **one** of the servers in
 `in_latency_window`. Drivers SHOULD test against the full set of servers in `in_latency_window` and against
 `suitable_servers` if possible.
 
+For tests containing `deprioritized_servers`, drivers MUST pass the given list of deprioritized servers to each server
+selection call.
+
 ### Topology Type Single
 
 - The single server is always selected.

@@ -708,9 +708,10 @@ For multi-threaded clients, the server selection algorithm is as follows:
     ["Server selection started" message](#server-selection-started-message).
 2. If the topology wire version is invalid, raise an error and log a
     ["Server selection failed" message](#server-selection-failed-message).
-3. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
-    topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
-    server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
+3. Find suitable servers as follows:
+    - Filter out any deprioritized server addresses.
+    - Find suitable servers from the filtered list by topology type and operation type.
+    - If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
 4. Filter the suitable servers by calling the optional, application-provided server selector.
 5. If there are any suitable servers, filter them according to
     [Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
@@ -756,9 +757,10 @@ Therefore, for single-threaded clients, the server selection algorithm is as fol
         longer stale)
 5. If the topology wire version is invalid, raise an error and log a
     ["Server selection failed" message](#server-selection-failed-message).
-6. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
-    topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
-    server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
+6. Find suitable servers as follows:
+    - Filter out any deprioritized server addresses.
+    - Find suitable servers from the filtered list by topology type and operation type.
+    - If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
 7. Filter the suitable servers by calling the optional, application-provided server selector.
 8. If there are any suitable servers, filter them according to
     [Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
@@ -846,10 +848,12 @@ details on each step, and
 [why is maxStalenessSeconds applied before tag_sets?](#why-is-maxstalenessseconds-applied-before-tag_sets).)
 
 If `mode` is 'secondaryPreferred', attempt the selection algorithm with `mode` 'secondary' and the user's
-`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary.
+`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary. Note that if all secondaries are
+deprioritized, the primary MUST be selected if it is available.
 
 If `mode` is 'primaryPreferred', select the primary if it is known, otherwise attempt the selection algorithm with
-`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`.
+`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`. Note that if the primary is deprioritized, a
+secondary MUST be selected if one is available.
 
 For all read preferences modes except 'primary', clients MUST set the `SecondaryOk` wire protocol flag (OP_QUERY) or
 `$readPreference` global command argument (OP_MSG) to ensure that any suitable server can handle the request. If the
@@ -1605,6 +1609,16 @@ filter it out because it is too stale, and be left with no eligible servers.
 The user's intent in specifying two tag sets was to fall back to the second set if needed, so we filter by
 maxStalenessSeconds first, then tag_sets, and select Node 2.
 
+### Why does server deprioritization use only server addresses and not ServerDescription objects?
+
+A server's address is the minimum identifying attribute that stays constant across topology changes. Drivers create
+new ServerDescription objects on each topology change, and since ServerDescription equality compares multiple
+attributes, a deprioritized server could become non-equal to itself after a change and therefore incorrectly be
+considered suitable for a retry operation.
+
+By using addresses, we ensure that once a server is marked as deprioritized by an operation, it cannot be used again for
+a retry on that operation unless there are no other suitable servers.
+
 ## References
 
 - [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.md) specification
@@ -1614,6 +1628,9 @@ maxStalenessSeconds first, then tag_sets, and select Node 2.
 
 ## Changelog
 
+- 2025-12-08: Require server deprioritization for all topology types and clarify the order of server candidate
+    filtering.
+
 - 2015-06-26: Updated single-threaded selection logic with "stale" and serverSelectionTryOnce.
 
 - 2015-08-10: Updated single-threaded selection logic to ensure a scan always happens at least once under

@@ -0,0 +1,26 @@
+topology_description:
+  type: ReplicaSetNoPrimary
+  servers:
+  - &1
+    address: b:27017
+    avg_rtt_ms: 5
+    type: RSSecondary
+    tags:
+      data_center: nyc
+  - &2
+    address: c:27017
+    avg_rtt_ms: 100
+    type: RSSecondary
+    tags:
+      data_center: nyc
+operation: read
+read_preference:
+  mode: Nearest
+  tag_sets:
+  - data_center: nyc
+deprioritized_servers:
+- *1
+suitable_servers:
+- *2
+in_latency_window:
+- *2
@@ -0,0 +1,22 @@
+topology_description:
+  type: ReplicaSetNoPrimary
+  servers:
+  - &1
+    address: b:27017
+    avg_rtt_ms: 5
+    type: RSSecondary
+    tags:
+      data_center: nyc
+  - &2
+    address: c:27017
+    avg_rtt_ms: 100
+    type: RSSecondary
+    tags:
+      data_center: nyc
+operation: read
+read_preference:
+  mode: Primary
+deprioritized_servers:
+- *1
+suitable_servers: []
+in_latency_window: []
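
The filter-then-fall-back step added to the server selection algorithm, together with the two YAML fixtures above, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the spec's reference implementation: `Server`, `select_suitable`, `latency_window`, `nearest_nyc`, and `primary_only` are hypothetical names, and the `find_suitable` callback stands in for the full "by topology type and operation type" logic.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    address: str
    type: str  # e.g. "RSPrimary" or "RSSecondary"
    avg_rtt_ms: float
    tags: dict = field(default_factory=dict)


def select_suitable(servers, find_suitable, deprioritized_addresses=()):
    # Step 1: filter out deprioritized addresses.
    deprioritized = set(deprioritized_addresses)
    candidates = [s for s in servers if s.address not in deprioritized]
    # Step 2: find suitable servers among the remaining candidates.
    suitable = find_suitable(candidates)
    # Step 3: if nothing is suitable, repeat step 2 on the unfiltered list.
    if not suitable:
        suitable = find_suitable(list(servers))
    return suitable


def latency_window(suitable, local_threshold_ms=15):
    # Keep servers within local_threshold_ms of the fastest suitable server.
    if not suitable:
        return []
    fastest = min(s.avg_rtt_ms for s in suitable)
    return [s for s in suitable if s.avg_rtt_ms <= fastest + local_threshold_ms]


def nearest_nyc(servers):
    # Simplified "Nearest" read preference with tag set {data_center: nyc}.
    return [s for s in servers if s.tags.get("data_center") == "nyc"]


def primary_only(servers):
    # Simplified "Primary" read preference.
    return [s for s in servers if s.type == "RSPrimary"]


b = Server("b:27017", "RSSecondary", 5, {"data_center": "nyc"})
c = Server("c:27017", "RSSecondary", 100, {"data_center": "nyc"})

# First fixture: b is deprioritized, so only c is suitable.
nearest_result = select_suitable([b, c], nearest_nyc, ["b:27017"])
# Second fixture: no primary exists, so the fallback also finds nothing.
primary_result = select_suitable([b, c], primary_only, ["b:27017"])
```

Passing the whole candidate list to `find_suitable`, rather than testing servers one at a time, keeps modes such as 'secondaryPreferred' expressible: when every secondary is deprioritized, the filtered list contains only the primary, and the mode's own fallback selects it.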