Commit ad75fcf

committed
Address review comments from Tom
1 parent 660f2ac commit ad75fcf

File tree

1 file changed

+9
-1
lines changed


06x-new-kafka-roller.md

Lines changed: 9 additions & 1 deletion
@@ -102,7 +102,7 @@ The following are the configuration options for the new KafkaRoller. If exposed
102102
| maxRetries | 10 | No | The maximum number of times a node can be retried after not meeting the safety conditions, e.g. a failed availability check. This is checked against the node's `numRetries`. |
103103
| operationTimeoutMs | 60 seconds | Yes | The maximum amount of time we will wait for nodes to transition to `READY` state after an operation in each retry. This is already exposed to the user via environment variable `STRIMZI_OPERATION_TIMEOUT_MS`. |
104104
| maxRestartParallelism | 1 | Yes | The maximum number of broker nodes that can be restarted in parallel. This will be exposed to the user via the new environment variable `STRIMZI_MAX_RESTART_BATCH_SIZE`. However, if there are multiple brokers in `NOT_RUNNING` state, they may get restarted in parallel despite this configuration, for faster recovery. |
105-
| postRestartDelay | 0 | Yes | Delay between restarts of nodes or batches. It's set to 0 by default, but can be adjusted by users to slow down the restarts. This will also help JIT to reach a steady state and to reduce impact on clients.
105+
| postRestartDelay | 0 | Yes | Delay between restarts of nodes or batches. It's set to 0 by default, but can be adjusted by users to slow down the restarts. This will also help the Just-In-Time (JIT) compiler to reach a steady state and reduce the impact on clients. |
106106
| restartAndPreferredLeaderElectionDelay | 10 seconds | No | Delay between a restart and triggering partition leader election, so that the just-rolled broker leads all the partitions for which it is the preferred leader. This is to avoid situations where leaders move to a newly started node that does not yet have established networking to some outside networks, e.g. through load balancers. |
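The options above could be collected into a single configuration holder. The sketch below is illustrative only: the class and field names are assumptions, not the actual Strimzi classes, and only the defaults stated in the table are used.

```java
import java.time.Duration;

// Illustrative holder for the roller options in the table above; the class
// and field names are assumptions, not the actual Strimzi implementation.
public class RollerConfig {
    final int maxRetries;                 // per-node retry cap (default 10), checked against numRetries
    final Duration operationTimeout;      // STRIMZI_OPERATION_TIMEOUT_MS (default 60s)
    final int maxRestartParallelism;      // STRIMZI_MAX_RESTART_BATCH_SIZE (default 1)
    final Duration postRestartDelay;      // delay between restarts of nodes or batches (default 0)
    final Duration restartAndPreferredLeaderElectionDelay; // default 10s

    RollerConfig(int maxRetries, Duration operationTimeout, int maxRestartParallelism,
                 Duration postRestartDelay, Duration leaderElectionDelay) {
        this.maxRetries = maxRetries;
        this.operationTimeout = operationTimeout;
        this.maxRestartParallelism = maxRestartParallelism;
        this.postRestartDelay = postRestartDelay;
        this.restartAndPreferredLeaderElectionDelay = leaderElectionDelay;
    }

    // Defaults exactly as listed in the table.
    static RollerConfig defaults() {
        return new RollerConfig(10, Duration.ofSeconds(60), 1,
                Duration.ZERO, Duration.ofSeconds(10));
    }
}
```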
107107

108108
### Algorithm
@@ -129,6 +129,8 @@ The following are the configuration options for the new KafkaRoller. If exposed
129129
3. **Handle `NOT_READY` Nodes:**
130130
Wait for `NOT_READY` nodes to become `READY` within `operationTimeoutMs`.
131131

132+
This gives a node an opportunity to become ready in case it has just been restarted. If the node is still not ready after the timeout, it falls through to the next step, which determines the action to take on it.
133+
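The bounded wait in this step can be sketched as a simple polling loop. The helper name, the `BooleanSupplier` readiness check, and the poll interval are assumptions for illustration, not Strimzi APIs:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Illustrative polling wait for a node to become READY within a timeout;
// not the actual Strimzi code.
public class ReadinessWait {
    // Poll the readiness check until it returns true or the timeout elapses.
    // Returns true if the node became READY in time, false on timeout.
    static boolean waitUntilReady(BooleanSupplier isReady, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isReady.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // timed out: caller falls through to the next step
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

On timeout the caller does not fail immediately; per the algorithm, the node simply proceeds to categorization in the next step.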
132134
4. **Categorize Nodes:**
133135
Group nodes based on their state and connectivity:
134136
- `RESTART_NOT_RUNNING`: Nodes in `NOT_READY` state.
@@ -138,9 +140,15 @@ The following are the configuration options for the new KafkaRoller. If exposed
138140
- `RESTART`: Nodes with reasons for restart and no previous restarts.
139141
- `NOP`: Nodes needing no operation.
140142

143+
Grouping the nodes into these categories makes it clearer to act on them in a specific order. Also, the category and node state are not always 1:1; for example, nodes might be unresponsive despite having `READY` or `NOT_READY` state but need to be grouped together for sequential restarts. Grouping also makes it easier to batch broker nodes for parallel restart.
144+
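The categorization could be sketched as a pure function over a node's observed state. This is a simplification under stated assumptions: the `NodeState` values and the `responsive` flag are illustrative inputs, and the real logic considers more signals than shown here.

```java
// Illustrative step-4 categorization; enum values mirror the text, but
// NodeState and the responsive flag are assumptions, not Strimzi code.
public class NodeCategorizer {
    enum Category { RESTART_NOT_RUNNING, WAIT_FOR_LOG_RECOVERY, RESTART, NOP }
    enum NodeState { READY, NOT_READY, RECOVERING }

    static Category categorize(NodeState state, boolean responsive, boolean hasRestartReason) {
        // Category and state are not 1:1: an unresponsive node is grouped
        // for sequential restart whatever READY/NOT_READY state it reports.
        if (!responsive || state == NodeState.NOT_READY) {
            return Category.RESTART_NOT_RUNNING;
        }
        if (state == NodeState.RECOVERING) {
            return Category.WAIT_FOR_LOG_RECOVERY;
        }
        return hasRestartReason ? Category.RESTART : Category.NOP;
    }
}
```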
141145
5. **Wait for Log Recovery:**
142146
Wait for `WAIT_FOR_LOG_RECOVERY` nodes to become `READY` within `operationTimeoutMs`. If the timeout is reached and `numRetries` exceeds `maxRetries`, throw `UnrestartableNodesException`. Otherwise, increment `numRetries` and repeat from step 2.
143147

148+
A Kafka broker node can take a long time to become ready while performing log recovery, and it's not easy to determine how long it might take. Therefore, it's important to avoid restarting the node during this process, as doing so would restart the entire log recovery, potentially causing the node to enter a loop of continuous restarts without becoming ready. Moreover, while a broker node is in recovery, no other node should be restarted, as this could impact cluster availability and affect clients.
149+
150+
We do not wait for the broker to rejoin the ISR after it becomes ready because the roller's responsibility is to restart the nodes safely, not to manage inter-broker replication. Additionally, we cannot guarantee that the broker will always be able to catch up within a reasonable time frame.
151+
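The retry bookkeeping described in this step amounts to a small decision once the wait times out. The sketch below mirrors the text; the class, enum, and method names are assumptions, not the actual implementation:

```java
// Illustrative outcome of step 5 once operationTimeoutMs has elapsed while
// a node is still in log recovery; names are assumptions, not Strimzi code.
public class LogRecoveryPolicy {
    enum Next { RETRY_FROM_STEP_2, FAIL_UNRESTARTABLE }

    static Next onWaitTimeout(int numRetries, int maxRetries) {
        // Never restart a recovering node: that would restart the whole log
        // recovery. Either take another pass of the algorithm, or give up.
        if (numRetries >= maxRetries) {
            return Next.FAIL_UNRESTARTABLE; // maps to UnrestartableNodesException
        }
        return Next.RETRY_FROM_STEP_2;      // caller increments numRetries first
    }
}
```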
144152
6. **Restart `RESTART_NOT_RUNNING` Nodes:**
145153
Restart nodes in `NOT_RUNNING` state, considering special conditions:
146154
- If all controller nodes are `NOT_RUNNING`, restart them in parallel to form a quorum.
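The special condition above reduces to a simple predicate: if every controller is down, there is no quorum left to protect, so restarting the controllers together is the fastest way to re-form one. The method name and parameters below are assumptions for illustration, not the actual Strimzi implementation:

```java
// Illustrative check for the step-6 special condition; names are
// assumptions, not the actual Strimzi code.
public class QuorumRecovery {
    // True only when every controller node is NOT_RUNNING, in which case
    // they are restarted in parallel to re-form the quorum.
    static boolean restartControllersInParallel(int notRunningControllers, int totalControllers) {
        return totalControllers > 0 && notRunningControllers == totalControllers;
    }
}
```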
