You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: 06x-new-kafka-roller.md
+9-1Lines changed: 9 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -102,7 +102,7 @@ The following are the configuration options for the new KafkaRoller. If exposed
102
102
| maxRetries | 10 | No | The maximum number of times a node can be retried after not meeting the safety conditions e.g. availability check failed. This is checked against the node's `numRetries`. |
103
103
| operationTimeoutMs | 60 seconds | Yes | The maximum amount of time we will wait for nodes to transition to `READY` state after an operation in each retry. This is already exposed to the user via environment variable `STRIMZI_OPERATION_TIMEOUT_MS`. |
104
104
| maxRestartParallelism | 1 | Yes | The maximum number of broker nodes that can be restarted in parallel. This will be exposed to the user via the new environment variable `STRIMZI_MAX_RESTART_BATCH_SIZE`. However, if there are multiple brokers in `NOT_RUNNING` state, they may get restarted in parallel despite this configuration for a faster recovery.
105
-
| postRestartDelay | 0 | Yes | Delay between restarts of nodes or batches. It's set to 0 by default, but can be adjusted by users to slow down the restarts. This will also help JIT to reach a steady state and to reduce impact on clients.
105
+
| postRestartDelay | 0 | Yes | Delay between restarts of nodes or batches. It's set to 0 by default, but can be adjusted by users to slow down the restarts. This will also help Just-In-Time (JIT) compiler to reach a steady state and to reduce impact on clients.
106
106
| restartAndPreferredLeaderElectionDelay | 10 seconds | No | Delay between restart and triggering partition leader election so that just-rolled broker is leading all the partitions it is the preferred leader for. This is to avoid situations where leaders moving to a newly started node that does not yet have established networking to some outside networks, e.g. through load balancers.
107
107
108
108
### Algorithm
@@ -129,6 +129,8 @@ The following are the configuration options for the new KafkaRoller. If exposed
129
129
3.**Handle `NOT_READY` Nodes:**
130
130
Wait for `NOT_READY` nodes to become `READY` within `operationTimeoutMs`.
131
131
132
+
This is to give an opportunity for a node to become ready in case it had just been restarted. If the node is still not ready after the timeout, it will fall through to the next step to determine the action to take on it.
133
+
132
134
4.**Categorize Nodes:**
133
135
Group nodes based on their state and connectivity:
134
136
-`RESTART_NOT_RUNNING`: Nodes in `NOT_READY` state.
@@ -138,9 +140,15 @@ The following are the configuration options for the new KafkaRoller. If exposed
138
140
-`RESTART`: Nodes with reasons for restart and no previous restarts.
139
141
-`NOP`: Nodes needing no operation.
140
142
143
+
Grouping the nodes into these categories makes it clearer to take actions on the them in the specific order. Also the category and node state is not always 1:1, for example, nodes might be unresponsive depsite having READY or NOT_READY state but need to be grouped together for sequential restarts. Grouping also makes it to easier to batch broker nodes for parallel restart.
144
+
141
145
5.**Wait for Log Recovery:**
142
146
Wait for `WAIT_FOR_LOG_RECOVERY` nodes to become `READY` within `operationTimeoutMs`. If timeout is reached and `numRetries` exceeds `maxRetries`, throw `UnrestartableNodesException`. Otherwise, increment `numRetries` and repeat from step 2.
143
147
148
+
A Kafka broker node can take a long time to become ready while performing log recovery and it's not easy to determine how long it might take. Therefore, it's important to avoid restarting the node during this process, as doing so would restart the entire log recovery, potentially causing the node to enter a loop of continuous restarts without becoming ready. Moreover, while a broker node is in recovery, no other node should be restarted, as this could impact cluster availability and affect the client.
149
+
150
+
We do not wait for the broker to rejoin the ISR after it becomes ready because the roller's responsibility is to restart the nodes safely, not to manage inter-broker replication. Additionally, we cannot guarantee that the broker will always be able to catch up within a reasonable time frame.
151
+
144
152
6.**Restart `RESTART_NOT_RUNNING` Nodes:**
145
153
Restart nodes in `NOT_RUNNING` state, considering special conditions:
146
154
- If all controller nodes are `NOT_RUNNING`, restart them in parallel to form a quorum.
0 commit comments