Commit b5b34af
committed
[SPARK-3736] Workers reconnect when disassociated from the master.
Before, if the master node is killed and restarted, the worker nodes
would not attempt to reconnect to the Master. Therefore, when the Master
node was restarted, the worker nodes needed to be restarted as well.
Now, when the Master node is disconnected, the worker nodes will
continuously ping the master node in attempts to reconnect to it. Once
the master node restarts, it will detect one of the registration
requests from its former workers. The result is that the cluster
re-enters a healthy state.
In addition, when the master does not receive a heartbeat from the
worker, the worker was removed; however, when the worker sent a
heartbeat to the master, the master used to ignore the heartbeat. Now,
a master that receives a heartbeat from a worker that had been
disconnected will request the worker to re-attempt the registration
process, at which point the worker will send a RegisterWorker request
and be re-connected accordingly.
Re-connection attempts per worker are submitted every N seconds, where N
is configured by the property spark.worker.reconnect.interval - this has
a default of 60 seconds right now.1 parent 293a0b5 commit b5b34af
File tree
3 files changed
+27
-1
lines changed- core/src/main/scala/org/apache/spark/deploy
- master
- worker
3 files changed
+27
-1
lines changedLines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
| 74 | + | |
| 75 | + | |
74 | 76 | | |
75 | 77 | | |
76 | 78 | | |
| |||
Lines changed: 5 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
341 | 341 | | |
342 | 342 | | |
343 | 343 | | |
344 | | - | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
345 | 349 | | |
346 | 350 | | |
347 | 351 | | |
| |||
Lines changed: 20 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
67 | 67 | | |
68 | 68 | | |
69 | 69 | | |
| 70 | + | |
| 71 | + | |
70 | 72 | | |
71 | 73 | | |
72 | 74 | | |
| |||
94 | 96 | | |
95 | 97 | | |
96 | 98 | | |
| 99 | + | |
97 | 100 | | |
98 | 101 | | |
99 | 102 | | |
| |||
197 | 200 | | |
198 | 201 | | |
199 | 202 | | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
243 | 248 | | |
244 | 249 | | |
245 | 250 | | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
246 | 255 | | |
247 | 256 | | |
248 | 257 | | |
| |||
365 | 374 | | |
366 | 375 | | |
367 | 376 | | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
368 | 387 | | |
369 | 388 | | |
370 | 389 | | |
| |||
374 | 393 | | |
375 | 394 | | |
376 | 395 | | |
| 396 | + | |
377 | 397 | | |
378 | 398 | | |
379 | 399 | | |
| |||
0 commit comments