[SPARK-3741] Make ConnectionManager propagate errors properly and add mo...#2593
[SPARK-3741] Make ConnectionManager propagate errors properly and add mo...#2593zsxwing wants to merge 3 commits intoapache:masterfrom zsxwing:SPARK-3741
Conversation
… more logs to avoid Executors swallowing errors
|
QA tests have started for PR 2593 at commit
|
|
QA tests have finished for PR 2593 at commit
|
|
Test FAILed. |
|
retest this please. This test is OK in my machine. |
|
QA tests have started for PR 2593 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Could you change this to be more descriptive? How about something like "Ignored error in onExceptionCallback"?
|
Hi @zsxwing, Thanks for submitting this PR (and sorry for the delayed review)! These changes will be very helpful in debugging certain types of connection manager issues that we've encountered. I like the careful attention to error-handling cases that we missed before, especially the use of I left a few comments, mostly regarding naming. If you fix up the merge conflicts, I think this looks ready to merge. Thanks! |
Conflicts: core/src/main/scala/org/apache/spark/network/nio/Connection.scala core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala
There was a problem hiding this comment.
I added NonFatal(t) to avoid to output fatal exceptions. It's expected that they are not be handled.
|
Merged and updated the naming. |
|
QA tests have started for PR 2593 at commit
|
|
QA tests have finished for PR 2593 at commit
|
|
Test PASSed. |
|
This looks good to me, so I'm going to merge it. Thanks a bunch; this will really help with debugging! |
Sorry. I found that I forgot to add `afterExecute` for `handleConnectExecutor` in #2593. Author: zsxwing <zsxwing@gmail.com> Closes #2794 from zsxwing/SPARK-3741 and squashes the following commits: a0bc4dd [zsxwing] Add afterExecute for handleConnectExecutor
...re logs to avoid Executors swallowing errors
This PR made the following changes:
Connectionso that the error will be propagated properly.Promisedoesn't allow to call success/failure more than once.