You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SPARK-1104: kill Process in workerThread of ExecutorRunner
As reported in https://spark-project.atlassian.net/browse/SPARK-1104
By @pwendell: "Sometimes due to large shuffles executors will take a long time shutting down. In particular this can happen if large numbers of shuffle files are around (this will be alleviated by SPARK-1103, but nonetheless...).
The symptom is you have DEAD workers sitting around in the UI and the existing workers keep trying to re-register but can't because they've been assumed dead."
In this patch, I add lines in the handler of InterruptedException in workerThread of executorRunner, so that the process.destroy() and process.waitFor() can only block the workerThread instead of blocking the worker Actor...
---------
analysis: process.destroy() is a blocking method, i.e. it only returns when all shutdownHook threads return...so calling it in Worker thread will make Worker block for a long while....
about what will happen on the shutdown hooks when the JVM process is killed: http://www.tutorialspoint.com/java/lang/runtime_addshutdownhook.htm
Author: CodingCat <zhunansjtu@gmail.com>
Closes#35 from CodingCat/SPARK-1104 and squashes the following commits:
85767da [CodingCat] add null checking and remove unnecessary killProce
3107aeb [CodingCat] address Aaron's comments
eb615ba [CodingCat] kill the process when the error happens
0accf2f [CodingCat] set process to null after killed it
1d511c8 [CodingCat] kill Process in workerThread
(cherry picked from commit f99af85)
Signed-off-by: Aaron Davidson <aaron@databricks.com>
0 commit comments