
Commit 2e5cab3

SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
JIRA: https://issues.apache.org/jira/browse/SPARK-2282

This issue is caused by a buildup of sockets in TCP's TIME_WAIT state, which lingers for a period of time after a connection closes. Because PySpark opens one accumulator socket per task, these sockets pile up quickly. The fix simply allows the addresses held by sockets in TIME_WAIT to be reused, avoiding the buildup.
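For context, here is a minimal, self-contained sketch of the underlying technique (not Spark's exact code): enabling SO_REUSEADDR on a client socket so that local ports lingering in TIME_WAIT can be reused immediately. The object name, host, and port below are placeholders for illustration only.

    import java.net.{InetSocketAddress, Socket}

    // Minimal sketch (placeholder host/port), not Spark's exact code: set
    // SO_REUSEADDR before connecting so a local port still in TIME_WAIT from a
    // previous, rapidly closed connection can be reused right away.
    object ReuseAddressExample extends App {
      val socket = new Socket()        // unconnected, so the option is set before binding
      socket.setReuseAddress(true)     // opt in to reusing addresses in TIME_WAIT
      socket.connect(new InetSocketAddress("localhost", 7654))
      // ... exchange data with the server ...
      socket.close()
    }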
1 parent 924b708 commit 2e5cab3

File tree

1 file changed (+2, -0 lines)


core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala

Lines changed: 2 additions & 0 deletions
@@ -599,6 +599,8 @@ private class PythonAccumulatorParam(@transient serverHost: String, serverPort:
     } else {
       // This happens on the master, where we pass the updates to Python through a socket
       val socket = new Socket(serverHost, serverPort)
+      // SPARK-2282: Immediately reuse closed sockets because we create one per task.
+      socket.setReuseAddress(true)
       val in = socket.getInputStream
       val out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream, bufferSize))
       out.writeInt(val2.size)
