Commit a7a0b5c

Ken Takagiwa authored and committed
add coment for hack why PYSPARK_PYTHON is needed in spark-submit
1 parent 72bfc66 commit a7a0b5c

1 file changed: +10 -0 lines changed

bin/spark-submit

@@ -37,6 +37,16 @@ done
 
 DEPLOY_MODE=${DEPLOY_MODE:-"client"}
 
+
+# This is a hack to make DStream.pyprint work.
+# This will be removed after pyprint is moved to PythonDStream.
+# Problem is that print function is in (Scala)DStream.
+# Whenever python code is executed, we call PythonDStream which passes
+# pythonExec(which python Spark should execute).
+# Since pyprint is located in DStream, Spark does not know which python should use.
+# In that case, get python path from PYSPARK_PYTHON, environmental variable.
+# This fix is ongoing in print branch in my repo.
+
 # Figure out which Python executable to use
 if [[ -z "$PYSPARK_PYTHON" ]]; then
   PYSPARK_PYTHON="python"
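
The hunk above is cut off after the fallback assignment, so the rest of the script is not shown here. As a rough illustration of the behaviour the comment describes, the sketch below mimics only the visible check: use `PYSPARK_PYTHON` when the caller has set it, otherwise fall back to `python`. The function name `resolve_pyspark_python` and the example value `python3` are hypothetical and do not come from `bin/spark-submit`; the test uses the portable `[ -z ... ]` form, which behaves the same as the `[[ -z ... ]]` in the diff for this case.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of bin/spark-submit): prints the interpreter
# that the fallback shown in the hunk would select.
resolve_pyspark_python() {
  # An unset or empty PYSPARK_PYTHON falls back to plain "python".
  if [ -z "${PYSPARK_PYTHON:-}" ]; then
    echo "python"
  else
    echo "$PYSPARK_PYTHON"
  fi
}

# With the variable unset, the fallback applies.
unset PYSPARK_PYTHON
default_choice="$(resolve_pyspark_python)"

# With the variable set by the caller, that value wins.
# "python3" is only an example value (assumption).
explicit_choice="$(PYSPARK_PYTHON="python3" resolve_pyspark_python)"

echo "default:  $default_choice"
echo "explicit: $explicit_choice"
```

Under this reading, a caller who needs a specific interpreter would set `PYSPARK_PYTHON` in the environment before invoking `bin/spark-submit`, so that code paths that never receive `pythonExec` can still find it.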