[SPARK-43082][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect#40725
xinrong-meng wants to merge 13 commits into apache:master
python/pyspark/sql/udf.py
Ignoring the type annotations of _create_arrow_py_udf because it is shared between vanilla PySpark and Spark Connect Python Client.
The function is just an extraction of the original code (L142-L179) for reuse.
python/pyspark/sql/connect/udf.py
There is duplicated code in _create_py_udf between Spark Connect Python Client and vanilla PySpark, except for fetching the active SparkSession.
However, for a clear code path separation and abstraction, I decided not to refactor it for now.
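To illustrate the point about duplication, here is a minimal sketch of how the shared decision could be factored so that only the session lookup differs per client. It is not the code in this PR: `resolve_use_arrow` and the dict-backed conf lookups are hypothetical names, and the rule that an explicit `useArrow` takes precedence over the session conf is an assumption.

```python
from typing import Callable, Optional

ARROW_CONF = "spark.sql.execution.pythonUDF.arrow.enabled"


def resolve_use_arrow(
    use_arrow: Optional[bool],
    get_conf: Callable[[str], Optional[str]],
) -> bool:
    """Hypothetical helper: decide whether a UDF should be Arrow-optimized.

    Assumption: an explicit useArrow value wins; otherwise the session
    conf decides, and a missing conf means "not enabled".
    """
    if use_arrow is not None:
        return use_arrow
    return get_conf(ARROW_CONF) == "true"


# Each client would pass its own way of reading the active session's conf.
# Plain dicts stand in for the two sessions here.
vanilla_conf = {ARROW_CONF: "true"}
connect_conf = {}

print(resolve_use_arrow(None, vanilla_conf.get))   # True
print(resolve_use_arrow(None, connect_conf.get))   # False
print(resolve_use_arrow(False, vanilla_conf.get))  # False
```

Keeping the two copies separate, as the author chose, avoids coupling the Connect client to the vanilla session internals at the cost of this small amount of repeated logic.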
CI failed because of
95cad25 to f6fc6e1
@HyukjinKwon @zhengruifeng Would you please take a look? Thank you!
cc @ueshin FYI
import pandas as pd
from pyspark.sql.pandas.functions import _create_pandas_udf

return_type = regular_udf.returnType
It seems that regular_udf is only used to pass the returnType and evalType?
And regular_udf.func based on the updated code.
python/pyspark/sql/tests/connect/test_parity_arrow_python_udf.py
Merged to master.
What changes were proposed in this pull request?
Implement Arrow-optimized Python UDFs in Spark Connect.
Please see #39384 for the motivation behind Arrow-optimized Python UDFs and their performance improvements.
Why are the changes needed?
Parity with vanilla PySpark.
Does this PR introduce any user-facing change?
Yes. In Spark Connect Python Client, users can:
- Set the useArrow parameter to True to enable Arrow optimization for a specific Python UDF.
- Enable the spark.sql.execution.pythonUDF.arrow.enabled Spark conf to make all Python UDFs Arrow-optimized.
How was this patch tested?
Parity unit tests.
SPARK-40307
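The two user-facing options from the PR description could be exercised roughly as follows. This is a sketch rather than code from the PR: `str_len` and `demo` are illustrative names, and it assumes a PySpark version that includes this change, an existing session passed in as `spark`, and pandas and PyArrow installed where the UDFs run.

```python
def str_len(s: str) -> int:
    """Plain Python function that will be wrapped as a UDF."""
    return len(s)


def demo(spark):
    """Try both options against an existing SparkSession (classic or Connect)."""
    # Imported lazily so this module loads even where PySpark is absent.
    from pyspark.sql.functions import col, udf

    df = spark.range(3).select(col("id").cast("string").alias("s"))

    # Option 1: opt in for a single UDF via the useArrow parameter.
    arrow_len = udf(str_len, returnType="int", useArrow=True)
    df.select(arrow_len("s")).show()

    # Option 2: enable the conf first, then create UDFs without useArrow.
    spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")
    default_len = udf(str_len, returnType="int")
    df.select(default_len("s")).show()
```

With Spark Connect, `spark` would typically come from something like `SparkSession.builder.remote(...).getOrCreate()`; the connection string depends on the deployment and is left out here.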