
Commit 80bf4ce

dongjoon-hyun authored and marmbrus committed
[MINOR][SQL][DOCS] Add notes of the deterministic assumption on UDF functions
## What changes were proposed in this pull request?

Spark assumes that UDF functions are deterministic. This PR adds explicit notes about that.

## How was this patch tested?

It's only about docs.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13087 from dongjoon-hyun/SPARK-15282.

(cherry picked from commit 37c617e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
1 parent c55a39c commit 80bf4ce
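
The note this commit adds can be illustrated without Spark. The following is a minimal sketch in plain Python, not Spark code: `evaluate_once` and `evaluate_twice` are hypothetical stand-ins for two plans an optimizer might choose for the same query (one that eliminates a duplicate invocation, one that invokes the function at each place it appears), and both UDF bodies are invented for illustration. It only shows why a function whose result depends on hidden state gives plan-dependent answers.

```python
import itertools

# Hypothetical stand-ins for UDF bodies; nothing here is Spark API.
_call_counter = itertools.count()


def deterministic_udf(s):
    """Same input always yields the same output."""
    return len(s)


def nondeterministic_udf(s):
    """Output depends on hidden state: how many times it has been called."""
    return len(s) + next(_call_counter)


def evaluate_once(udf, value):
    """Plan A (assumed): compute the expression once and reuse the result,
    i.e. the duplicate invocation is eliminated."""
    result = udf(value)
    return result, result


def evaluate_twice(udf, value):
    """Plan B (assumed): compute the expression at each place it is used,
    for example once in a filter and once in a projection."""
    return udf(value), udf(value)


det_a = evaluate_once(deterministic_udf, "spark")
det_b = evaluate_twice(deterministic_udf, "spark")
nondet_a = evaluate_once(nondeterministic_udf, "spark")
nondet_b = evaluate_twice(nondeterministic_udf, "spark")

print(det_a == det_b)        # True: the choice of plan is invisible
print(nondet_a == nondet_b)  # False: the result depends on the plan
```

With the deterministic function, both plans agree, so the optimizer is free to pick either. With the stateful one, the two uses of the same expression can disagree with each other and across plans, which is the situation the added notes warn about.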

7 files changed: +15 -0 lines changed

python/pyspark/sql/functions.py

Lines changed: 3 additions & 0 deletions
@@ -1756,6 +1756,9 @@ def __call__(self, *cols):
 @since(1.3)
 def udf(f, returnType=StringType()):
     """Creates a :class:`Column` expression representing a user defined function (UDF).
+    Note that the user-defined functions must be deterministic. Due to optimization,
+    duplicate invocations may be eliminated or the function may even be invoked more times than
+    it is present in the query.

     >>> from pyspark.sql.types import IntegerType
     >>> slen = udf(lambda s: len(s), IntegerType())

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ import org.apache.spark.sql.types.DataType

 /**
  * User-defined function.
+ * Note that the user-defined functions must be deterministic.
  * @param function The user defined scala function to run.
  * Note that if you use primitive parameters, you are not able to check if it is
  * null or not, and the UDF will return null for you if the primitive input is

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

Lines changed: 3 additions & 0 deletions
@@ -199,6 +199,9 @@ class SQLContext private[sql](

   /**
    * A collection of methods for registering user-defined functions (UDF).
+   * Note that the user-defined functions must be deterministic. Due to optimization,
+   * duplicate invocations may be eliminated or the function may even be invoked more times than
+   * it is present in the query.
    *
    * The following example registers a Scala closure as UDF:
    * {{{

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

Lines changed: 3 additions & 0 deletions
@@ -145,6 +145,9 @@ class SparkSession private(

   /**
    * A collection of methods for registering user-defined functions (UDF).
+   * Note that the user-defined functions must be deterministic. Due to optimization,
+   * duplicate invocations may be eliminated or the function may even be invoked more times than
+   * it is present in the query.
    *
    * The following example registers a Scala closure as UDF:
    * {{{

sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ import org.apache.spark.sql.types.DataType

 /**
  * Functions for registering user-defined functions. Use [[SQLContext.udf]] to access this.
+ * Note that the user-defined functions must be deterministic.
  *
  * @since 1.3.0
  */

sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala

Lines changed: 3 additions & 0 deletions
@@ -25,6 +25,9 @@ import org.apache.spark.sql.types.DataType

 /**
  * A user-defined function. To create one, use the `udf` functions in [[functions]].
+ * Note that the user-defined functions must be deterministic. Due to optimization,
+ * duplicate invocations may be eliminated or the function may even be invoked more times than
+ * it is present in the query.
  * As an example:
  * {{{
  * // Defined a UDF that returns true or false based on some numeric score.

sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ private[sql] class SessionState(sparkSession: SparkSession) {

   /**
    * Interface exposed to the user for registering user-defined functions.
+   * Note that the user-defined functions must be deterministic.
    */
   lazy val udf: UDFRegistration = new UDFRegistration(functionRegistry)
