-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-42051][SQL] Codegen Support for HiveGenericUDF #39555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @cloud-fan @HyukjinKwon @dongjoon-hyun PTAL, thanks |
@@ -128,11 +129,10 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector, dataType: DataTyp | |||
override def get(): AnyRef = wrapper(func()).asInstanceOf[AnyRef] | |||
} | |||
|
|||
private[hive] case class HiveGenericUDF( | |||
case class HiveGenericUDF( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to rewrite HiveGenericUDF
with Invoke
? Then we can simply use RuntimeReplaceable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems quite complicated to handle the hive value mapping and function wrapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @sunchao , too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. The current approach also looks reasonable to me.
Could you review this once more when you have some time, @cloud-fan and @sunchao ?
Also cc @LuciferYang , too |
sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
Outdated
Show resolved
Hide resolved
sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
Outdated
Show resolved
Hide resolved
Belated Happy new year! |
Since the last commit is for adding a test, I will merge this. thanks @dongjoon-hyun @LuciferYang. Merged to master |
Thank you, @yaooqinn and @LuciferYang ! |
late LGTM |
@@ -120,19 +121,18 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector, dataType: DataTyp | |||
extends DeferredObject with HiveInspectors { | |||
|
|||
private val wrapper = wrapperFor(oi, dataType) | |||
private var func: () => Any = _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaooqinn @dongjoon-hyun This change removes deferred evaluation and means it is no longer possible to implement short-circuiting in Hive generic UDFs. I filed https://issues.apache.org/jira/browse/SPARK-44616 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the upgrade from Spark 3.3.1 to 3.5.1, we encountered syntax issues with this pr. The problem arose from DeferredObject currently passing a value instead of a function, which prevented users from catching exceptions in GenericUDF, resulting in semantic differences.
Here is an example case we encountered. Originally, the semantics were that str_to_map_udf
would throw an exception due to issues with the input string, while merge_map_udf
could catch the exception and return a null value. However, currently, any exception encountered by str_to_map_udf
will cause the program to fail.
select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key", "value")) from table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaooqinn is it easy to fix? If not we should probably revert it as this is not a critical perf improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being late,my network glitches a lot recently. and thanks for reporting this issue. It’s easy to make a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@panbingkun thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes were proposed in this pull request?
As a subtask of SPARK-42050, this PR adds Codegen Support for
HiveGenericUDF
Why are the changes needed?
improve codegen coverage and performance
Does this PR introduce any user-facing change?
no
How was this patch tested?
new UT added