[SPARK-42051][SQL] Codegen Support for HiveGenericUDF #39555

yaooqinn · 2023-01-13T11:02:41Z

What changes were proposed in this pull request?

As a subtask of SPARK-42050, this PR adds codegen support for HiveGenericUDF.

Why are the changes needed?

Improve codegen coverage and performance.

Does this PR introduce any user-facing change?

no

How was this patch tested?

new UT added

yaooqinn · 2023-01-13T11:10:06Z

cc @cloud-fan @HyukjinKwon @dongjoon-hyun PTAL, thanks

cloud-fan · 2023-01-13T14:08:15Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala

@@ -128,11 +129,10 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector, dataType: DataTyp
  override def get(): AnyRef = wrapper(func()).asInstanceOf[AnyRef]
 }

-private[hive] case class HiveGenericUDF(
+case class HiveGenericUDF(


is it possible to rewrite HiveGenericUDF with Invoke? Then we can simply use RuntimeReplaceable

Seems quite complicated to handle the hive value mapping and function wrapping

dongjoon-hyun

cc @sunchao , too

dongjoon-hyun

+1, LGTM. The current approach also looks reasonable to me.

Could you review this once more when you have some time, @cloud-fan and @sunchao ?

dongjoon-hyun · 2023-01-22T22:02:04Z

Also cc @LuciferYang , too

yaooqinn · 2023-01-31T07:17:14Z

Belated Happy new year!
Comments addressed, thank you all.

yaooqinn · 2023-02-01T01:56:33Z

Since the last commit is for adding a test, I will merge this. thanks @dongjoon-hyun @LuciferYang.

Merged to master

dongjoon-hyun · 2023-02-01T03:35:37Z

Thank you, @yaooqinn and @LuciferYang !

LuciferYang · 2023-02-01T14:12:56Z

late LGTM

andygrove · 2023-07-31T22:17:59Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala

@@ -120,19 +121,18 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector, dataType: DataTyp
  extends DeferredObject with HiveInspectors {

  private val wrapper = wrapperFor(oi, dataType)
-  private var func: () => Any = _


@yaooqinn @dongjoon-hyun This change removes deferred evaluation and means it is no longer possible to implement short-circuiting in Hive generic UDFs. I filed https://issues.apache.org/jira/browse/SPARK-44616 for this.
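To illustrate the concern, here is a minimal, self-contained sketch of the difference between a deferred argument and an eagerly evaluated one. It does not use the real Hive or Spark classes: `Deferred`, `LazyDeferred`, `EagerDeferred`, and `firstNonNull` are hypothetical names chosen for this example, and the counters only exist to show how often the second argument is evaluated.

```scala
// Hypothetical stand-in for Hive's DeferredObject; not the real Hive or Spark class.
trait Deferred {
  def get(): Any
}

object DeferredSketch {
  // Lazy adapter: holds a function and evaluates the child only when get() is called.
  final class LazyDeferred(func: () => Any) extends Deferred {
    override def get(): Any = func()
  }

  // Eager adapter: the child was already evaluated before the UDF sees it.
  final class EagerDeferred(value: Any) extends Deferred {
    override def get(): Any = value
  }

  // A short-circuiting UDF body: it never calls get() on the second
  // argument when the first one is non-null.
  def firstNonNull(args: Seq[Deferred]): Any = {
    val first = args.head.get()
    if (first != null) first else args(1).get()
  }

  // Returns how many times the "expensive" second argument was evaluated
  // in the lazy case and in the eager case.
  def run(): (Int, Int) = {
    var lazyCalls = 0
    var eagerCalls = 0
    def expensive(count: () => Unit): Any = { count(); "fallback" }

    firstNonNull(Seq(
      new LazyDeferred(() => "a"),
      new LazyDeferred(() => expensive(() => lazyCalls += 1))))

    // Here the second argument is computed while the Seq is being built,
    // before firstNonNull has a chance to skip it.
    firstNonNull(Seq(
      new EagerDeferred("a"),
      new EagerDeferred(expensive(() => eagerCalls += 1))))

    (lazyCalls, eagerCalls)
  }
}
```

With the lazy adapter the second argument is never evaluated. With the eager adapter it is evaluated once, even though the UDF ignores it, which is the loss of short-circuiting described above.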

During our upgrade from Spark 3.3.1 to 3.5.1, we ran into problems caused by this PR. DeferredObject now passes an already-computed value instead of a function, so users can no longer catch exceptions from argument evaluation inside a GenericUDF. This results in semantic differences.

Here is an example we encountered. Originally, str_to_map_udf would throw an exception when the input string was malformed, and merge_map_udf could catch that exception and return a null value. Now, any exception thrown by str_to_map_udf causes the program to fail.

select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key", "value")) from table
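The behaviour change in this query can be sketched in plain Scala, without Spark or Hive. Everything below is an assumption made for illustration: `strToMap`, `mergeMapDeferred`, and `mergeMapEager` are hypothetical stand-ins for the UDFs in the query, and `None` stands in for a SQL NULL result.

```scala
import scala.util.control.NonFatal

object ExceptionSketch {
  // Hypothetical inner UDF: fails on entries that are not of the form "key:value".
  def strToMap(s: String): Map[String, String] =
    s.split(",").map { kv =>
      val parts = kv.split(":")
      if (parts.length != 2) throw new IllegalArgumentException(s"bad entry: $kv")
      parts(0) -> parts(1)
    }.toMap

  // Deferred style: the argument arrives as a function, so the failure is
  // raised inside the try block and the outer UDF can turn it into None.
  def mergeMapDeferred(
      arg: () => Map[String, String],
      extra: Map[String, String]): Option[Map[String, String]] =
    try Some(arg() ++ extra)
    catch { case NonFatal(_) => None }

  // Eager style: the argument is computed by the caller before this method
  // runs, so the try block never sees the inner failure.
  def mergeMapEager(
      arg: Map[String, String],
      extra: Map[String, String]): Option[Map[String, String]] =
    try Some(arg ++ extra)
    catch { case NonFatal(_) => None }
}
```

Calling `mergeMapDeferred(() => strToMap("oops"), Map("key" -> "value"))` yields `None`, while `mergeMapEager(strToMap("oops"), Map("key" -> "value"))` throws before `mergeMapEager` is entered, which mirrors the failure reported here.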

@yaooqinn is it easy to fix? If not we should probably revert it as this is not a critical perf improvement.

Sorry for the late reply, my network has been glitchy recently. Thanks for reporting this issue. It's easy to fix.

@yaooqinn, is this already underway? I tried a fix locally in #47193.

@panbingkun thank you

I tried to fix it in #47268 in another way, @yaooqinn would you please take a look?

[SPARK-42051][SQL] Codegen Support for HiveGenericUDF

3a54566

github-actions bot added the SQL label Jan 13, 2023

yaooqinn added 2 commits January 13, 2023 19:08

[SPARK-42051][SQL] Codegen Support for HiveGenericUDF

362197e

[SPARK-42051][SQL] Codegen Support for HiveGenericUDF

28a3c9c

yaooqinn self-assigned this Jan 13, 2023

cloud-fan reviewed Jan 13, 2023

yaooqinn added 5 commits January 16, 2023 11:05

fix 'Cannot access non-final local variable from inner class'

257d484

clean

cbf44cd

boxing

bb02b81

boxing

203ed91

update test

696e785

dongjoon-hyun reviewed Jan 20, 2023

dongjoon-hyun approved these changes Jan 22, 2023

LuciferYang reviewed Jan 23, 2023

addr comments

a11afe2

yaooqinn closed this in 34fb408 Feb 1, 2023

HyukjinKwon mentioned this pull request Feb 3, 2023

[SPARK-42052][SQL] Codegen Support for HiveSimpleUDF #39865

Closed

yaooqinn deleted the SPARK-42051 branch February 3, 2023 07:35

panbingkun mentioned this pull request Feb 9, 2023

[SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke #39949

Closed

andygrove mentioned this pull request Jul 31, 2023

[BUG][Databricks 12.2] GpuRowBasedHiveGenericUDF ClassCastException NVIDIA/spark-rapids#8318

Closed

andygrove reviewed Jul 31, 2023
