[SPARK-6747][SQL] Support List<> as a return type in Hive UDF #5395
maropu wants to merge 5 commits into apache:master
Can one of the admins verify this patch?
Add a blank line at the end of the file.
@maropu My concern is whether Hive supports UDFs whose return type is
ok to test
Test build #29807 has started for PR 5395 at commit
Test build #29807 has finished for PR 5395 at commit
Test PASSed.
Test build #29825 has started for PR 5395 at commit
OK, I will look into Hive's implementation and documentation for that.
Test build #29825 has finished for PR 5395 at commit
Test PASSed.
ISTM Hive supports List<> as a return type (see the link below).
https://github.com/kyluka/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java#L163
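For context on why Hive can handle this: a match on a method's erased return class only ever sees java.util.List, whereas the method's generic return type still records the element type. As far as we can tell, the linked GenericUDFBridge code resolves the UDF's return type reflectively from the method signature. The Java sketch below shows the difference using plain java.lang.reflect; ReturnTypeInspection and ListReturningUdf are hypothetical names, and ListReturningUdf deliberately does not extend Hive's UDF so the sketch compiles without Hive on the classpath.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

class ReturnTypeInspection {
    // Hypothetical UDF-like class; it does not extend Hive's UDF, so no Hive jar is needed.
    static class ListReturningUdf {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    private static Method evaluateOf(Class<?> udf) {
        try {
            return udf.getMethod("evaluate", Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No evaluate(Object) on " + udf.getName(), e);
        }
    }

    // The erased return class: the only thing a Class-based pattern match can inspect.
    static Class<?> rawReturnClass(Class<?> udf) {
        return evaluateOf(udf).getReturnType();
    }

    // The element type recorded in the generic signature, or null if there is none.
    static String elementTypeName(Class<?> udf) {
        Type generic = evaluateOf(udf).getGenericReturnType();
        if (generic instanceof ParameterizedType) {
            Type[] args = ((ParameterizedType) generic).getActualTypeArguments();
            return args[0].getTypeName();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(rawReturnClass(ListReturningUdf.class).getName());
        System.out.println(elementTypeName(ListReturningUdf.class));
    }
}
```

Running main prints java.util.List followed by java.lang.String: the element type is only recoverable from the generic signature, not from the Class object.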
This should just be an AnalysisException.
Also, prefer string interpolation to +: s"Unknown UDF input type $c"
s"Unsupported java type $c" seems better for this error message, because this method is not designed only for UDFs.
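Taken together, these review points amount to failing with an explicit, general-purpose message when a Java class has no SQL counterpart, instead of letting an incomplete pattern match escape as scala.MatchError. A minimal Java analogue is sketched below under stated assumptions: JavaTypeMapping and its small type table are hypothetical and far narrower than Spark's real mapping, IllegalArgumentException stands in for Spark's AnalysisException, and String.format plays the role of Scala's s"..." interpolation.

```java
import java.util.HashMap;
import java.util.Map;

final class JavaTypeMapping {
    // Hypothetical, simplified table from Java classes to SQL type names;
    // this is not Spark's actual javaClassToDataType mapping.
    private static final Map<Class<?>, String> TYPES = new HashMap<>();
    static {
        TYPES.put(String.class, "string");
        TYPES.put(Integer.class, "int");
        TYPES.put(Integer.TYPE, "int");
        TYPES.put(Long.class, "bigint");
        TYPES.put(Long.TYPE, "bigint");
        TYPES.put(Double.class, "double");
        TYPES.put(Double.TYPE, "double");
        TYPES.put(Boolean.class, "boolean");
        TYPES.put(Boolean.TYPE, "boolean");
    }

    static String toSqlType(Class<?> c) {
        String t = TYPES.get(c);
        if (t == null) {
            // Report the unsupported class explicitly instead of an unhandled-match error.
            // The message names the Java type, not "UDF input", so non-UDF callers can reuse it.
            throw new IllegalArgumentException(String.format("Unsupported java type %s", c));
        }
        return t;
    }
}
```

With this sketch, toSqlType(java.util.List.class) fails with "Unsupported java type interface java.util.List", which mirrors the class named in the MatchError but reads as a deliberate error rather than a crash.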
Thanks for researching this. Can you address the final comments about avoiding the creation of a new type?
Branch updated from 02b3a91 to 3a8d952.
Test build #30253 has started for PR 5395 at commit
Sorry for the delay. Fixed; please re-check.
Test build #30253 has finished for PR 5395 at commit
Test FAILed.
Test build #30265 has started for PR 5395 at commit
Test build #30265 has finished for PR 5395 at commit
Test PASSed.
This is still creating a new type. Can we use
Branch updated from 8e333c7 to ee56a0a.
Missed that; fixed. Does this fix address your point?
Test build #30445 has started for PR 5395 at commit
Yes, LGTM
Test build #30445 has finished for PR 5395 at commit
Test FAILed.
cc @marmbrus Could you merge this into master? I'll open a PR for SPARK-6912, but it depends on this one.
Can one of the admins verify this patch?
cc @marmbrus just a reminder
The last patch failed tests, no?
ok to test
Merged build triggered.
Merged build started.
Test build #32142 has started for PR 5395 at commit
Test build #32142 has finished for PR 5395 at commit
Merged build finished. Test FAILed.
Test FAILed.
Oh, sorry. I'll fix it.
@marmbrus I closed this PR by mistake and can't reopen it. May I open a new PR?
This patch adds support for List<> as a return type in Hive UDFs.
Consider the following UDF:
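import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDF;
public class UDFToListString extends UDF {
    public List<String> evaluate(Object o) {
        return Arrays.asList("xxx", "yyy", "zzz"); // fixed result; the input is ignored
    }
}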
With the current implementation, using this UDF throws a scala.MatchError:
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...
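The MatchError arises because the type mapping matches on the erased return class, which has no case for java.util.List. One way a mapping could accept List<> return types is to work from the method's generic return type instead, translating List<E> into an array of E recursively. The Java sketch below is a hypothetical illustration of that idea, not the code in this patch; UdfReturnTypes, TypedListUdf, RawListUdf, and the Hive-style type names are all assumptions. It also exposes a limit: a raw List return type carries no element type after erasure, so it can only be rejected with an explicit error.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

class UdfReturnTypes {
    // Hypothetical mapping from a (possibly generic) Java type to a Hive-style type name.
    static String toSqlType(Type t) {
        if (t == String.class) return "string";
        if (t == Integer.class || t == int.class) return "int";
        if (t == Long.class || t == long.class) return "bigint";
        if (t == Double.class || t == double.class) return "double";
        if (t instanceof ParameterizedType) {
            ParameterizedType p = (ParameterizedType) t;
            if (p.getRawType() == List.class) {
                // List<E> maps to array<E>, recursing on the element type.
                return "array<" + toSqlType(p.getActualTypeArguments()[0]) + ">";
            }
        }
        // A raw List (erased element type) or any other unknown type ends up here.
        throw new IllegalArgumentException("Unsupported java type " + t);
    }

    // Maps the generic return type of the first public method named "evaluate".
    static String returnTypeOf(Class<?> udf) {
        for (Method m : udf.getMethods()) {
            if (m.getName().equals("evaluate")) {
                return toSqlType(m.getGenericReturnType());
            }
        }
        throw new IllegalArgumentException("No evaluate method on " + udf.getName());
    }

    // Hypothetical UDF-like classes (no Hive dependency).
    static class TypedListUdf {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    @SuppressWarnings("rawtypes")
    static class RawListUdf {
        public List evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    public static void main(String[] args) {
        System.out.println(returnTypeOf(TypedListUdf.class)); // array<string>
        try {
            returnTypeOf(RawListUdf.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unsupported java type interface java.util.List
        }
    }
}
```

Because toSqlType recurses, a nested List<List<Integer>> would map to array<array<int>> under these assumptions, while a wildcard such as List<?> still falls through to the explicit error.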