[SPARK-6747] [SQL] Throw an AnalysisException when unsupported Java list types used in Hive UDF #7248
maropu wants to merge 19 commits into apache:master
Conversation
@marmbrus Through the discussion in #5395, I think it is hard to support Java List<> types in Spark SQL because of type erasure. ISTM that if UDF developers need this type, they should use the GenericUDF interface instead of the UDF one. So, I re-created a PR to throw a meaningful exception when such types are used. Any thoughts?
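For context, a minimal sketch in Scala of the GenericUDF alternative mentioned above; the class name, the fixed return value, and the ObjectInspector choices are illustrative, not part of this PR:

// Sketch of a GenericUDF returning list<string>. Unlike a simple UDF,
// the return type is declared via an ObjectInspector at initialization
// time rather than recovered by reflection, so JVM type erasure is not
// a problem.
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class GenericUDFToListString extends GenericUDF {
  override def initialize(args: Array[ObjectInspector]): ObjectInspector =
    ObjectInspectorFactory.getStandardListObjectInspector(
      PrimitiveObjectInspectorFactory.javaStringObjectInspector)

  override def evaluate(args: Array[GenericUDF.DeferredObject]): AnyRef =
    java.util.Arrays.asList("xxx", "yyy", "zzz")

  override def getDisplayString(children: Array[String]): String =
    s"to_list_string(${children.mkString(", ")})"
}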
assign the result of this function to a variable and check that the message is correct.

Fixed. Does this address your comment?
ok to test

This looks great! One minor comment on the tests.

@marmbrus OK, thanks.
Test build #36628 has finished for PR 7248 at commit
Thanks! Merging to master.
Test build #36629 has finished for PR 7248 at commit
…ap<K,V> types used in Hive UDF

To help UDF developers understand the problem, throw an exception when unsupported Map<K,V> types are used in a Hive UDF. This fix is the same as #7248.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #7257 from maropu/ThrowExceptionWhenMapUsed and squashes the following commits:

916099a [Takeshi YAMAMURO] Fix style errors
7886dcc [Takeshi YAMAMURO] Throw an exception when Map<> used in Hive UDF
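For illustration, a hypothetical simple UDF that hits the same problem for maps (the class name and return value are assumptions):

// After JVM type erasure, reflection on evaluate() only sees the raw
// java.util.Map class, so the key/value types can't be recovered.
import org.apache.hadoop.hive.ql.exec.UDF

class UDFToMapString extends UDF {
  def evaluate(o: AnyRef): java.util.Map[String, String] =
    java.util.Collections.singletonMap("key", "value")
}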
The current implementation can't handle List<> as a return type in a Hive UDF and
throws a meaningless scala.MatchError.
Assume the UDF below:
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDF;

public class UDFToListString extends UDF {
  public List<String> evaluate(Object o) {
    return Arrays.asList("xxx", "yyy", "zzz");
  }
}
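For illustration, a hypothetical way to register and call this UDF through Spark's Hive support (Spark 1.x API; the function name and the src test table are assumptions):

// Hypothetical repro; planning the query fails with scala.MatchError
// before this fix.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ReproduceMatchError {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SPARK-6747-repro").setMaster("local[*]"))
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("CREATE TEMPORARY FUNCTION to_list_string AS 'UDFToListString'")
    hiveContext.sql("SELECT to_list_string(key) FROM src").collect()
  }
}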
When this UDF is used, the following scala.MatchError is thrown:
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...
To help UDF developers understand what went wrong, we need to throw a more suitable exception: an AnalysisException. A sketch of the added check follows.
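For reference, a minimal self-contained sketch of the kind of check this PR adds. In the actual patch the case lives in HiveInspectors.javaClassToDataType and throws AnalysisException; the object and method names here are illustrative, and a stand-in exception is used so the sketch compiles outside Spark's sql package.

// Sketch only: the real patch pattern-matches on the UDF's reflected
// return type inside HiveInspectors.javaClassToDataType.
object HiveUdfTypeCheck {
  def rejectErasedListType(clz: Class[_]): Unit = {
    if (classOf[java.util.List[_]].isAssignableFrom(clz)) {
      // AnalysisException in the real patch; a stand-in is thrown here
      // so the sketch does not depend on Spark-internal constructors.
      throw new UnsupportedOperationException(
        "List type in java is unsupported because JVM type erasure hides " +
        "the element type of List<>; implement GenericUDF instead of UDF")
    }
  }
}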