[SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive #25542
Conversation
Is it a standard usage? We support --jars one.jar:
build/sbt clean package -Phive -Phadoop-2.7 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --jars /root/.ivy2/cache/org.spark-project.hive/hive-contrib/jars/hive-contrib-1.2.1.spark2.jar -e "CREATE TEMPORARY FUNCTION example_max AS 'org.apache.hadoop.hive.contrib.udaf.example.UDAFExampleMax'"
Yes, we can use one or multiple jars. Sorry, I didn't get you; why did you give the above command?
Our example always uses http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
I got it. Even if you use it as below, there is still a problem.
ok to test
Retest this please.
Test build #109510 has finished for PR 25542 at commit
Test build #109509 has finished for PR 25542 at commit
It seems the issue reproduces with:
build/sbt clean package -Phive -Phadoop-2.7 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --jars sql/hive/src/test/resources/SPARK-21101-1.0.jar
Please use hadoop-3.2, as it comes with Hive 2.3.5.
Yes. I can reproduce this issue. Please add
retest this please
Hi, guys. I already triggered both profiles.
Test build #109547 has finished for PR 25542 at commit
This issue is caused by HIVE-11878.
  // components.
  // See also: code in ExecDriver.java
- var loader = conf.getClassLoader
+ var loader = orginalClassLoader
I think this is a correct fix. Another approach is to add the Spark jars to Utilities.addToClassPath to make UDFClassLoader work:
val sparkJars = sparkConf.get(org.apache.spark.internal.config.JARS)
if (sparkJars.nonEmpty || StringUtils.isNotBlank(auxJars)) {
loader = Utilities.addToClassPath(loader, sparkJars.toArray ++ StringUtils.split(auxJars, ","))
}
- var loader = conf.getClassLoader
+ var loader = orginalClassLoader
  val auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS)
  if (StringUtils.isNotBlank(auxJars)) {
Please add another test case to cover Utilities.addToClassPath(loader, StringUtils.split(auxJars, ",")):
spark-sql --jars one.jar --conf spark.hadoop.hive.aux.jars.path=two.jar
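For reference, a minimal sketch of what such a combined test might look like, modeled on the existing test("Support hive.aux.jars.path") in the CLI test suite. The runCliWithin helper usage, timeout, and jar paths here are assumptions for illustration, not the exact test added in this PR:

```scala
// Hypothetical sketch; assumes CliSuite's runCliWithin helper and
// scala.concurrent.duration._ are in scope. Jar paths are placeholders.
test("Support --jars together with hive.aux.jars.path") {
  val jarFile = "path/to/SPARK-21101-1.0.jar"      // passed via --jars
  val hiveContribJar = "path/to/hive-contrib.jar"  // passed via aux jars path
  runCliWithin(
    1.minute,
    Seq("--jars", jarFile,
        "--conf", s"spark.hadoop.hive.aux.jars.path=$hiveContribJar"))(
    // The UDAF lives in the aux jar; creating and calling it shows the
    // class loader can see classes from both mechanisms.
    "CREATE TEMPORARY FUNCTION example_max AS " +
      "'org.apache.hadoop.hive.contrib.udaf.example.UDAFExampleMax';" -> "",
    "SELECT example_max(1);" -> "1"
  )
}
```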
I see a test case is already present for this: test("Support hive.aux.jars.path"). Do you want me to add one more combination with both --jars and --conf?
Yes. Thank you.
> I see a test case is already present for this: test("Support hive.aux.jars.path"). Do you want me to add one more combination with both --jars and --conf?

Is the test case written for --conf?
yes
    cliConf.set(k, v)
  }

+ val orginalClassLoader = Thread.currentThread().getContextClassLoader
nit: orginal -> original
Force-pushed from 330bda1 to 82795ad
Test build #109579 has finished for PR 25542 at commit
Test build #109581 has finished for PR 25542 at commit
Retest this please.
Test build #109669 has finished for PR 25542 at commit
Test build #110422 has finished for PR 25542 at commit
Force-pushed from 533eeeb to 8eb6310
Test build #110427 has finished for PR 25542 at commit
Force-pushed from 8eb6310 to cf13bfa
Test build #110430 has finished for PR 25542 at commit
Force-pushed from cf13bfa to 47f8632
Test build #110431 has finished for PR 25542 at commit
Test build #110455 has finished for PR 25542 at commit
retest this please
Test build #110462 has finished for PR 25542 at commit
Merged to master.
Closes #25542 from sandeep-katta/jarIssue. Lead-authored-by: sandeep katta <sandeep.katta2007@gmail.com> Co-authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>
…ing SessionState for built-in Hive 2.3

### What changes were proposed in this pull request?

Hive 2.3 will set a new UDFClassLoader to hiveConf.classLoader when initializing SessionState since HIVE-11878, and:

1. ADDJarCommand will add jars to clientLoader.classLoader.
2. Jars passed via --jars will be added to clientLoader.classLoader.
3. Jars passed by the Hive conf `hive.aux.jars` ([SPARK-28954](#25653), [SPARK-28840](#25542)) will be added to clientLoader.classLoader too.

For these reasons we cannot load the jars added by ADDJarCommand, because the class loader got changed. We reset it to clientLoader.classLoader here.

### Why are the changes needed?

Support for JDK 11.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

UT:

```
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=$JAVA_HOME/bin:$PATH
build/sbt -Phive-thriftserver -Phadoop-3.2 hive/test-only *HiveSparkSubmitSuite -- -z "SPARK-8368: includes jars passed in through --jars"
hive-thriftserver/test-only *HiveThriftBinaryServerSuite -- -z "test add jar"
```

Closes #25775 from AngersZhuuuu/SPARK-29015-STS-JDK11. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
Spark loads the jars into a custom class loader returned by getSubmitClassLoader (Spark code: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L337).

In the 1.2.1.spark2 version of Hive, HiveConf.getClassLoader returns the same class loader that was set by Spark.

In Hive 2.3.5, HiveConf.getClassLoader returns the UDFClassLoader created by Hive. Because of this, Spark cannot find the jars, as the class loader has changed (Hive code: https://github.com/apache/hive/blob/rel/release-2.3.5/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L395).
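To make the behavior difference concrete, here is a small illustrative sketch (not code from this PR); sessionState stands for an already-constructed CliSessionState:

```scala
import org.apache.hadoop.hive.ql.session.SessionState

// Sketch only: shows the loader swap described above.
val before = Thread.currentThread().getContextClassLoader
SessionState.start(sessionState)  // sessionState constructed elsewhere
// On Hive 1.2.1.spark2 this still equals `before`; on Hive 2.3.x it is
// Hive's UDFClassLoader, so classes added via --jars are no longer visible.
val after = sessionState.getConf.getClassLoader
```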
Why are the changes needed?
Before creating the CliSessionState object, save the current class loader in a reference. After SessionState.start(), reset the class loader back to the one saved earlier.
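A minimal sketch of that save-and-restore pattern, assuming the Hive 2.3 APIs seen in the diff above (an illustration, not the verbatim patch; the misspelled orginalClassLoader matches the reviewed diff):

```scala
import org.apache.commons.lang3.StringUtils
import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.session.SessionState

// Save the loader Spark installed before Hive can swap it out.
val orginalClassLoader = Thread.currentThread().getContextClassLoader
val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
SessionState.start(sessionState)

val conf = sessionState.getConf
// With Hive 2.3.x, conf.getClassLoader would now be Hive's UDFClassLoader,
// so start from the saved loader instead of conf.getClassLoader:
var loader: ClassLoader = orginalClassLoader
val auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS)
if (StringUtils.isNotBlank(auxJars)) {
  loader = Utilities.addToClassPath(loader, StringUtils.split(auxJars, ","))
}
conf.setClassLoader(loader)
Thread.currentThread().setContextClassLoader(loader)
```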
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a test case and also tested manually.
Before Fix

After Fix
