Conversation

@sandeep-katta
Contributor

@sandeep-katta sandeep-katta commented Aug 21, 2019

What changes were proposed in this pull request?

Spark loads the jars into a custom class loader, which is returned by getSubmitClassLoader.
Spark code

In Hive 1.2.1.spark2

HiveConf.getClassLoader returns the same class loader that Spark set.

In Hive 2.3.5
HiveConf.getClassLoader returns the UDFClassLoader created by Hive. Because of this, Spark cannot find the jars, since the class loader has changed.
Hive code

Why are the changes needed?

Before creating the CliSessionState object, save the current class loader in a reference.
After SessionState.start(), reset the class loader back to the one saved earlier.
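
A minimal sketch of this approach (illustrative only; the names and exact placement in SparkSQLCLIDriver are assumptions, not the exact patch):

```scala
import org.apache.hadoop.hive.cli.CliSessionState
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Save Spark's context class loader (the one that already holds the --jars entries)
// before Hive's SessionState is started.
val originalClassLoader = Thread.currentThread().getContextClassLoader

val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
SessionState.start(sessionState)

// With Hive 2.3.x, start() may have installed Hive's own UDFClassLoader;
// restore the saved loader so Spark can still resolve the user-supplied jars.
Thread.currentThread().setContextClassLoader(originalClassLoader)
```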

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test case and also tested manually.

Before Fix
(screenshot: b4Fix)

After Fix
(screenshot: afterFix)

@sandeep-katta sandeep-katta changed the title [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCliDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Aug 21, 2019
Member

@wangyum wangyum left a comment

Is this a standard usage? We support --jars one.jar:

build/sbt clean package -Phive -Phadoop-2.7 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --jars /root/.ivy2/cache/org.spark-project.hive/hive-contrib/jars/hive-contrib-1.2.1.spark2.jar -e "CREATE TEMPORARY FUNCTION example_max AS 'org.apache.hadoop.hive.contrib.udaf.example.UDAFExampleMax'"

@sandeep-katta
Contributor Author

Is this a standard usage? We support --jars one.jar:

build/sbt clean package -Phive -Phadoop-2.7 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --jars /root/.ivy2/cache/org.spark-project.hive/hive-contrib/jars/hive-contrib-1.2.1.spark2.jar -e "CREATE TEMPORARY FUNCTION example_max AS 'org.apache.hadoop.hive.contrib.udaf.example.UDAFExampleMax'"

Yes, we can use one or multiple jars. Sorry, I didn't get your point; why did you give the above command?

@sandeep-katta
Contributor Author

I got it. Even if you use it like below, there is still a problem:
./bin/spark-sql --master local --jars /opt/BigdataTools/summation.jar

@dongjoon-hyun
Member

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL][test-hadoop3.2] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Aug 21, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28840][SQL][test-hadoop3.2] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Aug 21, 2019
@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Aug 21, 2019

Test build #109510 has finished for PR 25542 at commit 7e89f55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 21, 2019

Test build #109509 has finished for PR 25542 at commit 7e89f55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented Aug 22, 2019

It seems --jars sql/hive/src/test/resources/SPARK-21101-1.0.jar works without this patch:

build/sbt clean package -Phive -Phadoop-2.7 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --jars sql/hive/src/test/resources/SPARK-21101-1.0.jar

@sandeep-katta
Contributor Author

Please use hadoop-3.2, as it comes with Hive 2.3.5.

@wangyum
Member

wangyum commented Aug 22, 2019

Yes. I can reproduce this issue. Please add [test-hadoop3.2] to the PR title.

@sandeep-katta sandeep-katta changed the title [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL][test-hadoop3.2]conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Aug 22, 2019
@wangyum
Member

wangyum commented Aug 22, 2019

retest this please

@dongjoon-hyun
Member

Hi, guys. I already triggered both profiles.
Test build #109509 (finished for PR 25542 at commit 7e89f55) was the hadoop-3.2 run.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109547 has finished for PR 25542 at commit 7e89f55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented Aug 22, 2019

This issue is caused by HIVE-11878.

  // components.
  // See also: code in ExecDriver.java
- var loader = conf.getClassLoader
+ var loader = orginalClassLoader
Member

I think this is a correct fix. Another approach is to add the Spark jars via Utilities.addToClassPath so that the UDFClassLoader works:

val sparkJars = sparkConf.get(org.apache.spark.internal.config.JARS)
if (sparkJars.nonEmpty || StringUtils.isNotBlank(auxJars)) {
  loader = Utilities.addToClassPath(loader, sparkJars.toArray ++ StringUtils.split(auxJars, ","))
}

Member

- var loader = conf.getClassLoader
+ var loader = orginalClassLoader
  val auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS)
  if (StringUtils.isNotBlank(auxJars)) {
Member

@wangyum wangyum Aug 22, 2019

Please add another test case to cover Utilities.addToClassPath(loader, StringUtils.split(auxJars, ",")):

spark-sql --jars one.jar --conf spark.hadoop.hive.aux.jars.path=two.jar

Contributor Author

I see a test case is already present for this: test("Support hive.aux.jars.path"). Do you want me to add one more combination with both --jars and --conf?
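
For the combined case, a rough sketch of what I have in mind (meant to live in CliSuite; the jar paths and UDF class names below are placeholders, and the call assumes runCliWithin's usual (timeout, extraArgs)(query -> expectedAnswer) shape):

```scala
test("SPARK-28840: --jars together with hive.aux.jars.path (sketch)") {
  // Placeholder jars and UDF classes for illustration only; the real test
  // would point at the jars under sql/hive/src/test/resources.
  val jarsJar = "/path/to/one.jar"
  val auxJar = "/path/to/two.jar"
  runCliWithin(
    1.minute,
    Seq("--jars", jarsJar,
      "--conf", s"spark.hadoop.hive.aux.jars.path=$auxJar"))(
    "CREATE TEMPORARY FUNCTION example_one AS 'com.example.UDFOne';" -> "",
    "CREATE TEMPORARY FUNCTION example_two AS 'com.example.UDFTwo';" -> ""
  )
}
```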

Member

Yes. Thank you.

I see a test case is already present for this: test("Support hive.aux.jars.path"). Do you want me to add one more combination with both --jars and --conf?

Is the test case written for --conf?

Contributor Author

yes

  cliConf.set(k, v)
}

+ val orginalClassLoader = Thread.currentThread().getContextClassLoader
Contributor

nit: orginal -> original

@sandeep-katta sandeep-katta force-pushed the jarIssue branch 2 times, most recently from 330bda1 to 82795ad Compare August 22, 2019 15:11
@SparkQA

SparkQA commented Aug 22, 2019

Test build #109579 has finished for PR 25542 at commit bc87633.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109581 has finished for PR 25542 at commit 82795ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Aug 24, 2019

Test build #109669 has finished for PR 25542 at commit 82795ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28840][SQL][test-hadoop3.2]conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL][test-hadoop3.2][test-maven] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Aug 24, 2019
@SparkQA

SparkQA commented Sep 10, 2019

Test build #110422 has finished for PR 25542 at commit 533eeeb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110427 has finished for PR 25542 at commit 8eb6310.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110430 has finished for PR 25542 at commit cf13bfa.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110431 has finished for PR 25542 at commit 47f8632.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 11, 2019

Test build #110455 has finished for PR 25542 at commit 4a79580.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [SPARK-28840][SQL][test-hadoop3.2] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive Sep 11, 2019
@wangyum
Member

wangyum commented Sep 11, 2019

retest this please

@SparkQA

SparkQA commented Sep 11, 2019

Test build #110462 has finished for PR 25542 at commit 4a79580.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum closed this in 7e61425 Sep 12, 2019
@wangyum
Member

wangyum commented Sep 12, 2019

Merged to master.

PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request Sep 15, 2019
… avoided as it returns the UDFClassLoader which is created by Hive

### What changes were proposed in this pull request?

Spark loads the jars into a custom class loader, which is returned by `getSubmitClassLoader`.
 [Spark code](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L337)

**In Hive 1.2.1.spark2**

`HiveConf.getClassLoader` returns the same class loader that Spark set.

**In Hive 2.3.5**
`HiveConf.getClassLoader` returns the UDFClassLoader created by Hive. Because of this, Spark cannot find the jars, since the class loader has changed.
[Hive code](https://github.com/apache/hive/blob/rel/release-2.3.5/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L395)

### Why are the changes needed?
Before creating the `CliSessionState` object, save the current class loader in a reference.
After SessionState.start(), reset the class loader back to the one saved earlier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a test case and also tested manually.

**Before Fix**
![b4Fix](https://user-images.githubusercontent.com/35216143/63442838-6789f400-c451-11e9-9529-ccf4ea9621b9.png)

**After Fix**
![afterFix](https://user-images.githubusercontent.com/35216143/63442860-707ac580-c451-11e9-8012-2b70934d55f3.png)

Closes apache#25542 from sandeep-katta/jarIssue.

Lead-authored-by: sandeep katta <sandeep.katta2007@gmail.com>
Co-authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
srowen pushed a commit that referenced this pull request Sep 27, 2019
…ing SessionState for built-in Hive 2.3

### What changes were proposed in this pull request?

Since HIVE-11878, Hive 2.3 sets a new UDFClassLoader as hiveConf.classLoader when initializing SessionState, and
1. ADDJarCommand adds jars to clientLoader.classLoader.
2. Jars passed via --jars are added to clientLoader.classLoader.
3. Jars passed by the Hive conf `hive.aux.jars` [SPARK-28954](#25653) [SPARK-28840](#25542) are added to clientLoader.classLoader too.

For these reasons, we cannot load the jars added by ADDJarCommand, because the class loader has changed. We reset it to clientLoader.classLoader here.

### Why are the changes needed?
Support for JDK 11.

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?
UT
```
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=$JAVA_HOME/bin:$PATH

build/sbt -Phive-thriftserver -Phadoop-3.2

hive/test-only *HiveSparkSubmitSuite -- -z "SPARK-8368: includes jars passed in through --jars"
hive-thriftserver/test-only *HiveThriftBinaryServerSuite -- -z "test add jar"
```

Closes #25775 from AngersZhuuuu/SPARK-29015-STS-JDK11.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>