Commit 0216051

shardulm94 authored and maropu committed
[SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior
### What changes were proposed in this pull request?

SPARK-33084 added the ability to use ivy coordinates with `SparkContext.addJar`. PR #29966 claims to mimic Hive behavior, although I found a few cases where it doesn't:

1) The default value of the transitive parameter is false, both when the parameter is not specified in the coordinate and when the parameter value is invalid. The Hive behavior is that transitive is [true if not specified](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L169) in the coordinate and [false for invalid values](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L124). Also, regardless of Hive, I think a default of true for the transitive parameter also matches [ivy's own defaults](https://ant.apache.org/ivy/history/2.5.0/ivyfile/dependency.html#_attributes).

2) The value of the transitive parameter is regarded as case-sensitive [based on the understanding](#29966 (comment)) that Hive behavior is case-sensitive. However, this is not correct: Hive [treats the parameter value case-insensitively](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L122).

I propose that we be compatible with Hive for these behaviors.

### Why are the changes needed?

To make `ADD JAR` with ivy coordinates compatible with Hive's transitive behavior.

### Does this PR introduce _any_ user-facing change?

The user-facing changes here are within master, as the feature introduced in SPARK-33084 has not been released yet.

1. Previously an ivy coordinate without the `transitive` parameter specified did not resolve transitive dependencies; now it does.
2. Previously the `transitive` parameter value was treated case-sensitively, e.g. `transitive=TRUE` would be treated as false as it did not exactly match `true`. Now it is treated case-insensitively.

### How was this patch tested?

Modified existing unit tests to test the new behavior.
Added a new unit test to cover usage of `exclude` with unspecified `transitive`.

Closes #31623 from shardulm94/spark-34506.

Authored-by: Shardul Mahadik <smahadik@linkedin.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
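The rules for the `transitive` parameter described in this commit message can be sketched in isolation as follows. This is a minimal illustration, not Spark's `DependencyUtils.parseQueryParams`: the object name `TransitiveParamSketch` and the helper `parseTransitive` are hypothetical, and the sketch silently ignores malformed query tokens instead of validating them.

```scala
import java.net.URI

// Hypothetical stand-alone sketch of the `transitive` rules from this commit.
// It only loosely mirrors the logic in the diff and skips query validation.
object TransitiveParamSketch {

  /** Returns whether transitive dependencies should be resolved for an Ivy URI. */
  def parseTransitive(uri: URI): Boolean = {
    val query = uri.getQuery
    if (query == null) {
      // No query string at all: default is now true (matches Hive and Ivy).
      true
    } else {
      // Collect the values of every `transitive=...` token, in order.
      // The parameter NAME is matched case-sensitively.
      val values = query.split("&").map(_.split("=")).collect {
        case Array("transitive", value) => value
      }
      // The last occurrence wins; the VALUE is compared case-insensitively,
      // so any value other than "true" (in any case) yields false.
      // If the parameter is absent, fall back to the default of true.
      values.lastOption.map(_.equalsIgnoreCase("true")).getOrElse(true)
    }
  }
}
```

With this sketch, a URI such as `ivy://org.apache.hive:hive-storage-api:2.7.0?TRANSITIVE=false` still resolves transitively, because the upper-case key is not recognised as the `transitive` parameter and the default applies.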
1 parent d07fc30 commit 0216051

5 files changed, +42 -22 lines changed

core/src/main/scala/org/apache/spark/util/DependencyUtils.scala

Lines changed: 8 additions & 6 deletions
@@ -59,8 +59,9 @@ private[spark] object DependencyUtils extends Logging {
    * @param uri Ivy URI need to be downloaded.
    * @return Tuple value of parameter `transitive` and `exclude` value.
    *
-   * 1. transitive: whether to download dependency jar of Ivy URI, default value is false
-   * and this parameter value is case-sensitive. Invalid value will be treat as false.
+   * 1. transitive: whether to download dependency jar of Ivy URI, default value is true
+   * and this parameter value is case-insensitive. This mimics Hive's behaviour for
+   * parsing the transitive parameter. Invalid value will be treat as false.
    * Example: Input: exclude=org.mortbay.jetty:jetty&transitive=true
    * Output: true
    *
@@ -72,7 +73,7 @@ private[spark] object DependencyUtils extends Logging {
   private def parseQueryParams(uri: URI): (Boolean, String) = {
     val uriQuery = uri.getQuery
     if (uriQuery == null) {
-      (false, "")
+      (true, "")
     } else {
       val mapTokens = uriQuery.split("&").map(_.split("="))
       if (mapTokens.exists(isInvalidQueryString)) {
@@ -81,14 +82,15 @@ private[spark] object DependencyUtils extends Logging {
       }
       val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
 
-      // Parse transitive parameters (e.g., transitive=true) in an Ivy URI, default value is false
+      // Parse transitive parameters (e.g., transitive=true) in an Ivy URI, default value is true
       val transitiveParams = groupedParams.get("transitive")
       if (transitiveParams.map(_.size).getOrElse(0) > 1) {
         logWarning("It's best to specify `transitive` parameter in ivy URI query only once." +
           " If there are multiple `transitive` parameter, we will select the last one")
       }
       val transitive =
-        transitiveParams.flatMap(_.takeRight(1).map(_._2 == "true").headOption).getOrElse(false)
+        transitiveParams.flatMap(_.takeRight(1).map(_._2.equalsIgnoreCase("true")).headOption)
+          .getOrElse(true)
 
       // Parse an excluded list (e.g., exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http)
       // in an Ivy URI. When download Ivy URI jar, Spark won't download transitive jar
@@ -125,7 +127,7 @@ private[spark] object DependencyUtils extends Logging {
    * `parameter=value&parameter=value...`
    * Note that currently Ivy URI query part support two parameters:
    * 1. transitive: whether to download dependent jars related to your Ivy URI.
-   * transitive=false or `transitive=true`, if not set, the default value is false.
+   * transitive=false or `transitive=true`, if not set, the default value is true.
    * 2. exclude: exclusion list when download Ivy URI jar and dependency jars.
    * The `exclude` parameter content is a ',' separated `group:module` pair string :
    * `exclude=group:module,group:module...`

core/src/test/scala/org/apache/spark/SparkContextSuite.scala

Lines changed: 23 additions & 10 deletions
@@ -1035,13 +1035,10 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }
   }
 
-  test("SPARK-33084: Add jar support Ivy URI -- default transitive = false") {
+  test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") {
     sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
     sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")
     assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
-    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
-
-    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
     assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
   }
 
@@ -1083,6 +1080,22 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
     }
   }
 
+  test("SPARK-34506: Add jar support Ivy URI -- transitive=false will not download " +
+    "dependency jars") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=false")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
+  test("SPARK-34506: Add jar support Ivy URI -- test exclude param when transitive unspecified") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?exclude=commons-lang:commons-lang")
+    assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
+    assert(sc.listJars().exists(_.contains("org.slf4j_slf4j-api-1.7.10.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+  }
+
   test("SPARK-33084: Add jar support Ivy URI -- test exclude param when transitive=true") {
     sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
     sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0" +
@@ -1131,24 +1144,24 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
 
   test("SPARK-33084: Add jar support Ivy URI -- test param key case sensitive") {
     sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
-    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?TRANSITIVE=true")
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=false")
     assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
     assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
 
-    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?TRANSITIVE=false")
     assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
     assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
   }
 
-  test("SPARK-33084: Add jar support Ivy URI -- test transitive value case sensitive") {
+  test("SPARK-33084: Add jar support Ivy URI -- test transitive value case insensitive") {
     sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]"))
-    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=TRUE")
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=FALSE")
     assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
     assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
 
-    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=true")
+    sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?transitive=false")
     assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
-    assert(sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
+    assert(!sc.listJars().exists(_.contains("commons-lang_commons-lang-2.6.jar")))
   }
 
   test("SPARK-34346: hadoop configuration priority for spark/hive/hadoop configs") {

docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ ADD JAR file_name
     The name of the JAR file to be added. It could be either on a local file system or a distributed file system or an Ivy URI.
     Apache Ivy is a popular dependency manager focusing on flexibility and simplicity. Now we support two parameter in URI query string:
 
-    * transitive: whether to download dependent jars related to your ivy URL. It is case-sensitive and only take last one if multiple transitive parameters are specified.
+    * transitive: whether to download dependent jars related to your ivy URL. The parameter name is case-sensitive, and the parameter value is case-insensitive. If multiple transitive parameters are specified, the last one wins.
     * exclude: exclusion list during downloading Ivy URI jar and dependent jars.
 
     User can write Ivy URI such as:

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

Lines changed: 4 additions & 4 deletions
@@ -3726,13 +3726,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val sc = spark.sparkContext
     val hiveVersion = "2.3.8"
-    // default transitive=false, only download specified jar
-    sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion")
+    // transitive=false, only download specified jar
+    sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion?transitive=false")
     assert(sc.listJars()
       .exists(_.contains(s"org.apache.hive.hcatalog_hive-hcatalog-core-$hiveVersion.jar")))
 
-    // test download ivy URL jar return multiple jars
-    sql("ADD JAR ivy://org.scala-js:scalajs-test-interface_2.12:1.2.0?transitive=true")
+    // default transitive=true, test download ivy URL jar return multiple jars
+    sql("ADD JAR ivy://org.scala-js:scalajs-test-interface_2.12:1.2.0")
     assert(sc.listJars().exists(_.contains("scalajs-library_2.12")))
     assert(sc.listJars().exists(_.contains("scalajs-test-interface_2.12")))

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

Lines changed: 6 additions & 1 deletion
@@ -1224,7 +1224,12 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
     withTable("t") {
-      sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion")
+      // hive-catalog-core has some transitive dependencies which dont exist on maven central
+      // and hence cannot be found in the test environment or are non-jar (.pom) which cause
+      // failures in tests. Use transitive=false as it should be good enough to test the Ivy
+      // support in Hive ADD JAR
+      sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion" +
+        "?transitive=false")
       sql(
         """CREATE TABLE t(a string, b string)
           |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'""".stripMargin)
