
[SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 #45075

Closed · wants to merge 2 commits
common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala (14 additions, 3 deletions)
@@ -324,6 +324,14 @@ private[spark] object MavenUtils extends Logging {
val ivySettings: IvySettings = new IvySettings
try {
ivySettings.load(file)
if (ivySettings.getDefaultIvyUserDir == null && ivySettings.getDefaultCache == null) {
// To protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility.
// `processIvyPathArg` can overwrite these later.
val alternateIvyDir = System.getProperty("ivy.home",
System.getProperty("user.home") + File.separator + ".ivy2.5.2")
ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
}
} catch {
case e @ (_: IOException | _: ParseException) =>
throw new SparkException(s"Failed when loading Ivy settings from $settingsFile", e)
@@ -335,10 +343,13 @@ private[spark] object MavenUtils extends Logging {

/* Set ivy settings for location of cache, if option is supplied */
private def processIvyPathArg(ivySettings: IvySettings, ivyPath: Option[String]): Unit = {
ivyPath.filterNot(_.trim.isEmpty).foreach { alternateIvyDir =>
ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
val alternateIvyDir = ivyPath.filterNot(_.trim.isEmpty).getOrElse {
// To protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility.
System.getProperty("ivy.home",
System.getProperty("user.home") + File.separator + ".ivy2.5.2")
}
ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
}
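The patch above resolves the alternate Ivy directory in a fixed order: an explicit `ivyPath` option wins, then the `ivy.home` system property, then `.ivy2.5.2` under `user.home`. A minimal Java sketch of that resolution order (the class and method names here are illustrative, not part of Spark):

```java
import java.io.File;

public class IvyDirFallback {
    // Mirrors the patch's resolution order: explicit ivyPath argument (if
    // non-blank), then the ivy.home system property, then ~/.ivy2.5.2.
    static String resolveIvyDir(String ivyPath) {
        if (ivyPath != null && !ivyPath.trim().isEmpty()) {
            return ivyPath;
        }
        return System.getProperty("ivy.home",
            System.getProperty("user.home") + File.separator + ".ivy2.5.2");
    }

    public static void main(String[] args) {
        System.out.println(resolveIvyDir("/tmp/my-ivy"));
        System.out.println(resolveIvyDir(null));
    }
}
```

Note that a blank or whitespace-only `ivyPath` falls through to the system-property defaults, matching the `filterNot(_.trim.isEmpty)` guard in the Scala code.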

/* Add any optional additional remote repositories */
@@ -374,7 +374,8 @@ private[spark] object IvyTestUtils {
f(repo.toURI.toString)
} finally {
// Clean up
if (repo.toString.contains(".m2") || repo.toString.contains(".ivy2")) {
if (repo.toString.contains(".m2") || repo.toString.contains(".ivy2") ||
repo.toString.contains(".ivy2.5.2")) {
val groupDir = getBaseGroupDirectory(artifact, useIvyLayout)
FileUtils.deleteDirectory(new File(repo, groupDir + File.separator + artifact.artifactId))
deps.foreach { _.foreach { dep =>
@@ -2491,10 +2491,10 @@ package object config {
.doc("Path to specify the Ivy user directory, used for the local Ivy cache and " +
"package files from spark.jars.packages. " +
"This will override the Ivy property ivy.default.ivy.user.dir " +
"which defaults to ~/.ivy2.")
"which defaults to ~/.ivy2.5.2")
.version("1.3.0")
.stringConf
.createOptional
.createWithDefault("~/.ivy2.5.2")

private[spark] val JAR_IVY_SETTING_PATH =
ConfigBuilder(MavenUtils.JAR_IVY_SETTING_PATH_KEY)
dev/deps/spark-deps-hadoop-3-hive-2.3 (1 addition, 1 deletion)
@@ -102,7 +102,7 @@ httpcore/4.4.16//httpcore-4.4.16.jar
icu4j/72.1//icu4j-72.1.jar
ini4j/0.5.4//ini4j-0.5.4.jar
istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
ivy/2.5.1//ivy-2.5.1.jar
ivy/2.5.2//ivy-2.5.2.jar
jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar
jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
jackson-core/2.16.1//jackson-core-2.16.1.jar
dev/run-tests.py (2 additions)
@@ -478,6 +478,8 @@ def main():
rm_r(os.path.join(SPARK_HOME, "work"))
rm_r(os.path.join(USER_HOME, ".ivy2", "local", "org.apache.spark"))
rm_r(os.path.join(USER_HOME, ".ivy2", "cache", "org.apache.spark"))
rm_r(os.path.join(USER_HOME, ".ivy2.5.2", "local", "org.apache.spark"))
rm_r(os.path.join(USER_HOME, ".ivy2.5.2", "cache", "org.apache.spark"))
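The `rm_r` calls above recursively remove the per-test Ivy caches and tolerate missing directories. A rough Java equivalent of that helper, shown here only to illustrate the depth-first deletion it performs (the paths are demo values, not Spark's):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class RmR {
    // Recursively delete a directory tree; a nonexistent path is a no-op,
    // matching rm_r's tolerance of already-missing directories.
    static void rmR(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order walks deepest entries first, so files and
            // subdirectories are deleted before their parents.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("ivy-demo");
        Path nested = base.resolve(".ivy2.5.2").resolve("cache").resolve("org.apache.spark");
        Files.createDirectories(nested);
        Files.writeString(nested.resolve("stamp"), "x");
        rmR(base.resolve(".ivy2.5.2"));
        System.out.println(Files.exists(base.resolve(".ivy2.5.2"))); // prints false
    }
}
```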

os.environ["CURRENT_BLOCK"] = str(ERROR_CODES["BLOCK_GENERAL"])

docs/core-migration-guide.md (2 additions)
@@ -36,6 +36,8 @@ license: |

- Since Spark 4.0, Spark uses `ReadWriteOncePod` instead of `ReadWriteOnce` access mode in persistence volume claims. To restore the legacy behavior, you can set `spark.kubernetes.legacy.useReadWriteOnceAccessMode` to `true`.

- Since Spark 4.0, Spark uses `~/.ivy2.5.2` as the default Ivy user directory to isolate existing systems from Apache Ivy 2.5.2's incompatible changes. To restore the legacy behavior, you can set `spark.jars.ivy` to `~/.ivy2`.
Contributor:
Will it need to be changed again if we upgrade to Ivy 2.5.3 or 2.6.x in the future? Or could this directory be named one of:

  • .ivy2.5.2_and_above
  • .ivy2.5.2_plus
  • .ivy2.5.2_upwards
  • .ivy2.5.2_or_higher

?

Member Author (@dongjoon-hyun), Feb 23, 2024:
I also thought of something like `.ivy2.5.2+`. After receiving your comment, I'm rethinking it.

The bottom line is that compatibility and the release cycle depend on the Apache Ivy community, not the Apache Spark community.

  • `.ivy2.5.2` literally means the Apache Ivy format written by Apache Ivy 2.5.2.

    • If Apache Ivy 2.5.3 does not introduce any new change, it is still the Apache Ivy 2.5.2 format.
    • If Apache Ivy 2.5.3 breaks the format again, we would switch to `.ivy2.5.3` at that time.
  • In addition, `.ivy2.5.2_or_higher` could be an over-claim, because the Apache Spark community cannot guarantee compatibility with Apache Ivy 2.5.3 or higher, which that name implies.

    • Say we used `.ivy2.5.2_or_higher` and Apache Ivy 2.5.3 broke compatibility again. We would then have to rename it to `.ivy2.5.3_or_higher`, so it incurs the same cost.

Member Author:
Like .ivy2, we don't need to change this until the next Apache Ivy breaking change, @LuciferYang .

Contributor:
SGTM


## Upgrading from Core 3.4 to 3.5

- Since Spark 3.5, `spark.yarn.executor.failuresValidityInterval` is deprecated. Use `spark.executor.failuresValidityInterval` instead.
pom.xml (1 addition, 5 deletions)
@@ -146,11 +146,7 @@
<jetty.version>10.0.19</jetty.version>
<jakartaservlet.version>4.0.3</jakartaservlet.version>
<chill.version>0.10.0</chill.version>
<!--
SPARK-44968: don't upgrade Ivy to version 2.5.2 until the test aborted of
`HiveExternalCatalogVersionsSuite` in Java 11/17 daily tests is resolved.
-->
<ivy.version>2.5.1</ivy.version>
<ivy.version>2.5.2</ivy.version>
<oro.version>2.0.8</oro.version>
<!--
If you change codahale.metrics.version, you also need to change