-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 #42668
Conversation
The daily tests for Java 17 have failed for two consecutive days, both including the
|
If the failure can be reproduced, I will test downgrade the Ivy version to 2.5.1. cc @bjornjorgensen and @dongjoon-hyun to known. |
Got it. Thank you for investigating, @LuciferYang . |
https://github.com/LuciferYang/spark/actions/runs/5971242886/job/16200011102 still failed, let me downgrade ivy to test |
BTW, is this reproducible locally? |
This reverts commit 611e17e.
I have not yet found a way to reproduce the problem locally. |
Thank you. @LuciferYang So there is a new bug in ivy that only affects Java 17, where we get multiple artifacts of one module - log4j ? |
I'm not sure, but it appears that it's not only Java 17, the daily tests for Java 11 also have the same failures.
|
And if downgrade to 2.5.1, the test in GA passed https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209969721 |
Ok, well I can't figure out what's wrong here. Should we revert it and report it as a bug to Apache Ivy? |
@bjornjorgensen I think we can downgrade Ivy to 2.5.1 to restore the daily tests on Java 17 first. Reporting a bug to Ivy requires further investigation to ascertain whether this is indeed a bug with Ivy, as this issue can only be reproduced in the GitHub Action environment now. I have yet to find a way to reproduce it in local testing. WDYT @dongjoon-hyun ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
<ivy.version>2.5.2</ivy.version> | ||
<!-- | ||
SPARK-44968: don't upgrade Ivy to version 2.5.2 until the test aborted of | ||
`HiveExternalCatalogVersionsSuite` in Java 11/17 daily tests is resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, Java 21 works right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, according to the test results of GA, using Java 8 and 21-ea works normally, but using Java 11 and 17 will fail.
### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes #42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) Signed-off-by: yangjie01 <yangjie01@baidu.com>
Merged into master and 3.5. thanks @dongjoon-hyun @bjornjorgensen |
Thank you! |
### What changes were proposed in this pull request? This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. ### Why are the changes needed? This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - #42613 - #42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs including `HiveExternalCatalogVersionsSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. ### Why are the changes needed? This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs including `HiveExternalCatalogVersionsSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6)
This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6)
This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml
This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml
* ODP-1304 [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml * ODP-1303 [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0 The pr aims to upgrade `commons-text` from `1.10.0` to `1.11.0`. Release note: https://commons.apache.org/proper/commons-text/changes-report.html#a1.11.0 includes some bug fix, eg: - Fix StringTokenizer.getTokenList to return an independent modifiable list. Fixes [TEXT-219](https://issues.apache.org/jira/browse/TEXT-219). - Fix TextStringBuilder to over-allocate when ensuring capacity apache#452. Fixes [TEXT-228](https://issues.apache.org/jira/browse/TEXT-228). - TextStringBuidler#hashCode() allocates a String on each call apache#387. No. Pass GA. No. Closes apache#43590 from panbingkun/SPARK-45732. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit d38f074) [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10 Upgrade Apache commons-text from 1.9 to 1.10.0 [CVE-2022-42889](https://nvd.nist.gov/vuln/detail/CVE-2022-42889) No. Pass github action Closes apache#38262 from bjornjorgensen/commons-text-1.10. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Yuming Wang <yumwang@ebay.com> (cherry picked from commit 99abc94) [SPARK-38231][BUILD] Upgrade commons-text to 1.9 This PR aims to upgrade commons-text to 1.9. 1.9 is the latest and popular than 1.6. - https://commons.apache.org/proper/commons-text/changes-report.html#a1.9 - https://mvnrepository.com/artifact/org.apache.commons/commons-text No Pass GA Closes apache#35542 from LuciferYang/upgrade-common-text. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 70f5bfd) (cherry picked from commit 5cb61e7) # Conflicts: # pom.xml * ODP-1302 [SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution - Remove `jackson-core-asl` from maven dependency. - Change the scope of `jackson-mapper-asl` from compile to test. - Replace all `Hive.get(conf)` with `Hive.getWithoutRegisterFns(conf)`. To fix CVE issue: https://github.com/apache/spark/security/dependabot/50. No. manual test. Closes apache#40893 from wangyum/SPARK-43225. Lead-authored-by: Yuming Wang <wgyumg@gmail.com> Co-authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 9c237d7) [SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action This pr remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action. After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` becomes a test scope dependency, so when using GA to run benchmark, it is not in the classpath because GA uses https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183 iunstead of the sbt `Test/runMain`. `ObjectHashAggregateExecBenchmark` used `TestHive`, and `TestHive` will always call `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to init `originalUDFs` before this pr, so when we run `ObjectHashAggregateExecBenchmark` on GitHub Actions, there will be the following exceptions: (cherry picked from commit 1c10e28) # Conflicts: # pom.xml --------- Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com>
This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml (cherry picked from commit 222356d)
* ODP-1304 [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml * ODP-1303 [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0 The pr aims to upgrade `commons-text` from `1.10.0` to `1.11.0`. Release note: https://commons.apache.org/proper/commons-text/changes-report.html#a1.11.0 includes some bug fix, eg: - Fix StringTokenizer.getTokenList to return an independent modifiable list. Fixes [TEXT-219](https://issues.apache.org/jira/browse/TEXT-219). - Fix TextStringBuilder to over-allocate when ensuring capacity apache#452. Fixes [TEXT-228](https://issues.apache.org/jira/browse/TEXT-228). - TextStringBuidler#hashCode() allocates a String on each call apache#387. No. Pass GA. No. Closes apache#43590 from panbingkun/SPARK-45732. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit d38f074) [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10 Upgrade Apache commons-text from 1.9 to 1.10.0 [CVE-2022-42889](https://nvd.nist.gov/vuln/detail/CVE-2022-42889) No. Pass github action Closes apache#38262 from bjornjorgensen/commons-text-1.10. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Yuming Wang <yumwang@ebay.com> (cherry picked from commit 99abc94) [SPARK-38231][BUILD] Upgrade commons-text to 1.9 This PR aims to upgrade commons-text to 1.9. 1.9 is the latest and popular than 1.6. - https://commons.apache.org/proper/commons-text/changes-report.html#a1.9 - https://mvnrepository.com/artifact/org.apache.commons/commons-text No Pass GA Closes apache#35542 from LuciferYang/upgrade-common-text. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 70f5bfd) (cherry picked from commit 5cb61e7) # Conflicts: # pom.xml * ODP-1302 [SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution - Remove `jackson-core-asl` from maven dependency. - Change the scope of `jackson-mapper-asl` from compile to test. - Replace all `Hive.get(conf)` with `Hive.getWithoutRegisterFns(conf)`. To fix CVE issue: https://github.com/apache/spark/security/dependabot/50. No. manual test. Closes apache#40893 from wangyum/SPARK-43225. Lead-authored-by: Yuming Wang <wgyumg@gmail.com> Co-authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 9c237d7) [SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action This pr remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action. After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` becomes a test scope dependency, so when using GA to run benchmark, it is not in the classpath because GA uses https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183 iunstead of the sbt `Test/runMain`. `ObjectHashAggregateExecBenchmark` used `TestHive`, and `TestHive` will always call `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to init `originalUDFs` before this pr, so when we run `ObjectHashAggregateExecBenchmark` on GitHub Actions, there will be the following exceptions: (cherry picked from commit 1c10e28) # Conflicts: # pom.xml --------- Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com>
* ODP-1304 [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory. - Apache Spark 4.0.0 will create this once and reuse this directory while all the other systems like old Sparks uses the old one, `.ivy2`. So, the behavior is the same with the case where Apache Spark 4.0.0 is installed and used in a new machine. - For the environments with `User-provided Ivy-path`es, the user might hit the incompatibility still. However, the users can mitigate them because they already have full control on `Ivy-path`es. This was tried once and reverted logically due to Java 11 and Java 17 failures in Daily CIs. - apache#42613 - apache#42668 Currently, PR Builder also fails as of now. If the PR passes CIes, we can achieve the following. - [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) - FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity Injections No. Pass the CIs including `HiveExternalCatalogVersionsSuite`. No. Closes apache#45075 from dongjoon-hyun/SPARK-44914. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3baa60a) [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1 ### What changes were proposed in this pull request? After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the `HiveExternalCatalogVersionsSuite` test. Java 11 - https://github.com/apache/spark/actions/runs/5953716283/job/16148657660 - https://github.com/apache/spark/actions/runs/5966131923/job/16185159550 Java 17 - https://github.com/apache/spark/actions/runs/5956925790/job/16158714165 - https://github.com/apache/spark/actions/runs/5969348559/job/16195073478 ``` 2023-08-23T23:00:49.6547573Z [info] 2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6548745Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238) 2023-08-23T23:00:49.6549572Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89) 2023-08-23T23:00:49.6550334Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551) 2023-08-23T23:00:49.6551079Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464) 2023-08-23T23:00:49.6552024Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6552884Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) 2023-08-23T23:00:49.6553755Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138) 2023-08-23T23:00:49.6554705Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65) 2023-08-23T23:00:49.6555637Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64) 2023-08-23T23:00:49.6556554Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443) 2023-08-23T23:00:49.6557340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356) 2023-08-23T23:00:49.6558187Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71) 2023-08-23T23:00:49.6559061Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70) 2023-08-23T23:00:49.6559962Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6560766Z [info] 2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) 2023-08-23T23:00:49.6561584Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102) 2023-08-23T23:00:49.6562510Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) 2023-08-23T23:00:49.6563435Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) 2023-08-23T23:00:49.6564323Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) 2023-08-23T23:00:49.6565340Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) 2023-08-23T23:00:49.6566321Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) 2023-08-23T23:00:49.6567363Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) 2023-08-23T23:00:49.6568372Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) 2023-08-23T23:00:49.6569393Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490) 2023-08-23T23:00:49.6570685Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155) 2023-08-23T23:00:49.6571842Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) 2023-08-23T23:00:49.6572932Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) 2023-08-23T23:00:49.6573996Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) 2023-08-23T23:00:49.6575045Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) 2023-08-23T23:00:49.6576066Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) 2023-08-23T23:00:49.6576937Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) 2023-08-23T23:00:49.6577807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) 2023-08-23T23:00:49.6578620Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6579432Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 2023-08-23T23:00:49.6580357Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) 2023-08-23T23:00:49.6581331Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) 2023-08-23T23:00:49.6582239Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) 2023-08-23T23:00:49.6583101Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) 2023-08-23T23:00:49.6584088Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) 2023-08-23T23:00:49.6585236Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6586519Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) 2023-08-23T23:00:49.6587686Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) 2023-08-23T23:00:49.6588898Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590014Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) 2023-08-23T23:00:49.6590993Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) 2023-08-23T23:00:49.6591930Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) 2023-08-23T23:00:49.6592914Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) 2023-08-23T23:00:49.6593856Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) 2023-08-23T23:00:49.6594687Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) 2023-08-23T23:00:49.6595379Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) 2023-08-23T23:00:49.6596103Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6596807Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) 2023-08-23T23:00:49.6597520Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) 2023-08-23T23:00:49.6598276Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) 2023-08-23T23:00:49.6599022Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) 2023-08-23T23:00:49.6599819Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-08-23T23:00:49.6600723Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 2023-08-23T23:00:49.6601707Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-08-23T23:00:49.6602513Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.reflect.Method.invoke(Method.java:568) 2023-08-23T23:00:49.6603272Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 2023-08-23T23:00:49.6604007Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 2023-08-23T23:00:49.6604724Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.Gateway.invoke(Gateway.java:282) 2023-08-23T23:00:49.6605416Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 2023-08-23T23:00:49.6606209Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.commands.CallCommand.execute(CallCommand.java:79) 2023-08-23T23:00:49.6606969Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) 2023-08-23T23:00:49.6607743Z [info] 2023-08-23 16:00:48.209 - stdout> at py4j.ClientServerConnection.run(ClientServerConnection.java:106) 2023-08-23T23:00:49.6608415Z [info] 2023-08-23 16:00:48.209 - stdout> at java.base/java.lang.Thread.run(Thread.java:833) 2023-08-23T23:00:49.6609288Z [info] 2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error. 2023-08-23T23:00:49.6610288Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426) 2023-08-23T23:00:49.6611332Z [info] 2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122) 2023-08-23T23:00:49.6612046Z [info] 2023-08-23 16:00:48.209 - stdout> ... 66 more 2023-08-23T23:00:49.6612498Z [info] 2023-08-23 16:00:48.209 - stdout> ``` So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests. ### Why are the changes needed? To restore Java 11/17 daily tests. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By changing the default Java version in `build_and_test.yml` to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1. - https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934 <img width="1116" alt="image" src="https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#42668 from LuciferYang/test-java17. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 4f8a199) [SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2 Upgrade Apache ivy from 2.5.1 to 2.5.2 [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr) [CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751) The fix apache/ant-ivy@2be17bc No. Pass GA No. Closes apache#42613 from bjornjorgensen/ivy-2.5.2. Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 611e17e) [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1 Upgrade `Apache Ivy` from 2.5.0 to 2.5.1 [Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html) [CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865) and [CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866) No. Pass GA Closes apache#38539 from bjornjorgensen/ivy-2.5.1. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 4bbdca6) (cherry picked from commit 0e5fa79) # Conflicts: # dev/deps/spark-deps-hadoop-2-hive-2.3 # dev/deps/spark-deps-hadoop-3-hive-2.3 # docs/core-migration-guide.md # pom.xml * ODP-1303 [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0 The pr aims to upgrade `commons-text` from `1.10.0` to `1.11.0`. Release note: https://commons.apache.org/proper/commons-text/changes-report.html#a1.11.0 includes some bug fix, eg: - Fix StringTokenizer.getTokenList to return an independent modifiable list. Fixes [TEXT-219](https://issues.apache.org/jira/browse/TEXT-219). - Fix TextStringBuilder to over-allocate when ensuring capacity apache#452. Fixes [TEXT-228](https://issues.apache.org/jira/browse/TEXT-228). - TextStringBuidler#hashCode() allocates a String on each call apache#387. No. Pass GA. No. Closes apache#43590 from panbingkun/SPARK-45732. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit d38f074) [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10 Upgrade Apache commons-text from 1.9 to 1.10.0 [CVE-2022-42889](https://nvd.nist.gov/vuln/detail/CVE-2022-42889) No. Pass github action Closes apache#38262 from bjornjorgensen/commons-text-1.10. Authored-by: Bjørn <bjornjorgensen@gmail.com> Signed-off-by: Yuming Wang <yumwang@ebay.com> (cherry picked from commit 99abc94) [SPARK-38231][BUILD] Upgrade commons-text to 1.9 This PR aims to upgrade commons-text to 1.9. 1.9 is the latest and popular than 1.6. - https://commons.apache.org/proper/commons-text/changes-report.html#a1.9 - https://mvnrepository.com/artifact/org.apache.commons/commons-text No Pass GA Closes apache#35542 from LuciferYang/upgrade-common-text. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 70f5bfd) (cherry picked from commit 5cb61e7) # Conflicts: # pom.xml * ODP-1302 [SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution - Remove `jackson-core-asl` from maven dependency. - Change the scope of `jackson-mapper-asl` from compile to test. - Replace all `Hive.get(conf)` with `Hive.getWithoutRegisterFns(conf)`. To fix CVE issue: https://github.com/apache/spark/security/dependabot/50. No. manual test. Closes apache#40893 from wangyum/SPARK-43225. Lead-authored-by: Yuming Wang <wgyumg@gmail.com> Co-authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 9c237d7) [SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action This pr remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action. After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` becomes a test scope dependency, so when using GA to run benchmark, it is not in the classpath because GA uses https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183 iunstead of the sbt `Test/runMain`. `ObjectHashAggregateExecBenchmark` used `TestHive`, and `TestHive` will always call `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to init `originalUDFs` before this pr, so when we run `ObjectHashAggregateExecBenchmark` on GitHub Actions, there will be the following exceptions: (cherry picked from commit 1c10e28) # Conflicts: # pom.xml --------- Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Yuming Wang <wgyumg@gmail.com>
What changes were proposed in this pull request?
After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, daily tests for Java 11 and Java 17 began to experience ABORTED in the
HiveExternalCatalogVersionsSuite
test.Java 11
Java 17
So this pr downgrade ivy from 2.5.2 to 2.5.1 to restore Java 11/17 daily tests.
Why are the changes needed?
To restore Java 11/17 daily tests.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By changing the default Java version in
build_and_test.yml
to 17 for verification, the tests succeed after downgrading the Ivy to 2.5.1.Was this patch authored or co-authored using generative AI tooling?
No