[WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre #29326
Conversation
Test build #126931 has finished for PR 29326 at commit
retest this please
Test build #126935 has finished for PR 29326 at commit
Is this duplicated by #29325?
The trouble is that hive-exec uses a method that became package-private in Guava 20, so it is incompatible with Guava versions newer than 19.0.
hive-exec doesn't shade Guava until https://issues.apache.org/jira/browse/HIVE-22126, which targets 4.0.0. This seems like a dead end for upgrading Guava in Spark for now.
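For context, the approach in HIVE-22126 is to relocate Guava's packages inside the Hive jars with the maven-shade-plugin, so Hive's bytecode no longer references com.google.common directly and can coexist with whatever Guava version sits on Spark's classpath. A minimal sketch of such a relocation (the plugin version and the shaded package prefix are illustrative, not Hive's actual configuration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrites every reference to com.google.common.* in the produced
               jar to the renamed copy, so the bundled Guava cannot clash with
               a different Guava version elsewhere on the classpath. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```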
Opened https://issues.apache.org/jira/browse/HIVE-23980 to see if the Hive folks have some ideas.
I did some tests. A few changes are required to pass the failing Hive tests:
But this just upgrades the Guava version used in Spark. Hive dependencies still use the older Guava with the reported CVE.
Thank you for the assessment, @viirya. Is there an official plan for the Apache Spark 4.0.0 release? Actually, this is the third try, after mine and @HyukjinKwon's. So I was curious about what has changed since then. At the time, we dropped the old PRs because it was hard to expect a shaded Apache Hive 2.3.8.
Apache Spark just migrated to Apache Hive 2.3. I don't think we can migrate to Apache Hive 4.0.0 in the next year. cc @gatorsmile
BTW, shall we close this for now? You can reopen it later when it's ready.
@dongjoon-hyun Thanks for the comment. Yeah, it doesn't make sense to upgrade to Hive 4 in the short or medium term. I'm working on upgrading to Guava 27 and shading Guava in Hive too. I hope it can be part of Hive 2.3.8. I will close this for now. Once the work on the Hive side makes progress, I can reopen this. Thanks.
Thank you so much. Yes. I'm looking forward to seeing that~
Hi guys, I am having problems with Guava on Spark 3.0.0 and 3.0.1 with Hadoop 3.2.1 and Hive 3.1.2. I am using the Spark Operator developed by Google; all seems to work fine except when I try to use Spark integrated with the Hive Metastore. In that case I am facing the following error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I have tried several workarounds, like replacing Guava, setting "spark.executor.userClassPathFirst": "true" and "spark.driver.userClassPathFirst": "true" in the Spark spec on the client, and shading Guava with maven-shade-plugin, but unfortunately none of these alternatives work properly. I hope you can upgrade Guava in Spark soon. Thanks.
Thank you for your report, @danielradulov. However, it's an Apache Hive issue across Apache Hadoop versions.
Apache Hadoop 3.2.1 has a breaking Guava dependency change which breaks most downstream projects. IIRC, there is no official Apache Hive version that works on Apache Hadoop 3.2.1. You had better ask the Apache Hive community for support. The Apache Spark community tried to upgrade to Apache Hadoop 3.2.1 (Sep. 2019) and gave up due to that.
One question: https://issues.apache.org/jira/browse/HADOOP-14284 suggests that Hadoop shades its Guava dependency, so why do we see breaking changes when we upgrade to Hadoop 3.2.1 or Hadoop 3.3?
Isn't HADOOP-14284 resolved as Invalid?
@viirya you are right. My bad.
try this again.
retest this please
Test build #140400 has started for PR 29326 at commit
retest this please
27.0-jre was released in October 2018. I'm wondering if we still need to use the same version as Hadoop. Since Apache Hadoop shaded its Guava dependency and Apache Spark doesn't use it, shall we try the latest one, 30.1.1-jre, instead?
> All newer Hadoop releases are going to be built with a later Guava version, e.g. 27.0-jre, including Hadoop 3.1.3, 3.2.1, and 3.3.0.
I'm not against this. I can change to the latest Guava and see what CI tells us.
Thanks. Ya, let's try with the latest one.
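If Spark's root pom keeps the managed Guava version in a property, as it has historically with guava.version, trying the latest release is essentially a one-line change. A sketch (the property name and surrounding layout are an assumption here, not verified against the pom at this commit):

```xml
<properties>
  <!-- Previously 14.0.1; bump to the latest Guava and let CI report what breaks. -->
  <guava.version>30.1.1-jre</guava.version>
</properties>
```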
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #140430 has finished for PR 29326 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #140434 has finished for PR 29326 at commit
retest this please
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #140501 has finished for PR 29326 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Hmm, from the failed tests below: org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite. Since Guava 20,
I think I figured it out. Verifying it locally...
Encountered some issues. Although we can switch to hive-exec without the classifier (the shaded version) to get rid of the above Guava version issue, the shaded hive-exec bundles, without relocation, some dependencies like commons-lang3, orc, and parquet that are not the same versions Spark uses, so they conflict. Because the shaded hive-exec jar already includes those dependencies' classes, dependency exclusions in the pom cannot remove them. So for now it seems we'd have to go back to Hive and shade every included dependency there? Any other thoughts?
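To make the exclusion problem concrete: a Maven exclusion prunes an artifact from the dependency graph, but the shaded hive-exec is a single fat jar whose bundled classes are never resolved as separate artifacts, so there is nothing for the exclusion to act on. A sketch of the configuration that looks right but does not help (the Hive version is illustrative):

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.3.7</version>
  <exclusions>
    <!-- Prunes commons-lang3 only as a transitive artifact; the copy of
         org.apache.commons.lang3.* baked into the hive-exec fat jar itself
         remains on the classpath and still conflicts with Spark's version. -->
    <exclusion>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```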
Oh I didn't even realize that Spark is using . One idea is to have Spark use
Yea, I'm afraid that's true. If we want to completely isolate dependencies from Hive, we may need to relocate all included (but not relocated) dependencies in
Even Spark uses
Hmm, yeah, you are right, but shading the other dependencies will require another release. Another thing we could try is to change
Hmm, I looked at
Yeah, that looks right. It seems for the case when
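For reference, hive-exec is published in two flavors: the default jar, which bundles third-party classes, and a thin jar under the core classifier, where dependencies stay external and can be overridden or excluded like any other artifact. A sketch of depending on the thin flavor (version illustrative):

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.3.7</version>
  <!-- The "core" classifier selects the thin jar: Guava, commons-lang3, etc.
       arrive as ordinary transitive dependencies rather than bundled classes. -->
  <classifier>core</classifier>
</dependency>
```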
#33989 seems like a promising direction. Closing this.
What changes were proposed in this pull request?
This PR upgrades Guava to the newer 27.0-jre.
Why are the changes needed?
Guava 14.0.1 is pretty old and is among the Guava versions affected by CVE-2018-10237.
All newer Hadoop releases are going to be built with a later Guava version, e.g. 27.0-jre, including Hadoop 3.1.3, 3.2.1, and 3.3.0.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass the Jenkins tests.