Spark: Fix runtime jars packaging scala library files #5754

Merged
merged 2 commits into from
Sep 15, 2022

Conversation

ajantha-bhat
Member

@ajantha-bhat ajantha-bhat commented Sep 13, 2022

#4009 adds a dependency on scala.collection.compat, which pulls in the scala-library dependencies and causes the runtime jars to be packaged with scala-library files.

So far, two issues have reported that the Scala files packaged in the runtime jar conflict with the Scala files already present in the user's environment.

This applies only to spark-3.3 and spark-3.2, as PR #4009 is present only in these versions. This change also reduces the runtime jar size by 4MB!

Fixes #5732

@ajantha-bhat
Member Author

cc: @fqaiser94, @rdblue, @Fokko, @RussellSpitzer

@pan3793
Member

pan3793 commented Sep 14, 2022

I had a quick look at the Spark runtime jar: httpclient5 is included but without proper relocation. Could you please fix that as well?

@ajantha-bhat
Member Author

ajantha-bhat commented Sep 14, 2022

> Have a quick check of spark runtime jar, httpclient5 is included but w/o proper relocation, could you please fix it as well?

I will handle it in a follow-up PR, as it affects all the engine runtime jars and I am usually asked to keep independent issues in separate PRs.

@ajantha-bhat
Member Author

> Will handle it in the follow-up PR as it affects all the spark versions and usually I get comments to separate the independent issues.

@pan3793: Handled in the below PR, please review.
#5761

@ajantha-bhat ajantha-bhat marked this pull request as draft September 14, 2022 05:16
@ajantha-bhat ajantha-bhat marked this pull request as ready for review September 14, 2022 06:13
implementation project(":iceberg-spark:iceberg-spark-${sparkMajorVersion}_${scalaVersion}")
implementation project(":iceberg-spark:iceberg-spark-extensions-${sparkMajorVersion}_${scalaVersion}")
implementation(project(":iceberg-spark:iceberg-spark-${sparkMajorVersion}_${scalaVersion}")) {
  exclude group: 'org.scala-lang', module: 'scala-library'
Member

My previous comment may have been unclear. My suggestion is to add the following after line 190:

      exclude group: 'org.scala-lang'
      exclude group: 'org.scala-lang.modules'
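
For readers following the suggestion, here is a minimal sketch of how such group-level excludes might look in a Gradle build file. The enclosing `configurations { implementation { ... } }` block is an assumption, since the thread does not show what sits at line 190; only the two `exclude` lines come from the comment above.

```groovy
// Sketch only: the enclosing block is assumed, not taken from the PR diff.
configurations {
  implementation {
    // Spark already provides the Scala standard library at runtime,
    // so it should not be bundled into the runtime jar.
    exclude group: 'org.scala-lang'
    // Covers Scala modules such as scala-collection-compat.
    exclude group: 'org.scala-lang.modules'
  }
}
```

An exclude declared on a configuration applies to every dependency resolved through that configuration, including transitive ones. The per-dependency form in the diff above only affects the transitive graph of that single dependency.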

Member Author

@ajantha-bhat ajantha-bhat Sep 14, 2022

Sounds good to me, as Spark also provides scala.collection.compat.

Updated the PR. Manually verified that the runtime jars no longer contain a scala folder. Let's wait for the build.
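
The manual check described here could also be scripted. Below is a rough sketch as a standalone Groovy script, assuming the jar path shown in the build listing in this thread; the script and its variable names are illustrative and not part of the PR.

```groovy
import java.util.zip.ZipFile

// Default path is the 3.3 runtime jar from this thread's build listing;
// pass a different jar as the first argument to check another one.
def defaultJar = 'spark/v3.3/spark-runtime/build/libs/iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT.jar'
def jarPath = args.length > 0 ? args[0] : defaultJar

// Collect the names of all archive entries that live under a top-level scala/ folder.
def scalaEntries = new ZipFile(jarPath).withCloseable { zip ->
  zip.entries().toList()*.name.findAll { it.startsWith('scala/') }
}

if (scalaEntries) {
  println "Found ${scalaEntries.size()} entries under scala/, for example: ${scalaEntries.take(5)}"
  System.exit(1)
}
println "No entries under scala/ in ${jarPath}"
```

The script exits with a non-zero status when any `scala/` entry is present, so it could be wired into a CI step if desired.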

@ajantha-bhat ajantha-bhat force-pushed the runtime branch 2 times, most recently from 4e6891b to 5e7f37b Compare September 14, 2022 08:10
@ajantha-bhat
Member Author

@KarlManong also verified these changes as mentioned in #5732

Contributor

@Fokko Fokko left a comment

Looks good! Thanks @ajantha-bhat

After:

➜  iceberg git:(runtime) ls -lah spark/v3.3/spark-runtime/build/libs 
total 53904
drwxr-xr-x  6 fokkodriesprong  staff   192B Sep 14 15:37 .
drwxr-xr-x  8 fokkodriesprong  staff   256B Sep 14 15:37 ..
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:37 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-javadoc.jar
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:37 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-sources.jar
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:37 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-tests.jar
-rw-r--r--  1 fokkodriesprong  staff    25M Sep 14 15:37 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT.jar

Before:

➜  iceberg git:(master) ls -lah spark/v3.3/spark-runtime/build/libs
total 64496
drwxr-xr-x  6 fokkodriesprong  staff   192B Sep 14 15:39 .
drwxr-xr-x  8 fokkodriesprong  staff   256B Sep 14 15:40 ..
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:39 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-javadoc.jar
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:39 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-sources.jar
-rw-r--r--  1 fokkodriesprong  staff   5.9K Sep 14 15:39 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT-tests.jar
-rw-r--r--  1 fokkodriesprong  staff    31M Sep 14 15:40 iceberg-spark-runtime-3.3_2.12-0.15.0-SNAPSHOT.jar

Member

@pan3793 pan3793 left a comment

LGTM, thanks @ajantha-bhat

@Fokko Fokko merged commit 0dd66f1 into apache:master Sep 15, 2022
JonasJ-ap pushed a commit to JonasJ-ap/iceberg that referenced this pull request Sep 18, 2022
* Spark: Fix runtime jars packaging scala library files

* apply review comments
nastra pushed a commit to nastra/iceberg that referenced this pull request Sep 29, 2022
* Spark: Fix runtime jars packaging scala library files

* apply review comments
danielcweeks pushed a commit that referenced this pull request Sep 29, 2022
* Spark: Fix runtime jars packaging scala library files

* apply review comments

Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 10, 2023
* Spark: Fix runtime jars packaging scala library files

* apply review comments
Development

Successfully merging this pull request may close these issues.

iceberg's spark-runtime version 0.14 jar contains scala classes rather than 0.13 may cause ClassCastException
3 participants