Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flink: Support configurable Scala version. #4157

Merged
merged 1 commit into from
Feb 20, 2022

Conversation

openinx
Copy link
Member

@openinx openinx commented Feb 18, 2022

As apache flink usually will release a version with two scala version, but we apache iceberg only support scala 2.12 now. So here I publish a PR to support configurable scala version for building flink iceberg module.

We can use the following command to build flink runtime modules for scala 2.12:

./gradlew clean build -x test -DflinkVersions=1.12,1.13,1.14 \
  -DscalaVersion=2.12 \
  -DknownScalaVersions=2.12 \
  -x javadoc -Pquick \
  -DsparkVersions= -DhiveVersions=

Their packaged jar are looks like:

➜  iceberg git:(add-flink-scala-version) find  flink/ | grep jar | grep v1                                                                                                                             
flink//v1.12/flink/build/libs/iceberg-flink-1.12_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.12/flink/build/libs/iceberg-flink-1.12_2.11-0.13.0-SNAPSHOT.jar
flink//v1.12/flink/build/libs/iceberg-flink-1.12_2.11-0.13.0-SNAPSHOT-javadoc.jar
flink//v1.12/flink/build/libs/iceberg-flink-1.12_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.12/flink/build/tmp/jar
flink//v1.12/flink/build/tmp/jar/MANIFEST.MF
flink//v1.12/flink-runtime/build/libs/iceberg-flink-runtime-1.12_2.11-0.13.0-SNAPSHOT.jar
flink//v1.12/flink-runtime/build/libs/iceberg-flink-runtime-1.12_2.11-0.13.0-SNAPSHOT-javadoc.jar
flink//v1.12/flink-runtime/build/libs/iceberg-flink-runtime-1.12_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.12/flink-runtime/build/libs/iceberg-flink-runtime-1.12_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.14/flink/build/libs/iceberg-flink-1.14_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.14/flink/build/libs/iceberg-flink-1.14_2.11-0.13.0-SNAPSHOT-javadoc.jar
flink//v1.14/flink/build/libs/iceberg-flink-1.14_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.14/flink/build/libs/iceberg-flink-1.14_2.11-0.13.0-SNAPSHOT.jar
flink//v1.14/flink/build/tmp/jar
flink//v1.14/flink/build/tmp/jar/MANIFEST.MF
flink//v1.14/flink-runtime/build/libs/iceberg-flink-runtime-1.14_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.14/flink-runtime/build/libs/iceberg-flink-runtime-1.14_2.11-0.13.0-SNAPSHOT-javadoc.jar
flink//v1.14/flink-runtime/build/libs/iceberg-flink-runtime-1.14_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.14/flink-runtime/build/libs/iceberg-flink-runtime-1.14_2.11-0.13.0-SNAPSHOT.jar
flink//v1.13/flink/build/libs/iceberg-flink-1.13_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.13/flink/build/libs/iceberg-flink-1.13_2.11-0.13.0-SNAPSHOT-javadoc.jar
flink//v1.13/flink/build/libs/iceberg-flink-1.13_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.13/flink/build/libs/iceberg-flink-1.13_2.11-0.13.0-SNAPSHOT.jar
flink//v1.13/flink/build/tmp/jar
flink//v1.13/flink/build/tmp/jar/MANIFEST.MF
flink//v1.13/flink-runtime/build/libs/iceberg-flink-runtime-1.13_2.11-0.13.0-SNAPSHOT-tests.jar
flink//v1.13/flink-runtime/build/libs/iceberg-flink-runtime-1.13_2.11-0.13.0-SNAPSHOT.jar
flink//v1.13/flink-runtime/build/libs/iceberg-flink-runtime-1.13_2.11-0.13.0-SNAPSHOT-sources.jar
flink//v1.13/flink-runtime/build/libs/iceberg-flink-runtime-1.13_2.11-0.13.0-SNAPSHOT-javadoc.jar

And use the following command to build flink runtime modules for scala 2.11:

./gradlew clean build -x test -DflinkVersions=1.12,1.13,1.14 \
  -DscalaVersion=2.11 \
  -DknownScalaVersions=2.11 \
  -x javadoc -Pquick \
  -DsparkVersions= -DhiveVersions=

The build jars are similar to the above context.

project(":iceberg-flink:flink-1.14_${scalaVersion}").projectDir = file('flink/v1.14/flink')
project(":iceberg-flink:flink-1.14_${scalaVersion}").name = "iceberg-flink-1.14_${scalaVersion}"
project(":iceberg-flink:flink-runtime-1.14_${scalaVersion}").projectDir = file('flink/v1.14/flink-runtime')
project(":iceberg-flink:flink-runtime-1.14_${scalaVersion}").name = "iceberg-flink-runtime-1.14_${scalaVersion}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I’m wrong, but I think we need to be able to publish / build all combinations of Flink / Scala versions here. For example, when publishing a new release to maven we need to build all of the artifacts.

Is it possible to do that in a way that doesn’t require two build calls (one for each Scala version)? I’m not a highly experienced Scala with gradle user, so I might be mistaken or it might be necessary to use two build calls.

But it makes the release process more cumbersome and particular to have to call build multiple times, which is something we’ve been trying to make more streamlined (the build / release process).

I’m ok with iterating on things and would not block on this, but it is a concern for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the Spark on Scala 2.13 support PR, it looks like scalaVersion flag with two build calls is what is used.

I’m not sure if the release scripts have been updated to support this yet though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do that in a way that doesn’t require two build calls (one for each Scala version)?

As far as I know, the gradle command cannot deploy the projects with two different values for a given system property. I mean I think the two build calls to build two separate flink scala runtime jar is the thing we need to do. I see we usually deploy the artifacts to maven repository by executing this stage-binaries.sh. I think it's okay to add the two gradle build calls into this bash script.

(In general, I think it's a good question)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can't have multiple Scala versions in the build without strange errors. We'll have to update the binary release script to publish Scala 2.13 modules.

}
String flinkVersion = '1.14.0'
String flinkMajorVersion = '1.14'
String scalaVersion = System.getProperty("scalaVersion") != null ? System.getProperty("scalaVersion") : System.getProperty("defaultScalaVersion")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’ve recently also had work for supporting Scala 2.13 in Spark, so it might be wise to name scalaVersion to something more Flink related to keep the two separated. Though this isn’t a list of all known Scala versions for Flink so might not be a concern here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, Spark is also using scalaVersion flag. So we’ll definitely need to define explicit known Flink Scala version combos to disallow say, somebody trying to build Flink with Scala 2.13 (as the name will probably be shared).

Very interested in people’s thoughts on how to handle this concern of scalaVersion and Flink / Spark versions and their different supported combinations.

Copy link
Member Author

@openinx openinx Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I also considered this issue before. The current available scala version for flink are 2.11 and 2.12, for most of people as far I know, they mainly choose scala 2.11 for the production environment. But the available scala versions for spark are 2.11, 2.12, 2.13 ( Some major version only support 2.12).

So in fact, the supported scala version ranges are quite different between spark & flink. Besides, the supported scala version ranges are also different between different spark versions. Let's see the following table:

So even if we separate the scala version between spark & flink, the same scala version still could not be applied to build all the iceberg spark runtime jars I think. The correct approach is: try to package related artifacts in one build process as much as possible. If we can't build in one build, we can only choose to build again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there can only be one Scala version in the build, I think using the same scalaVersion flag should be fine.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I think we’re missing here is the ability to distinguish which Flink versions and which Scala versions go together. Presently, since you’re adding support for Scala 2.11 which is supported across all, it’s not a major concern. But eventually, when Flink supports Scala 2.13, I think it will be a concern. This might be ok to delay for later but worth mentioning.

The other thing I’ll mention is that we added the hard coded scala version _2.12 to Spark builds first, separately. I can’t say how much easier that made the process (I did that part but not the 2.13 upgrade), but I thought I’d share my experience.

Do you know if the Flink community is looking to adopt Scala 2.13 within the next few releases or if there’s any discussion around that?

Thanks @openinx!

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here’s the recent PR for supporting Scala 2.13 in Spark 3.2:

#4009

kbendick added a commit to kbendick/iceberg that referenced this pull request Feb 18, 2022
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that.

This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do
@kbendick
Copy link
Contributor

@openinx I created a draft PR to update the release script (just for the Scala 2.13 with Spark support).

I probably should have just opened a ticket. Please feel free to close that one if you update things related to the release (otherwise we'll be sure to update before Iceberg 0.14.0).

@rdblue rdblue merged commit 1ce347d into apache:master Feb 20, 2022
@rdblue
Copy link
Contributor

rdblue commented Feb 20, 2022

Thanks, @openinx! Good to have this ready for when Flink supports 2.13.

kbendick added a commit to kbendick/iceberg that referenced this pull request Feb 21, 2022
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that.

This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do
kbendick added a commit to kbendick/iceberg that referenced this pull request Feb 21, 2022
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that.

This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do
kbendick added a commit to kbendick/iceberg that referenced this pull request Feb 21, 2022
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that.

This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do

project(':iceberg-flink:iceberg-flink-1.12') {
project(":iceberg-flink:iceberg-flink-${flinkMajorVersion}_${scalaVersion}") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue @kbendick Seems I made a mistake here ? Since we apache iceberg flink & flink-runtime modules don't introduce any scala dependencies, so in theory the iceberg-flink-1.14.jar & iceberg-flink-runtime-1.14.jar can work fine on both flink-1.14.0-bin-scala_2.11 & flink-1.14.0-bin-scala_2.12 release. So we don't need to add any _${scalaVersion} suffix for iceberg-flink & iceberg-flink-runtime jar.

(But we still need this PR to make iceberg-flink/iceber-flink-runtime compile on different scala version ).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➜ iceberg git:(master) ✗ ./gradlew :iceberg-flink:iceberg-flink-runtime-1.14_2.12:dependencies --configuration runtimeClasspath

> Task :iceberg-flink:iceberg-flink-runtime-1.14_2.12:dependencies

------------------------------------------------------------
Project ':iceberg-flink:iceberg-flink-runtime-1.14_2.12'
------------------------------------------------------------

runtimeClasspath - Runtime classpath of source set 'main'.
+--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
+--- project :iceberg-flink:iceberg-flink-1.14_2.12
|    +--- project :iceberg-api
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    \--- project :iceberg-bundled-guava
|    +--- project :iceberg-data
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core
|    |    |    +--- project :iceberg-api (*)
|    |    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    |    +--- project :iceberg-common
|    |    |    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    |    |    \--- project :iceberg-bundled-guava
|    |    |    +--- project :iceberg-bundled-guava
|    |    |    +--- org.apache.avro:avro -> 1.10.1
|    |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.11.3 -> 2.11.4
|    |    |    |    \--- com.fasterxml.jackson.core:jackson-databind:2.11.3 -> 2.11.4
|    |    |    |         +--- com.fasterxml.jackson.core:jackson-annotations:2.11.4
|    |    |    |         \--- com.fasterxml.jackson.core:jackson-core:2.11.4
|    |    |    +--- com.fasterxml.jackson.core:jackson-databind -> 2.11.4 (*)
|    |    |    +--- com.fasterxml.jackson.core:jackson-core -> 2.11.4
|    |    |    +--- com.github.ben-manes.caffeine:caffeine -> 2.8.4
|    |    |    |    +--- org.checkerframework:checker-qual:3.4.0
|    |    |    |    \--- com.google.errorprone:error_prone_annotations:2.3.4
|    |    |    \--- org.roaringbitmap:RoaringBitmap -> 0.9.22
|    |    |         \--- org.roaringbitmap:shims:0.9.22
|    |    +--- org.apache.orc:orc-core -> 1.7.3
|    |    |    +--- org.apache.orc:orc-shims:1.7.3
|    |    |    +--- io.airlift:aircompressor:0.21
|    |    |    +--- org.jetbrains:annotations:17.0.0
|    |    |    \--- org.threeten:threeten-extra:1.5.0
|    |    \--- org.apache.parquet:parquet-avro -> 1.12.2
|    |         +--- org.apache.parquet:parquet-column:1.12.2
|    |         |    +--- org.apache.parquet:parquet-common:1.12.2
|    |         |    |    +--- org.apache.parquet:parquet-format-structures:1.12.2
|    |         |    |    \--- org.apache.yetus:audience-annotations:0.12.0
|    |         |    \--- org.apache.parquet:parquet-encoding:1.12.2
|    |         |         \--- org.apache.parquet:parquet-common:1.12.2 (*)
|    |         +--- org.apache.parquet:parquet-hadoop:1.12.2
|    |         |    +--- org.apache.parquet:parquet-column:1.12.2 (*)
|    |         |    +--- org.apache.parquet:parquet-format-structures:1.12.2
|    |         |    +--- org.apache.parquet:parquet-jackson:1.12.2
|    |         |    \--- com.github.luben:zstd-jni:1.4.9-1
|    |         \--- org.apache.parquet:parquet-format-structures:1.12.2
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    +--- project :iceberg-core (*)
|    +--- project :iceberg-orc
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- org.apache.avro:avro -> 1.10.1 (*)
|    |    \--- org.apache.orc:orc-core -> 1.7.3 (*)
|    +--- project :iceberg-parquet
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-common (*)
|    |    \--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
|    +--- project :iceberg-hive-metastore
|    |    +--- project :iceberg-api (*)
|    |    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    |    +--- project :iceberg-bundled-guava
|    |    +--- project :iceberg-core (*)
|    |    +--- project :iceberg-common (*)
|    |    \--- com.github.ben-manes.caffeine:caffeine -> 2.8.4 (*)
|    +--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
|    \--- org.apache.orc:orc-core -> 1.7.3 (*)
+--- project :iceberg-aws
|    +--- project :iceberg-api (*)
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-common (*)
|    \--- project :iceberg-core (*)
+--- project :iceberg-aliyun
|    +--- project :iceberg-api (*)
|    +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
|    +--- project :iceberg-bundled-guava
|    +--- project :iceberg-core (*)
|    \--- project :iceberg-common (*)
\--- project :iceberg-nessie
     +--- project :iceberg-api (*)
     +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
     +--- project :iceberg-common (*)
     +--- project :iceberg-core (*)
     +--- project :iceberg-bundled-guava
     \--- org.projectnessie:nessie-client -> 0.20.1
          \--- org.projectnessie:nessie-model:0.20.1

(*) - dependencies omitted (listed previously)

A web-based, searchable dependency report is available by adding the --scan option.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 29s
1 actionable task: 1 executed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is here: #4193

kbendick added a commit to kbendick/iceberg that referenced this pull request May 17, 2022
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that.

This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants