Flink: Support configurable Scala version. #4157
Conversation
project(":iceberg-flink:flink-1.14_${scalaVersion}").projectDir = file('flink/v1.14/flink')
project(":iceberg-flink:flink-1.14_${scalaVersion}").name = "iceberg-flink-1.14_${scalaVersion}"
project(":iceberg-flink:flink-runtime-1.14_${scalaVersion}").projectDir = file('flink/v1.14/flink-runtime')
project(":iceberg-flink:flink-runtime-1.14_${scalaVersion}").name = "iceberg-flink-runtime-1.14_${scalaVersion}"
Correct me if I’m wrong, but I think we need to be able to publish / build all combinations of Flink / Scala versions here. For example, when publishing a new release to maven we need to build all of the artifacts.
Is it possible to do that in a way that doesn’t require two build calls (one for each Scala version)? I’m not highly experienced with Scala in Gradle builds, so I might be mistaken, or two build calls might be necessary.
But having to call build multiple times makes the release process more cumbersome and particular, and that (the build / release process) is something we’ve been trying to make more streamlined.
I’m ok with iterating on things and would not block on this, but it is a concern for me.
Looking at the Spark on Scala 2.13 support PR, it looks like a `scalaVersion` flag with two build calls is what is used.
I’m not sure if the release scripts have been updated to support this yet, though.
> Is it possible to do that in a way that doesn’t require two build calls (one for each Scala version)?

As far as I know, a single gradle invocation cannot run with two different values for the same system property. I mean, I think two build calls to build the two separate flink scala runtime jars is the thing we need to do. I see we usually deploy the artifacts to the maven repository by executing stage-binaries.sh, so I think it's okay to add the two gradle build calls to that bash script.
(In general, I think it's a good question)
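A minimal sketch of what those two calls might look like in stage-binaries.sh, assuming the `scalaVersion` / `knownScalaVersions` properties from this PR; the generic `publish` task here is only a stand-in for whatever task the real release script invokes:

```shell
#!/usr/bin/env bash
# Sketch only: build/publish the Flink artifacts once per Scala version.
# `publish` stands in for the task the real stage-binaries.sh runs,
# and the property names follow this PR.
set -e

for scala in 2.11 2.12; do
  ./gradlew publish \
      -DflinkVersions=1.12,1.13,1.14 \
      -DscalaVersion="${scala}" \
      -DknownScalaVersions="${scala}"
done
```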
Yeah, we can't have multiple Scala versions in the build without strange errors. We'll have to update the binary release script to publish Scala 2.13 modules.
}
String flinkVersion = '1.14.0'
String flinkMajorVersion = '1.14'
String scalaVersion = System.getProperty("scalaVersion") != null ? System.getProperty("scalaVersion") : System.getProperty("defaultScalaVersion")
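The fallback in the ternary above can be exercised from the command line roughly like this (a sketch; it assumes `defaultScalaVersion` is supplied via a `systemProp.defaultScalaVersion` entry in gradle.properties, which is not shown in this diff):

```shell
# Build against an explicitly chosen Scala version:
./gradlew build -DscalaVersion=2.11

# Omit the flag to fall back to defaultScalaVersion
# (assumed to be set as systemProp.defaultScalaVersion
#  in gradle.properties):
./gradlew build
```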
We’ve recently also had work for supporting Scala 2.13 in Spark, so it might be wise to rename `scalaVersion` to something more Flink-related to keep the two separated. Though this isn’t validated against a list of all known Scala versions for Flink, so it might not be a concern here.
As mentioned, Spark is also using the `scalaVersion` flag. So we’ll definitely need to define explicit known Flink / Scala version combos to disallow, say, somebody trying to build Flink with Scala 2.13 (as the flag name will probably be shared).
Very interested in people’s thoughts on how to handle this concern of `scalaVersion` and the different supported Flink / Spark version combinations.
Good question, I also considered this issue before. The currently available Scala versions for Flink are 2.11 and 2.12, and as far as I know most people choose Scala 2.11 for their production environments. But the available Scala versions for Spark are 2.11, 2.12, and 2.13 (some major versions only support 2.12).
So in fact, the supported Scala version ranges are quite different between Spark & Flink. Besides, the supported Scala version ranges also differ between Spark versions. See the following:
- https://mvnrepository.com/artifact/org.apache.spark/spark-sql
- https://mvnrepository.com/artifact/org.apache.flink/flink-scala
So even if we separate the Scala version flags between Spark & Flink, I think a single Scala version still could not be used to build all the iceberg spark runtime jars. The correct approach is: try to package the related artifacts in one build process as much as possible, and if we can't build everything in one build, we can only choose to build again.
Since there can only be one Scala version in the build, I think using the same `scalaVersion` flag should be fine.
One thing I think we’re missing here is the ability to distinguish which Flink versions and which Scala versions go together. Presently, since you’re adding support for Scala 2.11 which is supported across all, it’s not a major concern. But eventually, when Flink supports Scala 2.13, I think it will be a concern. This might be ok to delay for later but worth mentioning.
The other thing I’ll mention is that we added the hard-coded Scala version suffix `_2.12` to Spark builds first, separately. I can’t say how much easier that made the process (I did that part but not the 2.13 upgrade), but I thought I’d share my experience.
Do you know if the Flink community is looking to adopt Scala 2.13 within the next few releases or if there’s any discussion around that?
Thanks @openinx!
Here’s the recent PR for supporting Scala 2.13 in Spark 3.2:
Noticed we added support for Scala 2.13 with Spark 3.2 recently, but the build scripts have not been updated to reflect that. This came up as a result of this discussion: apache#4157 (comment). There's possibly more work to do
@openinx I created a draft PR to update the release script (just for the Scala 2.13 with Spark support). I probably should have just opened a ticket. Please feel free to close that one if you update things related to the release (otherwise we'll be sure to update before Iceberg 0.14.0).
Thanks, @openinx! Good to have this ready for when Flink supports 2.13.
project(':iceberg-flink:iceberg-flink-1.12') {
project(":iceberg-flink:iceberg-flink-${flinkMajorVersion}_${scalaVersion}") {
@rdblue @kbendick Seems I made a mistake here? Since our apache iceberg flink & flink-runtime modules don't introduce any Scala dependencies, in theory the iceberg-flink-1.14.jar & iceberg-flink-runtime-1.14.jar can work fine on both the flink-1.14.0-bin-scala_2.11 and flink-1.14.0-bin-scala_2.12 releases. So we don't need to add the `_${scalaVersion}` suffix to the `iceberg-flink` & `iceberg-flink-runtime` jars.
(But we still need this PR to make iceberg-flink / iceberg-flink-runtime compile against different Scala versions.)
➜ iceberg git:(master) ✗ ./gradlew :iceberg-flink:iceberg-flink-runtime-1.14_2.12:dependencies --configuration runtimeClasspath
> Task :iceberg-flink:iceberg-flink-runtime-1.14_2.12:dependencies
------------------------------------------------------------
Project ':iceberg-flink:iceberg-flink-runtime-1.14_2.12'
------------------------------------------------------------
runtimeClasspath - Runtime classpath of source set 'main'.
+--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
+--- project :iceberg-flink:iceberg-flink-1.14_2.12
| +--- project :iceberg-api
| | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | \--- project :iceberg-bundled-guava
| +--- project :iceberg-data
| | +--- project :iceberg-api (*)
| | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core
| | | +--- project :iceberg-api (*)
| | | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | | +--- project :iceberg-common
| | | | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | | | \--- project :iceberg-bundled-guava
| | | +--- project :iceberg-bundled-guava
| | | +--- org.apache.avro:avro -> 1.10.1
| | | | +--- com.fasterxml.jackson.core:jackson-core:2.11.3 -> 2.11.4
| | | | \--- com.fasterxml.jackson.core:jackson-databind:2.11.3 -> 2.11.4
| | | | +--- com.fasterxml.jackson.core:jackson-annotations:2.11.4
| | | | \--- com.fasterxml.jackson.core:jackson-core:2.11.4
| | | +--- com.fasterxml.jackson.core:jackson-databind -> 2.11.4 (*)
| | | +--- com.fasterxml.jackson.core:jackson-core -> 2.11.4
| | | +--- com.github.ben-manes.caffeine:caffeine -> 2.8.4
| | | | +--- org.checkerframework:checker-qual:3.4.0
| | | | \--- com.google.errorprone:error_prone_annotations:2.3.4
| | | \--- org.roaringbitmap:RoaringBitmap -> 0.9.22
| | | \--- org.roaringbitmap:shims:0.9.22
| | +--- org.apache.orc:orc-core -> 1.7.3
| | | +--- org.apache.orc:orc-shims:1.7.3
| | | +--- io.airlift:aircompressor:0.21
| | | +--- org.jetbrains:annotations:17.0.0
| | | \--- org.threeten:threeten-extra:1.5.0
| | \--- org.apache.parquet:parquet-avro -> 1.12.2
| | +--- org.apache.parquet:parquet-column:1.12.2
| | | +--- org.apache.parquet:parquet-common:1.12.2
| | | | +--- org.apache.parquet:parquet-format-structures:1.12.2
| | | | \--- org.apache.yetus:audience-annotations:0.12.0
| | | \--- org.apache.parquet:parquet-encoding:1.12.2
| | | \--- org.apache.parquet:parquet-common:1.12.2 (*)
| | +--- org.apache.parquet:parquet-hadoop:1.12.2
| | | +--- org.apache.parquet:parquet-column:1.12.2 (*)
| | | +--- org.apache.parquet:parquet-format-structures:1.12.2
| | | +--- org.apache.parquet:parquet-jackson:1.12.2
| | | \--- com.github.luben:zstd-jni:1.4.9-1
| | \--- org.apache.parquet:parquet-format-structures:1.12.2
| +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common (*)
| +--- project :iceberg-core (*)
| +--- project :iceberg-orc
| | +--- project :iceberg-api (*)
| | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- org.apache.avro:avro -> 1.10.1 (*)
| | \--- org.apache.orc:orc-core -> 1.7.3 (*)
| +--- project :iceberg-parquet
| | +--- project :iceberg-api (*)
| | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- project :iceberg-common (*)
| | \--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
| +--- project :iceberg-hive-metastore
| | +--- project :iceberg-api (*)
| | +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| | +--- project :iceberg-bundled-guava
| | +--- project :iceberg-core (*)
| | +--- project :iceberg-common (*)
| | \--- com.github.ben-manes.caffeine:caffeine -> 2.8.4 (*)
| +--- org.apache.parquet:parquet-avro -> 1.12.2 (*)
| \--- org.apache.orc:orc-core -> 1.7.3 (*)
+--- project :iceberg-aws
| +--- project :iceberg-api (*)
| +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-common (*)
| \--- project :iceberg-core (*)
+--- project :iceberg-aliyun
| +--- project :iceberg-api (*)
| +--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
| +--- project :iceberg-bundled-guava
| +--- project :iceberg-core (*)
| \--- project :iceberg-common (*)
\--- project :iceberg-nessie
+--- project :iceberg-api (*)
+--- com.github.stephenc.findbugs:findbugs-annotations -> 1.3.9-1
+--- project :iceberg-common (*)
+--- project :iceberg-core (*)
+--- project :iceberg-bundled-guava
\--- org.projectnessie:nessie-client -> 0.20.1
\--- org.projectnessie:nessie-model:0.20.1
(*) - dependencies omitted (listed previously)
A web-based, searchable dependency report is available by adding the --scan option.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 29s
1 actionable task: 1 executed
PR is here: #4193
As apache flink usually releases each version with two Scala builds, but apache iceberg currently only supports Scala 2.12, this PR adds support for a configurable Scala version when building the flink iceberg modules.
We can use the following command to build the flink runtime modules for Scala 2.12:

./gradlew clean build -x test -DflinkVersions=1.12,1.13,1.14 \
    -DscalaVersion=2.12 \
    -DknownScalaVersions=2.12 \
    -x javadoc -Pquick \
    -DsparkVersions= -DhiveVersions=

The packaged jars then carry the `_2.12` suffix (for example, iceberg-flink-runtime-1.14_2.12.jar).
And we can use the following command to build the flink runtime modules for Scala 2.11:

./gradlew clean build -x test -DflinkVersions=1.12,1.13,1.14 \
    -DscalaVersion=2.11 \
    -DknownScalaVersions=2.11 \
    -x javadoc -Pquick \
    -DsparkVersions= -DhiveVersions=

The resulting jars are named the same way, with the `_2.11` suffix.
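To confirm which Scala suffix a build produced, the jar names can be inspected directly (a sketch; this assumes the standard Gradle `build/libs` output layout and the module paths from this PR):

```shell
# List the runtime jars produced by the build; the file names should
# carry the _2.11 or _2.12 suffix depending on -DscalaVersion.
# (Assumes the standard Gradle build/libs output layout.)
ls flink/v1.14/flink-runtime/build/libs/
```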