Spark 3.0: Remove 3.0 from docs and builds (apache#6093)
ajantha-bhat authored Nov 6, 2022
1 parent cee3ad4 commit 396c6be
Showing 32 changed files with 68 additions and 102 deletions.
1 change: 0 additions & 1 deletion .github/labeler.yml
@@ -61,7 +61,6 @@ DATA:
- data/**/*
SPARK:
- spark-runtime/**/*
- spark3-runtime/**/*
- spark/**/*
- spark2/**/*
- spark3/**/*
2 changes: 1 addition & 1 deletion .github/workflows/java-ci.yml
@@ -88,7 +88,7 @@ jobs:
with:
distribution: zulu
java-version: 8
- run: ./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.0,3.1,3.2,3.3 -DhiveVersions=2,3 build -x test -x javadoc -x integrationTest
- run: ./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.1,3.2,3.3 -DhiveVersions=2,3 build -x test -x javadoc -x integrationTest

build-javadoc:
runs-on: ubuntu-20.04
2 changes: 1 addition & 1 deletion .github/workflows/publish-snapshot.yml
@@ -40,5 +40,5 @@ jobs:
java-version: 8
- run: |
./gradlew printVersion
./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.0,3.1,3.2,3.3 -DhiveVersions=2,3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.1,3.2,3.3 -DhiveVersions=2,3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
./gradlew -DflinkVersions= -DsparkVersions=3.2,3.3 -DscalaVersion=2.13 -DhiveVersions= publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
2 changes: 1 addition & 1 deletion .github/workflows/spark-ci.yml
@@ -87,7 +87,7 @@ jobs:
strategy:
matrix:
jvm: [8, 11]
spark: ['3.0', '3.1', '3.2', '3.3']
spark: ['3.1', '3.2', '3.3']
env:
SPARK_LOCAL_IP: localhost
steps:
1 change: 0 additions & 1 deletion .gitignore
@@ -29,7 +29,6 @@ site/site

# benchmark output
spark/v2.4/spark/benchmark/*
spark/v3.0/spark/benchmark/*
spark/v3.1/spark/benchmark/*
spark/v3.2/spark/benchmark/*
spark/v3.3/spark/benchmark/*
3 changes: 1 addition & 2 deletions README.md
@@ -74,8 +74,7 @@ Iceberg table support is organized in library modules:

Iceberg also has modules for adding Iceberg support to processing engines:

* `iceberg-spark2` is an implementation of Spark's Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version)
* `iceberg-spark3` is an implementation of Spark's Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version)
* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each Spark version (use runtime jars for a shaded version)
* `iceberg-flink` contains classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version)
* `iceberg-mr` contains an InputFormat and other classes for integrating with Apache Hive
* `iceberg-pig` is an implementation of Pig's LoadFunc API for Iceberg
2 changes: 1 addition & 1 deletion dev/stage-binaries.sh
@@ -20,7 +20,7 @@

SCALA_VERSION=2.12
FLINK_VERSIONS=1.14,1.15,1.16
SPARK_VERSIONS=2.4,3.0,3.1,3.2,3.3
SPARK_VERSIONS=2.4,3.1,3.2,3.3
HIVE_VERSIONS=2,3

./gradlew -Prelease -DscalaVersion=$SCALA_VERSION -DflinkVersions=$FLINK_VERSIONS -DsparkVersions=$SPARK_VERSIONS -DhiveVersions=$HIVE_VERSIONS publishApachePublicationToMavenRepository
20 changes: 10 additions & 10 deletions docs/aws.md
@@ -48,12 +48,12 @@ Here are some examples.

### Spark

For example, to use AWS features with Spark 3.0 and AWS clients version 2.17.257, you can start the Spark SQL shell with:
For example, to use AWS features with Spark 3.3 (with scala 2.12) and AWS clients version 2.17.257, you can start the Spark SQL shell with:

```sh
# add Iceberg dependency
ICEBERG_VERSION={{% icebergVersion %}}
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:$ICEBERG_VERSION"

# add AWS dependency
AWS_SDK_VERSION=2.17.257
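AWS_MAVEN_GROUP=software.amazon.awssdk

# sketch continuation (not part of this diff): pull in the AWS SDK bundle and launch the
# shell; the catalog name and the Glue/S3 settings below are illustrative placeholders
DEPENDENCIES+=",$AWS_MAVEN_GROUP:bundle:$AWS_SDK_VERSION"
DEPENDENCIES+=",$AWS_MAVEN_GROUP:url-connection-client:$AWS_SDK_VERSION"

# start the Spark SQL client shell with an Iceberg catalog backed by AWS Glue and S3
spark-sql --packages $DEPENDENCIES \
    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO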
@@ -435,7 +435,7 @@ This is turned off by default.
### S3 Tags

Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) can be added to S3 objects while writing and deleting.
For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to write S3 tags with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
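    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
# sketch continuation, not part of this diff: the tag keys/values above are illustrative
# placeholders; any catalog property under s3.write.tags.* is applied as an S3 object tag on write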
@@ -452,7 +452,7 @@ The property is set to `true` by default.

With the `s3.delete.tags` config, objects are tagged with the configured key-value pairs before deletion.
Users can configure tag-based object lifecycle policy at bucket level to transition objects to different tiers.
For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to add S3 delete tags with Spark 3.3, you can start the Spark SQL shell with:

```
sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
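    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3
# sketch continuation, not part of this diff: the tag key/value is an illustrative placeholder;
# properties under s3.delete.tags.* are applied to objects before they are deleted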
@@ -468,7 +468,7 @@ Users can also use the catalog property `s3.delete.num-threads` to mention the n

When the catalog property `s3.write.table-tag-enabled` and `s3.write.namespace-tag-enabled` is set to `true` then the objects in S3 will be saved with tags: `iceberg.table=<table-name>` and `iceberg.namespace=<namespace-name>`.
Users can define access and data retention policy per namespace or table based on these tags.
For example, to write table and namespace name as S3 tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to write table and namespace name as S3 tags with Spark 3.3, you can start the Spark SQL shell with:
```
sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
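    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.table-tag-enabled=true \
    --conf spark.sql.catalog.my_catalog.s3.write.namespace-tag-enabled=true
# sketch continuation, not part of this diff: with both flags enabled, written objects are
# tagged with iceberg.table=<table-name> and iceberg.namespace=<namespace-name>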
@@ -488,7 +488,7 @@ disaster recovery, etc.
For using cross-region access points, we need to additionally set `use-arn-region-enabled` catalog property to
`true` to enable `S3FileIO` to make cross-region calls, it's not required for same / multi-region access points.

For example, to use S3 access-point with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 access-point with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
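    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.use-arn-region-enabled=false \
    --conf spark.sql.catalog.my_catalog.s3.access-points.my-bucket2=arn:aws:s3::123456789012:accesspoint/my-access-point
# sketch continuation, not part of this diff: the access-point ARN is an illustrative placeholder;
# s3.access-points.<bucket> maps a bucket to the access point S3FileIO should use for it, and
# use-arn-region-enabled only needs to be true for cross-region access points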
@@ -509,7 +509,7 @@ For more details on using access-points, please refer [Using access points with

To use S3 Acceleration, we need to set `s3.acceleration-enabled` catalog property to `true` to enable `S3FileIO` to make accelerated S3 calls.

For example, to use S3 Acceleration with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 Acceleration with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
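    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.acceleration-enabled=true
# sketch continuation, not part of this diff: s3.acceleration-enabled=true is the property named
# above; it routes S3FileIO requests through the bucket's Transfer Acceleration endpoint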
@@ -527,7 +527,7 @@ When clients make a request to a dual-stack endpoint, the bucket URL resolves to

To use S3 Dual-stack, we need to set `s3.dualstack-enabled` catalog property to `true` to enable `S3FileIO` to make dual-stack S3 calls.

For example, to use S3 Dual-stack with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 Dual-stack with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
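    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.dualstack-enabled=true
# sketch continuation, not part of this diff: s3.dualstack-enabled=true is the property named
# above; it makes S3FileIO call the bucket's dual-stack (IPv4/IPv6) endpoint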
@@ -564,7 +564,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
Here is an example to start Spark shell with this client factory:

```shell
spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.17.257 \
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.17.257 \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
--conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
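    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.client.factory=org.apache.iceberg.aws.AssumeRoleAwsClientFactory \
    --conf spark.sql.catalog.my_catalog.client.assume-role.arn=arn:aws:iam::123456789012:role/myRoleToAssume \
    --conf spark.sql.catalog.my_catalog.client.assume-role.region=us-east-1
# sketch continuation, not part of this diff: the role ARN and region are illustrative
# placeholders; client.factory selects AssumeRoleAwsClientFactory so the Glue, S3 and
# DynamoDB clients are created with the assumed-role credentials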
@@ -658,7 +658,7 @@ AWS_PACKAGES=(
)

ICEBERG_PACKAGES=(
"iceberg-spark3-runtime"
"iceberg-spark-runtime-3.3_2.12"
"iceberg-flink-runtime"
)
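
# sketch continuation, not part of this diff: one plausible way to join these package lists
# into comma-separated Maven coordinates for --packages; the DEPENDENCIES variable and the
# exact coordinates are assumptions, not the repository's script
ICEBERG_VERSION={{% icebergVersion %}}
DEPENDENCIES=""
for pkg in "${ICEBERG_PACKAGES[@]}"; do
    DEPENDENCIES+=",org.apache.iceberg:$pkg:$ICEBERG_VERSION"
done
DEPENDENCIES=${DEPENDENCIES#,}   # drop the leading comma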

5 changes: 1 addition & 4 deletions docs/java-api.md
@@ -252,10 +252,7 @@ Iceberg table support is organized in library modules:

This project Iceberg also has modules for adding Iceberg support to processing engines and associated tooling:

* `iceberg-spark2` is an implementation of Spark's Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version)
* `iceberg-spark3` is an implementation of Spark's Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version)
* `iceberg-spark-3.1` is an implementation of Spark's Datasource V2 API in 3.1 for Iceberg (use iceberg-spark-runtime-3.1 for a shaded version)
* `iceberg-spark-3.2` is an implementation of Spark's Datasource V2 API in 3.2 for Iceberg (use iceberg-spark-runtime-3.2 for a shaded version)
* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each Spark version (use runtime jars for a shaded version)
* `iceberg-flink` is an implementation of Flink's Table and DataStream API for Iceberg (use iceberg-flink-runtime for a shaded version)
* `iceberg-hive3` is an implementation of Hive 3 specific SerDe's for Timestamp, TimestampWithZone, and Date object inspectors (use iceberg-hive-runtime for a shaded version).
* `iceberg-mr` is an implementation of MapReduce and Hive InputFormats and SerDes for Iceberg (use iceberg-hive-runtime for a shaded version for use with Hive)
8 changes: 4 additions & 4 deletions docs/nessie.md
@@ -38,16 +38,16 @@ See [Project Nessie](https://projectnessie.org) for more information on Nessie.
## Enabling Nessie Catalog

The `iceberg-nessie` module is bundled with Spark and Flink runtimes for all versions from `0.11.0`. To get started
with Nessie and Iceberg simply add the Iceberg runtime to your process. Eg: `spark-sql --packages
org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}}`.
with Nessie (with spark-3.3) and Iceberg simply add the Iceberg runtime to your process. Eg: `spark-sql --packages
org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:{{% icebergVersion %}}`.

## Spark SQL Extensions

From Spark 3.0, Nessie SQL extensions can be used to manage the Nessie repo as shown below.
From Spark 3.3 (with scala 2.12), Nessie SQL extensions can be used to manage the Nessie repo as shown below.

```
bin/spark-sql
--packages "org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}},org.projectnessie:nessie-spark-extensions:{{% nessieVersion %}}"
--packages "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:{{% icebergVersion %}},org.projectnessie:nessie-spark-extensions:{{% nessieVersion %}}"
--conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
--conf <other settings>
```
4 changes: 2 additions & 2 deletions docs/spark-configuration.md
@@ -29,7 +29,7 @@ menu:

## Catalogs

Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under `spark.sql.catalog`.
Spark adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under `spark.sql.catalog`.

This creates an Iceberg catalog named `hive_prod` that loads tables from a Hive metastore:
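The configuration block itself falls outside this diff hunk; a minimal sketch, assuming the standard `SparkCatalog` properties, looks roughly like:

```sh
spark-sql \
    --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_prod.type=hive \
    --conf spark.sql.catalog.hive_prod.uri=thrift://metastore-host:port   # placeholder metastore URI
```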

Expand Down Expand Up @@ -128,7 +128,7 @@ spark.sql.catalog.custom_prod.my-additional-catalog-config = my-value

When using Iceberg 0.11.0 and later, Spark 2.4 can load tables from multiple Iceberg catalogs or from table locations.

Catalogs in 2.4 are configured just like catalogs in 3.0, but only Iceberg catalogs are supported.
Catalogs in 2.4 are configured just like catalogs in 3.x, but only Iceberg catalogs are supported.


## SQL Extensions
6 changes: 3 additions & 3 deletions docs/spark-ddl.md
@@ -32,12 +32,12 @@ To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions. Spark 2.4 does not support SQL DDL.

{{< hint info >}}
Spark 2.4 can't create Iceberg tables with DDL, instead use Spark 3.x or the [Iceberg API](..//java-api-quickstart).
Spark 2.4 can't create Iceberg tables with DDL, instead use Spark 3 or the [Iceberg API](..//java-api-quickstart).
{{< /hint >}}

## `CREATE TABLE`

Spark 3.0 can create tables in any Iceberg catalog with the clause `USING iceberg`:
Spark 3 can create tables in any Iceberg catalog with the clause `USING iceberg`:

```sql
CREATE TABLE prod.db.sample (
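    id bigint COMMENT 'unique id',
    data string)
USING iceberg
-- sketch continuation, not part of this diff: the column list is illustrative; the point is
-- the USING iceberg clause, which creates the table through the configured Iceberg catalog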
@@ -333,7 +333,7 @@ ALTER TABLE prod.db.sample DROP COLUMN point.z

## `ALTER TABLE` SQL extensions

These commands are available in Spark 3.x when using Iceberg [SQL extensions](../spark-configuration#sql-extensions).
These commands are available in Spark 3 when using Iceberg [SQL extensions](../spark-configuration#sql-extensions).

### `ALTER TABLE ... ADD PARTITION FIELD`

2 changes: 1 addition & 1 deletion docs/spark-procedures.md
@@ -27,7 +27,7 @@ menu:

# Spark Procedures

To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration). Stored procedures are only available when using [Iceberg SQL extensions](../spark-configuration#sql-extensions) in Spark 3.x.
To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration). Stored procedures are only available when using [Iceberg SQL extensions](../spark-configuration#sql-extensions) in Spark 3.

## Usage

6 changes: 3 additions & 3 deletions docs/spark-queries.md
@@ -31,8 +31,8 @@ To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration

Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions:

| Feature support | Spark 3.0| Spark 2.4 | Notes |
|--------------------------------------------------|----------|------------|------------------------------------------------|
| Feature support | Spark 3 | Spark 2.4 | Notes |
|--------------------------------------------------|-----------|------------|------------------------------------------------|
| [`SELECT`](#querying-with-sql) | ✔️ | | |
| [DataFrame reads](#querying-with-dataframes) | ✔️ | ✔️ | |
| [Metadata table `SELECT`](#inspecting-tables) | ✔️ | | |
@@ -75,7 +75,7 @@ val df = spark.table("prod.db.table")

### Catalogs with DataFrameReader

Iceberg 0.11.0 adds multi-catalog support to `DataFrameReader` in both Spark 3.x and 2.4.
Iceberg 0.11.0 adds multi-catalog support to `DataFrameReader` in both Spark 3 and 2.4.

Paths and table names can be loaded with Spark's `DataFrameReader` interface. How tables are loaded depends on how
the identifier is specified. When using `spark.read.format("iceberg").load(table)` or `spark.table(table)` the `table`
8 changes: 4 additions & 4 deletions docs/spark-structured-streaming.md
@@ -30,11 +30,11 @@ menu:
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API
with different levels of support in Spark versions.

As of Spark 3.0, DataFrame reads and writes are supported.
As of Spark 3, DataFrame reads and writes are supported.

| Feature support | Spark 3.0| Spark 2.4 | Notes |
|--------------------------------------------------|----------|------------|------------------------------------------------|
| [DataFrame write](#streaming-writes) | ✔️ | ✔️ | |
| Feature support | Spark 3 | Spark 2.4 | Notes |
|--------------------------------------------------|-----------|------------|------------------------------------------------|
| [DataFrame write](#streaming-writes) | || |

## Streaming Reads
