Spark 3.0: Remove 3.0 from docs and builds (apache#6093)
ajantha-bhat authored Nov 6, 2022
1 parent cee3ad4 commit 396c6be
Showing 32 changed files with 68 additions and 102 deletions.
1 change: 0 additions & 1 deletion .github/labeler.yml
@@ -61,7 +61,6 @@ DATA:
- data/**/*
SPARK:
- spark-runtime/**/*
- spark3-runtime/**/*
- spark/**/*
- spark2/**/*
- spark3/**/*
2 changes: 1 addition & 1 deletion .github/workflows/java-ci.yml
@@ -88,7 +88,7 @@ jobs:
with:
distribution: zulu
java-version: 8
- run: ./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.0,3.1,3.2,3.3 -DhiveVersions=2,3 build -x test -x javadoc -x integrationTest
- run: ./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.1,3.2,3.3 -DhiveVersions=2,3 build -x test -x javadoc -x integrationTest

build-javadoc:
runs-on: ubuntu-20.04
2 changes: 1 addition & 1 deletion .github/workflows/publish-snapshot.yml
@@ -40,5 +40,5 @@ jobs:
java-version: 8
- run: |
./gradlew printVersion
./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.0,3.1,3.2,3.3 -DhiveVersions=2,3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
./gradlew -DflinkVersions=1.14,1.15,1.16 -DsparkVersions=2.4,3.1,3.2,3.3 -DhiveVersions=2,3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
./gradlew -DflinkVersions= -DsparkVersions=3.2,3.3 -DscalaVersion=2.13 -DhiveVersions= publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
2 changes: 1 addition & 1 deletion .github/workflows/spark-ci.yml
@@ -87,7 +87,7 @@ jobs:
strategy:
matrix:
jvm: [8, 11]
spark: ['3.0', '3.1', '3.2', '3.3']
spark: ['3.1', '3.2', '3.3']
env:
SPARK_LOCAL_IP: localhost
steps:
1 change: 0 additions & 1 deletion .gitignore
@@ -29,7 +29,6 @@ site/site

# benchmark output
spark/v2.4/spark/benchmark/*
spark/v3.0/spark/benchmark/*
spark/v3.1/spark/benchmark/*
spark/v3.2/spark/benchmark/*
spark/v3.3/spark/benchmark/*
3 changes: 1 addition & 2 deletions README.md
@@ -74,8 +74,7 @@ Iceberg table support is organized in library modules:

Iceberg also has modules for adding Iceberg support to processing engines:

* `iceberg-spark2` is an implementation of Spark's Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version)
* `iceberg-spark3` is an implementation of Spark's Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version)
* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each Spark version (use runtime jars for a shaded version)
* `iceberg-flink` contains classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version)
* `iceberg-mr` contains an InputFormat and other classes for integrating with Apache Hive
* `iceberg-pig` is an implementation of Pig's LoadFunc API for Iceberg
2 changes: 1 addition & 1 deletion dev/stage-binaries.sh
@@ -20,7 +20,7 @@

SCALA_VERSION=2.12
FLINK_VERSIONS=1.14,1.15,1.16
SPARK_VERSIONS=2.4,3.0,3.1,3.2,3.3
SPARK_VERSIONS=2.4,3.1,3.2,3.3
HIVE_VERSIONS=2,3

./gradlew -Prelease -DscalaVersion=$SCALA_VERSION -DflinkVersions=$FLINK_VERSIONS -DsparkVersions=$SPARK_VERSIONS -DhiveVersions=$HIVE_VERSIONS publishApachePublicationToMavenRepository
20 changes: 10 additions & 10 deletions docs/aws.md
@@ -48,12 +48,12 @@ Here are some examples.

### Spark

For example, to use AWS features with Spark 3.0 and AWS clients version 2.17.257, you can start the Spark SQL shell with:
For example, to use AWS features with Spark 3.3 (with scala 2.12) and AWS clients version 2.17.257, you can start the Spark SQL shell with:

```sh
# add Iceberg dependency
ICEBERG_VERSION={{% icebergVersion %}}
DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:$ICEBERG_VERSION"

# add AWS dependency
AWS_SDK_VERSION=2.17.257
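AWS_MAVEN_GROUP=software.amazon.awssdk

# sketch continuation (not part of this diff): pull in the AWS SDK bundle and launch the
# shell; the catalog name and the Glue/S3 settings below are illustrative placeholders
DEPENDENCIES+=",$AWS_MAVEN_GROUP:bundle:$AWS_SDK_VERSION"
DEPENDENCIES+=",$AWS_MAVEN_GROUP:url-connection-client:$AWS_SDK_VERSION"

# start the Spark SQL client shell with an Iceberg catalog backed by AWS Glue and S3
spark-sql --packages $DEPENDENCIES \
    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO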
@@ -435,7 +435,7 @@ This is turned off by default.
### S3 Tags

Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) can be added to S3 objects while writing and deleting.
For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to write S3 tags with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
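    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2
# sketch continuation, not part of this diff: the tag keys/values above are illustrative
# placeholders; any catalog property under s3.write.tags.* is applied as an S3 object tag on write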
@@ -452,7 +452,7 @@ The property is set to `true` by default.

With the `s3.delete.tags` config, objects are tagged with the configured key-value pairs before deletion.
Users can configure tag-based object lifecycle policy at bucket level to transition objects to different tiers.
For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to add S3 delete tags with Spark 3.3, you can start the Spark SQL shell with:

```
sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
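    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3
# sketch continuation, not part of this diff: the tag key/value is an illustrative placeholder;
# properties under s3.delete.tags.* are applied to objects before they are deleted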
@@ -468,7 +468,7 @@ Users can also use the catalog property `s3.delete.num-threads` to mention the n

When the catalog property `s3.write.table-tag-enabled` and `s3.write.namespace-tag-enabled` is set to `true` then the objects in S3 will be saved with tags: `iceberg.table=<table-name>` and `iceberg.namespace=<namespace-name>`.
Users can define access and data retention policy per namespace or table based on these tags.
For example, to write table and namespace name as S3 tags with Spark 3.0, you can start the Spark SQL shell with:
For example, to write table and namespace name as S3 tags with Spark 3.3, you can start the Spark SQL shell with:
```
sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
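    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.table-tag-enabled=true \
    --conf spark.sql.catalog.my_catalog.s3.write.namespace-tag-enabled=true
# sketch continuation, not part of this diff: with both flags enabled, written objects are
# tagged with iceberg.table=<table-name> and iceberg.namespace=<namespace-name>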
@@ -488,7 +488,7 @@ disaster recovery, etc.
For using cross-region access points, we need to additionally set `use-arn-region-enabled` catalog property to
`true` to enable `S3FileIO` to make cross-region calls, it's not required for same / multi-region access points.

For example, to use S3 access-point with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 access-point with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
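    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.use-arn-region-enabled=false \
    --conf spark.sql.catalog.my_catalog.s3.access-points.my-bucket2=arn:aws:s3::123456789012:accesspoint/my-access-point
# sketch continuation, not part of this diff: the access-point ARN is an illustrative placeholder;
# s3.access-points.<bucket> maps a bucket to the access point S3FileIO should use for it, and
# use-arn-region-enabled only needs to be true for cross-region access points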
@@ -509,7 +509,7 @@ For more details on using access-points, please refer [Using access points with

To use S3 Acceleration, we need to set `s3.acceleration-enabled` catalog property to `true` to enable `S3FileIO` to make accelerated S3 calls.

For example, to use S3 Acceleration with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 Acceleration with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
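    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.acceleration-enabled=true
# sketch continuation, not part of this diff: s3.acceleration-enabled=true is the property named
# above; it routes S3FileIO requests through the bucket's Transfer Acceleration endpoint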
@@ -527,7 +527,7 @@ When clients make a request to a dual-stack endpoint, the bucket URL resolves to

To use S3 Dual-stack, we need to set `s3.dualstack-enabled` catalog property to `true` to enable `S3FileIO` to make dual-stack S3 calls.

For example, to use S3 Dual-stack with Spark 3.0, you can start the Spark SQL shell with:
For example, to use S3 Dual-stack with Spark 3.3, you can start the Spark SQL shell with:
```
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
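    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.dualstack-enabled=true
# sketch continuation, not part of this diff: s3.dualstack-enabled=true is the property named
# above; it makes S3FileIO call the bucket's dual-stack (IPv4/IPv6) endpoint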
@@ -564,7 +564,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
Here is an example to start Spark shell with this client factory:

```shell
spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.17.257 \
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.17.257 \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
--conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
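    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.client.factory=org.apache.iceberg.aws.AssumeRoleAwsClientFactory \
    --conf spark.sql.catalog.my_catalog.client.assume-role.arn=arn:aws:iam::123456789012:role/myRoleToAssume \
    --conf spark.sql.catalog.my_catalog.client.assume-role.region=us-east-1
# sketch continuation, not part of this diff: the role ARN and region are illustrative
# placeholders; client.factory selects AssumeRoleAwsClientFactory so the Glue, S3 and
# DynamoDB clients are created with the assumed-role credentials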
@@ -658,7 +658,7 @@ AWS_PACKAGES=(
)

ICEBERG_PACKAGES=(
"iceberg-spark3-runtime"
"iceberg-spark-runtime-3.3_2.12"
"iceberg-flink-runtime"
)
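
# sketch continuation, not part of this diff: one plausible way to join these package lists
# into comma-separated Maven coordinates for --packages; the DEPENDENCIES variable and the
# exact coordinates are assumptions, not the repository's script
ICEBERG_VERSION={{% icebergVersion %}}
DEPENDENCIES=""
for pkg in "${ICEBERG_PACKAGES[@]}"; do
    DEPENDENCIES+=",org.apache.iceberg:$pkg:$ICEBERG_VERSION"
done
DEPENDENCIES=${DEPENDENCIES#,}   # drop the leading comma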

5 changes: 1 addition & 4 deletions docs/java-api.md
@@ -252,10 +252,7 @@ Iceberg table support is organized in library modules:

This project Iceberg also has modules for adding Iceberg support to processing engines and associated tooling:

* `iceberg-spark2` is an implementation of Spark's Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version)
* `iceberg-spark3` is an implementation of Spark's Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version)
* `iceberg-spark-3.1` is an implementation of Spark's Datasource V2 API in 3.1 for Iceberg (use iceberg-spark-runtime-3.1 for a shaded version)
* `iceberg-spark-3.2` is an implementation of Spark's Datasource V2 API in 3.2 for Iceberg (use iceberg-spark-runtime-3.2 for a shaded version)
* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each Spark version (use runtime jars for a shaded version)
* `iceberg-flink` is an implementation of Flink's Table and DataStream API for Iceberg (use iceberg-flink-runtime for a shaded version)
* `iceberg-hive3` is an implementation of Hive 3 specific SerDe's for Timestamp, TimestampWithZone, and Date object inspectors (use iceberg-hive-runtime for a shaded version).
* `iceberg-mr` is an implementation of MapReduce and Hive InputFormats and SerDes for Iceberg (use iceberg-hive-runtime for a shaded version for use with Hive)
8 changes: 4 additions & 4 deletions docs/nessie.md
@@ -38,16 +38,16 @@ See [Project Nessie](https://projectnessie.org) for more information on Nessie.
## Enabling Nessie Catalog

The `iceberg-nessie` module is bundled with Spark and Flink runtimes for all versions from `0.11.0`. To get started
with Nessie and Iceberg simply add the Iceberg runtime to your process. Eg: `spark-sql --packages
org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}}`.
with Nessie (with spark-3.3) and Iceberg simply add the Iceberg runtime to your process. Eg: `spark-sql --packages
org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:{{% icebergVersion %}}`.

## Spark SQL Extensions

From Spark 3.0, Nessie SQL extensions can be used to manage the Nessie repo as shown below.
From Spark 3.3 (with scala 2.12), Nessie SQL extensions can be used to manage the Nessie repo as shown below.

```
bin/spark-sql
--packages "org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}},org.projectnessie:nessie-spark-extensions:{{% nessieVersion %}}"
--packages "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:{{% icebergVersion %}},org.projectnessie:nessie-spark-extensions:{{% nessieVersion %}}"
--conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
--conf <other settings>
```
4 changes: 2 additions & 2 deletions docs/spark-configuration.md
@@ -29,7 +29,7 @@ menu:

## Catalogs

Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under `spark.sql.catalog`.
Spark adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under `spark.sql.catalog`.

This creates an Iceberg catalog named `hive_prod` that loads tables from a Hive metastore:
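The configuration block itself falls outside this diff hunk; a minimal sketch, assuming the standard `SparkCatalog` properties, looks roughly like:

```sh
spark-sql \
    --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.hive_prod.type=hive \
    --conf spark.sql.catalog.hive_prod.uri=thrift://metastore-host:port   # placeholder metastore URI
```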

Expand Down Expand Up @@ -128,7 +128,7 @@ spark.sql.catalog.custom_prod.my-additional-catalog-config = my-value

When using Iceberg 0.11.0 and later, Spark 2.4 can load tables from multiple Iceberg catalogs or from table locations.

Catalogs in 2.4 are configured just like catalogs in 3.0, but only Iceberg catalogs are supported.
Catalogs in 2.4 are configured just like catalogs in 3.x, but only Iceberg catalogs are supported.


## SQL Extensions
6 changes: 3 additions & 3 deletions docs/spark-ddl.md
@@ -32,12 +32,12 @@ To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions. Spark 2.4 does not support SQL DDL.

{{< hint info >}}
Spark 2.4 can't create Iceberg tables with DDL, instead use Spark 3.x or the [Iceberg API](..//java-api-quickstart).
Spark 2.4 can't create Iceberg tables with DDL, instead use Spark 3 or the [Iceberg API](..//java-api-quickstart).
{{< /hint >}}

## `CREATE TABLE`

Spark 3.0 can create tables in any Iceberg catalog with the clause `USING iceberg`:
Spark 3 can create tables in any Iceberg catalog with the clause `USING iceberg`:

```sql
CREATE TABLE prod.db.sample (
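    id bigint COMMENT 'unique id',
    data string)
USING iceberg
-- sketch continuation, not part of this diff: the column list is illustrative; the point is
-- the USING iceberg clause, which creates the table through the configured Iceberg catalog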
@@ -333,7 +333,7 @@ ALTER TABLE prod.db.sample DROP COLUMN point.z

## `ALTER TABLE` SQL extensions

These commands are available in Spark 3.x when using Iceberg [SQL extensions](../spark-configuration#sql-extensions).
These commands are available in Spark 3 when using Iceberg [SQL extensions](../spark-configuration#sql-extensions).

### `ALTER TABLE ... ADD PARTITION FIELD`

2 changes: 1 addition & 1 deletion docs/spark-procedures.md
@@ -27,7 +27,7 @@ menu:

# Spark Procedures

To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration). Stored procedures are only available when using [Iceberg SQL extensions](../spark-configuration#sql-extensions) in Spark 3.x.
To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration). Stored procedures are only available when using [Iceberg SQL extensions](../spark-configuration#sql-extensions) in Spark 3.

## Usage

6 changes: 3 additions & 3 deletions docs/spark-queries.md
@@ -31,8 +31,8 @@ To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration

Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions:

| Feature support | Spark 3.0| Spark 2.4 | Notes |
|--------------------------------------------------|----------|------------|------------------------------------------------|
| Feature support | Spark 3 | Spark 2.4 | Notes |
|--------------------------------------------------|-----------|------------|------------------------------------------------|
| [`SELECT`](#querying-with-sql) | ✔️ | | |
| [DataFrame reads](#querying-with-dataframes) | ✔️ | ✔️ | |
| [Metadata table `SELECT`](#inspecting-tables) | ✔️ | | |
@@ -75,7 +75,7 @@ val df = spark.table("prod.db.table")

### Catalogs with DataFrameReader

Iceberg 0.11.0 adds multi-catalog support to `DataFrameReader` in both Spark 3.x and 2.4.
Iceberg 0.11.0 adds multi-catalog support to `DataFrameReader` in both Spark 3 and 2.4.

Paths and table names can be loaded with Spark's `DataFrameReader` interface. How tables are loaded depends on how
the identifier is specified. When using `spark.read.format("iceberg").load(table)` or `spark.table(table)` the `table`
8 changes: 4 additions & 4 deletions docs/spark-structured-streaming.md
@@ -30,11 +30,11 @@ menu:
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API
with different levels of support in Spark versions.

As of Spark 3.0, DataFrame reads and writes are supported.
As of Spark 3, DataFrame reads and writes are supported.

| Feature support | Spark 3.0| Spark 2.4 | Notes |
|--------------------------------------------------|----------|------------|------------------------------------------------|
| [DataFrame write](#streaming-writes) | ✔️ | ✔️ | |
| Feature support | Spark 3 | Spark 2.4 | Notes |
|--------------------------------------------------|-----------|------------|------------------------------------------------|
| [DataFrame write](#streaming-writes) | || |

## Streaming Reads
