Docs/getting-started: Switch to use iceberg-aws-bundle jar #1609

Merged · 1 commit · May 20, 2025

2 changes: 1 addition & 1 deletion getting-started/eclipselink/docker-compose.yml
@@ -76,7 +76,7 @@ services:
       retries: 15
     command: [
       /opt/spark/bin/spark-sql,
-      --packages, "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,software.amazon.awssdk:bundle:2.28.17,software.amazon.awssdk:url-connection-client:2.28.17,org.apache.iceberg:iceberg-gcp-bundle:1.9.0,org.apache.iceberg:iceberg-azure-bundle:1.9.0",
+      --packages, "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0,org.apache.iceberg:iceberg-gcp-bundle:1.9.0,org.apache.iceberg:iceberg-azure-bundle:1.9.0",
       --conf, "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       --conf, "spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog",
       --conf, "spark.sql.catalog.quickstart_catalog.type=rest",
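
For readers following the guide outside Docker, the command above maps onto a plain SparkSession. Below is a minimal PySpark sketch; the catalog URI and credential settings sit outside the visible hunk, so those values are assumptions.

```python
from pyspark.sql import SparkSession

# Rough equivalent of the compose file's spark-sql command, with the single
# iceberg-aws-bundle replacing the raw AWS SDK jars.
spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,"
        "org.apache.iceberg:iceberg-aws-bundle:1.9.0,"
        "org.apache.iceberg:iceberg-gcp-bundle:1.9.0,"
        "org.apache.iceberg:iceberg-azure-bundle:1.9.0",
    )
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.quickstart_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.quickstart_catalog.type", "rest")
    # Assumed endpoint; point this at wherever your Polaris server runs.
    .config("spark.sql.catalog.quickstart_catalog.uri",
            "http://localhost:8181/api/catalog")
    .getOrCreate()
)
```
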
2 changes: 1 addition & 1 deletion getting-started/jdbc/docker-compose.yml
@@ -76,7 +76,7 @@ services:
       retries: 15
     command: [
       /opt/spark/bin/spark-sql,
-      --packages, "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,software.amazon.awssdk:bundle:2.28.17,software.amazon.awssdk:url-connection-client:2.28.17,org.apache.iceberg:iceberg-gcp-bundle:1.9.0,org.apache.iceberg:iceberg-azure-bundle:1.9.0",
+      --packages, "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0,org.apache.iceberg:iceberg-gcp-bundle:1.9.0,org.apache.iceberg:iceberg-azure-bundle:1.9.0",
       --conf, "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       --conf, "spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog",
       --conf, "spark.sql.catalog.polaris.type=rest",
2 changes: 1 addition & 1 deletion getting-started/spark/notebooks/SparkPolaris.ipynb
@@ -256,7 +256,7 @@
 "\n",
 "spark = (SparkSession.builder\n",
 " .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.iceberg.spark.SparkSessionCatalog\")\n",
-" .config(\"spark.jars.packages\", \"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.23.19,software.amazon.awssdk:url-connection-client:2.23.19\")\n",
+" .config(\"spark.jars.packages\", \"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0\")\n",
 " .config('spark.sql.iceberg.vectorization.enabled', 'false')\n",
 " \n",
 " # Configure the 'polaris' catalog as an Iceberg rest catalog\n",
10 changes: 5 additions & 5 deletions plugins/spark/README.md
@@ -31,7 +31,7 @@ and depends on iceberg-spark-runtime 1.9.0.
 # Build Plugin Jar
 A task createPolarisSparkJar is added to build a jar for the Polaris Spark plugin; the jar is named
 `polaris-iceberg-<icebergVersion>-spark-runtime-<sparkVersion>_<scalaVersion>-<polarisVersion>.jar`. For example:
-`polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`.
+`polaris-iceberg-1.9.0-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`.

 - `./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.12.
 - `./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar` -- build jar for Spark 3.5 with Scala version 2.13.
@@ -53,7 +53,7 @@ jar, and to use the local Polaris server as a Catalog.
 ```shell
 bin/spark-shell \
   --jars <path-to-spark-client-jar> \
-  --packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
+  --packages org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
   --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
   --conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
@@ -67,13 +67,13 @@ bin/spark-shell \
 ```

 Assume the path to the built Spark client jar is
-`/polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`
+`/polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.9.0-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`
 and the name of the catalog is `polaris`. The CLI command will look like the following:

 ```shell
 bin/spark-shell \
-  --jars /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar \
-  --packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
+  --jars /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.9.0-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar \
+  --packages org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
   --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
   --conf spark.sql.catalog.polaris.warehouse=<catalog-name> \
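
As a hedged smoke test of the command above, the same setup can be driven from PySpark. The jar path and the catalog name `polaris` follow the README's example; the catalog's REST URI, warehouse, and credential options are elided from the hunk and must still be supplied before the final query succeeds.

```python
from pyspark.sql import SparkSession

jar = ("/polaris/plugins/spark/v3.5/spark/build/2.12/libs/"
       "polaris-iceberg-1.9.0-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar")

spark = (
    SparkSession.builder
    .config("spark.jars", jar)
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.catalog.polaris", "org.apache.polaris.spark.SparkCatalog")
    # The warehouse, uri, and credential settings from the full README example
    # go here; without them the catalog cannot reach the Polaris server.
    .getOrCreate()
)

# If the plugin jar and the packages resolved, listing namespaces should not raise.
spark.sql("SHOW NAMESPACES IN polaris").show()
```
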
@@ -266,8 +266,8 @@
 "from pyspark.sql import SparkSession\n",
 "\n",
 "spark = (SparkSession.builder\n",
-" .config(\"spark.jars\", \"../polaris_libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar\")\n",
-" .config(\"spark.jars.packages\", \"org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-spark_2.12:3.2.1\")\n",
+" .config(\"spark.jars\", \"../polaris_libs/polaris-iceberg-1.9.0-spark-runtime-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar\")\n",
+" .config(\"spark.jars.packages\", \"org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.2.1\")\n",
 " .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.delta.catalog.DeltaCatalog\")\n",
 " .config('spark.sql.iceberg.vectorization.enabled', 'false')\n",
 "\n",
2 changes: 1 addition & 1 deletion regtests/setup.sh
@@ -114,7 +114,7 @@ else
 cat << EOF >> ${SPARK_CONF}

 # POLARIS_TESTCONF_V5
-spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:${ICEBERG_VERSION},org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.23.19,software.amazon.awssdk:url-connection-client:2.23.19
+spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:${ICEBERG_VERSION},org.apache.iceberg:iceberg-aws-bundle:${ICEBERG_VERSION}
 spark.hadoop.fs.s3.impl org.apache.hadoop.fs.s3a.S3AFileSystem
 spark.hadoop.fs.AbstractFileSystem.s3.impl org.apache.hadoop.fs.s3a.S3A
 spark.sql.variable.substitute true
4 changes: 1 addition & 3 deletions regtests/t_pyspark/src/iceberg_spark.py
@@ -75,9 +75,7 @@ def __enter__(self):
     """
     packages = [
         "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0",
-        "org.apache.hadoop:hadoop-aws:3.4.0",
-        "software.amazon.awssdk:bundle:2.23.19",
-        "software.amazon.awssdk:url-connection-client:2.23.19",
+        "org.apache.iceberg:iceberg-aws-bundle:1.9.0",
     ]
     excludes = ["org.checkerframework:checker-qual", "com.google.errorprone:error_prone_annotations"]
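
For context, here is a sketch of how a package list like this typically reaches Spark. The builder code around the visible hunk is not shown, so the exact wiring in `iceberg_spark.py` is an assumption.

```python
from pyspark.sql import SparkSession

packages = [
    "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0",
    "org.apache.iceberg:iceberg-aws-bundle:1.9.0",
]
excludes = ["org.checkerframework:checker-qual", "com.google.errorprone:error_prone_annotations"]

spark = (
    SparkSession.builder
    # Spark resolves comma-separated Maven coordinates at session startup.
    .config("spark.jars.packages", ",".join(packages))
    # Excluded transitive dependencies are likewise comma-separated.
    .config("spark.jars.excludes", ",".join(excludes))
    .getOrCreate()
)
```
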
@@ -154,7 +154,7 @@ _Note: the credentials provided here are those for our principal, not the root c

 ```shell
 bin/spark-sql \
-  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.hadoop:hadoop-aws:3.4.0 \
+  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
   --conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
   --conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \
@@ -170,7 +170,7 @@ bin/spark-sql \

 Similar to the CLI commands above, this configures Spark to use the Polaris server running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

-Finally, note that we include the `hadoop-aws` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
+Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
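
Once the shell is up, a quick round trip confirms the bundle is pulling its weight. A PySpark sketch, assuming the `quickstart_catalog` name from this guide; the namespace and table names are illustrative.

```python
# Assumes `spark` is a session configured with the quickstart_catalog settings above.
spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_catalog.demo")
spark.sql("CREATE TABLE IF NOT EXISTS quickstart_catalog.demo.smoke (id BIGINT) USING iceberg")
spark.sql("INSERT INTO quickstart_catalog.demo.smoke VALUES (1)")
# A successful read exercises the vended credentials and the S3 file IO.
spark.sql("SELECT * FROM quickstart_catalog.demo.smoke").show()
```
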

#### Using Spark SQL from a Docker container

4 changes: 2 additions & 2 deletions site/content/in-dev/unreleased/polaris-spark-client.md
@@ -60,7 +60,7 @@ a released Polaris Spark client.

 ```shell
 bin/spark-shell \
-  --packages <polaris-spark-client-package>,org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
+  --packages <polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
   --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
   --conf spark.sql.catalog.<spark-catalog-name>.warehouse=<polaris-catalog-name> \
@@ -88,7 +88,7 @@ You can also start the connection by programmatically initializing a SparkSession,
 from pyspark.sql import SparkSession

 spark = SparkSession.builder
-  .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-spark_2.12:3.3.1")
+  .config("spark.jars.packages", "<polaris-spark-client-package>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1")
   .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
   .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension")
   .config("spark.sql.catalog.<spark-catalog-name>", "org.apache.polaris.spark.SparkCatalog")
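
The hunk cuts the builder off mid-chain; what follows is typically just further `.config(...)` calls for the Polaris catalog (warehouse, URI, credentials) ending in `.getOrCreate()`. A short usage sketch against the finished session, with `polaris` standing in for `<spark-catalog-name>`:

```python
# Assumes `spark` is the session built above, with the elided catalog options
# (warehouse, uri, credentials) filled in before getOrCreate().
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.demo")
# Iceberg and Delta tables both resolve through the same Polaris catalog.
spark.sql("SHOW NAMESPACES IN polaris").show()
```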