Description
Describe the bug
We followed https://github.com/AlexMercedCoder/apache-polaris-learing-environment to bring up the Polaris catalog on one of our VMs and created catalogs, schemas, and Iceberg tables. As part of one of our operations we need to drop a table created in the Polaris catalog.
We are using a standalone Spark application to try the available options, and came across an issue with Spark and the REST catalog.
Using the following code, we tried to drop the table.
Expectations:
- Drop the table from the Polaris catalog.
- Delete the metadata files from storage (S3).
- Delete the data files from storage (S3).
Observations:
1. The table was dropped from the Polaris catalog.
2. The data files were deleted from storage (S3).
3. The metadata files in storage were NOT deleted.
val catalog = spark.sessionState.catalogManager.catalog("dev_catalog").asInstanceOf[SparkCatalog]
val idnt = TableIdentifier.of("organization", "finance")
catalog.icebergCatalog().dropTable(idnt, true) // purge = true: data and metadata files should be deleted
To Reproduce
Spark application:
import org.apache.spark.sql.SparkSession
import org.apache.iceberg.spark.SparkCatalog
import org.apache.iceberg.catalog.TableIdentifier
object ec2_check2_delete_stagingfile2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.catalog.dev_catalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.dev_catalog.catalog-impl", "org.apache.iceberg.rest.RESTCatalog")
      .config("spark.sql.catalog.dev_catalog.uri", "http://*************:8181/api/catalog")
      .config("spark.sql.catalog.dev_catalog.header.X-Iceberg-Access-Delegation", "vended-credentials")
      .config("spark.sql.catalog.dev_catalog.credential", "*****:******")
      .config("spark.sql.catalog.dev_catalog.client.region", "******")
      .config("spark.sql.catalog.dev_catalog.warehouse", "dev_catalog")
      .config("spark.sql.catalog.dev_catalog.scope", "*****")
      .config("spark.sql.catalog.dev_catalog.token-refresh-enabled", "true")
      .config("spark.sql.debug.codegen", "true")
      .getOrCreate()

    println("Spark Running")

    // Drop the table with purge = true, expecting both data and metadata files to be removed
    val catalog = spark.sessionState.catalogManager.catalog("dev_catalog").asInstanceOf[SparkCatalog]
    val idnt = TableIdentifier.of("organization", "finance")
    catalog.icebergCatalog().dropTable(idnt, true)

    println("done")
    spark.stop()
  }
}
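For comparison, the same drop can also be issued through Spark SQL. This is a sketch only (it assumes the same SparkSession and `dev_catalog` configuration as the reproduction above); we saw the behavior via the programmatic API:

```scala
// Sketch: equivalent drop via Spark SQL, assuming the SparkSession above is
// already configured with the dev_catalog REST catalog. With Iceberg's Spark
// extensions, DROP TABLE ... PURGE requests deletion of the underlying files
// in addition to removing the table from the catalog.
spark.sql("DROP TABLE dev_catalog.organization.finance PURGE")
```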
Actual Behavior
1. The table was dropped from the Polaris catalog.
2. The data files were deleted from storage (S3); the metadata files were not.
Expected Behavior
- Drop the table from the Polaris catalog.
- Delete the metadata files from storage (S3).
- Delete the data files from storage (S3).
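To confirm which files are left behind, the table's `metadata/` prefix can be listed before and after the drop. A sketch (the bucket and warehouse path below are placeholders for our actual S3 table location, and `spark` is the configured SparkSession):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

// Sketch: list what remains under the table's metadata/ prefix after the drop.
// "s3://my-bucket/warehouse/organization/finance" is a placeholder for the
// table location reported by Polaris.
val tableLocation = "s3://my-bucket/warehouse/organization/finance"
val fs = FileSystem.get(new URI(tableLocation), spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(tableLocation + "/metadata"))
  .foreach(status => println(status.getPath))
```

In our case this listing still shows the `*.metadata.json` files after `dropTable(idnt, true)` completes.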
Additional context
No response
System information
Dependencies:
- iceberg-aws-bundle-1.4.3
- iceberg-spark-runtime-3.3_2.12-1.4.3
- log4j-slf4j-impl-2.17.2
- iceberg-hive-runtime-1.6.1
Spark Version : 3.3.1
Scala Version : 2.12