
dropTable() API with purge=true drops the table in the Polaris REST catalog but deletes only the data files, not the metadata files, in storage (AWS S3) #1448

Open
@anushshetty3010

Description

Describe the bug

We followed https://github.com/AlexMercedCoder/apache-polaris-learing-environment to bring up the Polaris catalog on one of our VMs.

We created the catalogs, schemas, and Iceberg tables. As part of one of our operations we need to drop a table created in the Polaris catalog.

We are using a standalone Spark application to try the available options, and have come across an issue with Spark and the REST catalog.

We tried to drop the table using the code shown below.

Expectations:

  1. Drop the table from the Polaris catalog.
  2. Delete the metadata files from storage (S3).
  3. Delete the data files from storage (S3).

Observations:

  1. Dropped the table from the Polaris catalog.
  2. Deleted the data files from storage (S3).

The metadata files in storage are not deleted.

val catalog = spark.sessionState.catalogManager.catalog("dev_catalog").asInstanceOf[SparkCatalog]
val idnt = TableIdentifier.of("organization", "finance")
catalog.icebergCatalog().dropTable(idnt, true) // purge = true
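As a sanity check, the leftover objects under the table location can be classified by the standard Iceberg table layout, where data files live under a `data/` prefix and metadata files (`*.metadata.json`, manifest lists and manifests as `.avro`) under `metadata/`. A hypothetical helper (not part of the report, names are our own) to do that classification over an S3 listing:

```scala
// Hypothetical helper: classify object keys under an Iceberg table
// location into metadata vs. data files, based on the standard layout
// (metadata/ holds *.metadata.json and manifest *.avro files; data/
// holds the Parquet data files).
object LeftoverFiles {
  def isMetadataFile(key: String): Boolean =
    key.contains("/metadata/") &&
      (key.endsWith(".metadata.json") || key.endsWith(".avro"))

  def isDataFile(key: String): Boolean =
    key.contains("/data/") && key.endsWith(".parquet")

  // Partition a listing into (metadata files, data files).
  def classify(keys: Seq[String]): (Seq[String], Seq[String]) =
    (keys.filter(isMetadataFile), keys.filter(isDataFile))
}
```

After the drop with purge=true, running such a classification over the table's S3 prefix showed only metadata files remaining.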

To Reproduce

Spark Application :

import org.apache.spark.sql.{Row, Column, DataFrame, SaveMode, SparkSession, Dataset}
import org.apache.spark.SparkContext
import scala.collection.mutable.HashMap
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI
import software.amazon.awssdk.services.sts.StsClient
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest
import org.apache.iceberg.spark.SparkCatalog
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.rest.RESTCatalog

object ec2_check2_delete_stagingfile2 {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.catalog.dev_catalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.dev_catalog.catalog-impl", "org.apache.iceberg.rest.RESTCatalog")
      .config("spark.sql.catalog.dev_catalog.uri", "http://*************:8181/api/catalog")
      .config("spark.sql.catalog.dev_catalog.header.X-Iceberg-Access-Delegation", "vended-credentials")
      .config("spark.sql.catalog.dev_catalog.credential", "*****:******")
      .config("spark.sql.catalog.dev_catalog.client.region", "******")
      .config("spark.sql.catalog.dev_catalog.warehouse", "dev_catalog")
      .config("spark.sql.catalog.dev_catalog.scope", "*****")
      .config("spark.sql.catalog.dev_catalog.token-refresh-enabled", true)
      .config("spark.sql.debug.codegen", true)
      .getOrCreate()

    println("Spark Running")

    // Drop the table with purge = true; this should remove both the
    // data files and the metadata files from storage.
    val catalog = spark.sessionState.catalogManager.catalog("dev_catalog").asInstanceOf[SparkCatalog]
    val idnt = TableIdentifier.of("organization", "finance")
    catalog.icebergCatalog().dropTable(idnt, true)

    println("done")

    spark.stop()
  }
}

Actual Behavior

  1. Dropped the table from the Polaris catalog.
  2. Deleted the data files from storage (S3).

Expected Behavior

  1. Drop the table from the Polaris catalog.
  2. Delete the metadata files from storage (S3).
  3. Delete the data files from storage (S3).
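Until this is resolved, one possible workaround (a hypothetical sketch, not a confirmed Polaris or Iceberg behavior) is to clean up the leftover `metadata/` prefix under the table's base location manually after `dropTable`. The prefix can be derived from the base location:

```scala
// Hypothetical workaround sketch: derive the metadata/ prefix under an
// Iceberg table's base location so the leftover metadata files can be
// removed with any S3 client after dropTable(purge = true). The object
// and method names here are our own, for illustration only.
object MetadataPrefix {
  def of(tableLocation: String): String =
    tableLocation.stripSuffix("/") + "/metadata/"
}
```

The resulting prefix could then be removed recursively, for example with Hadoop's `FileSystem.delete(new Path(prefix), true)` using the same S3 configuration as the Spark job, though this bypasses the catalog and should be done only after confirming the table is no longer registered.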

Additional context

No response

System information

Dependencies :

  1. iceberg-aws-bundle-1.4.3
  2. iceberg-spark-runtime-3.3_2.12-1.4.3
  3. log4j-slf4j-impl-2.17.2
  4. iceberg-hive-runtime-1.6.1

Spark Version : 3.3.1
Scala Version : 2.12

Labels: bug