Description
Apache Iceberg version
1.7.2
Query engine
Spark
Please describe the bug 🐞
When I run the CREATE OR REPLACE TABLE ... AS SELECT command, it works well in several cases, including:
- Replacing a table with structure only (using LIMIT 0 in the SELECT statement)
- Replacing a non-partitioned table with a partitioned table
- Replacing an identity partition column with another identity partition column
However, one case I tested was unsuccessful: replacing a table that has a transform partition (bucket) with a non-partitioned table.
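For concreteness, one of the working cases looks roughly like this (a sketch only: some_plain_table and some_source_table are hypothetical names, and the table being replaced has no transform partition):
# Works: replace a non-partitioned table with a partitioned result
# (hypothetical tables; assumes some_source_table has a category column)
spark.sql("""
    CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.some_plain_table
    USING iceberg
    PARTITIONED BY (category)
    AS SELECT * FROM lakekeeper.risk_mon_out.some_source_table
""")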
Steps to reproduce
First, I created an empty non-partitioned table with the CREATE TABLE command and inserted some test data. Next, I created a partitioned table with a different structure, using the same command as in the first step.
Then I tried to replace the existing partitioned table with CREATE OR REPLACE TABLE AS SELECT from my non-partitioned table. My expectation was that the table would be replaced completely, both schema and data. Instead, it failed with: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 1 times, most recent failure: Lost task 0.0 in stage 24.0 (TID 50) (10.16.14.4 executor driver): org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: id_bucket: bucket[16](1)
I then tried once more: before replacing the table, I dropped all partition fields with ALTER TABLE ... DROP PARTITION FIELD. It still failed with the same error. I don't understand why it is still checking for partition fields when I had already dropped them before replacing the table. (I have checked the metadata file, and the current partition spec is empty after the drop.)
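For anyone reproducing this: besides reading the metadata JSON directly (which is how I checked), the current partitioning can also be inspected from Spark. A minimal sketch, using the table from the repro below:
# Inspect the current partitioning after the DROP PARTITION FIELD statements:
# DESCRIBE EXTENDED prints a "# Partitioning" section, and the partitions
# metadata table shows how the current data files are partitioned.
spark.sql("DESCRIBE TABLE EXTENDED lakekeeper.risk_mon_out.data_sample_test").show(50, truncate=False)
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.data_sample_test.partitions").show(truncate=False)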
My code: Spark + REST catalog (catalog_name = 'lakekeeper')
# Create a non-partitioned table with test data
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakekeeper.risk_mon_out.my_data_non_partitioned (
        vendor_id int,
        product string,
        price double,
        event_time timestamp
    )
""")
spark.sql("""
    INSERT INTO lakekeeper.risk_mon_out.my_data_non_partitioned VALUES
    (1, "Car", 2.34, TIMESTAMP '2025-03-01 11:03:01'),
    (2, "Bike", 5.43, TIMESTAMP '2025-03-01 11:19:21'),
    (3, "House", 100.4, TIMESTAMP '2025-03-01 12:43:17'),
    (4, "Plane", 124.4, TIMESTAMP '2025-03-01 13:05:29')
""")
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned").show()
# Create a partitioned table with a transform partition (bucket)
spark.sql("""
    CREATE TABLE lakekeeper.risk_mon_out.data_sample_test (
        id int,
        category string,
        event_time timestamp
    )
    PARTITIONED BY (
        bucket(16, id),
        category
    )
""")
spark.sql("""
    INSERT INTO lakekeeper.risk_mon_out.data_sample_test VALUES
    (5, "HOVal", TIMESTAMP '2025-03-01 11:03:09'),
    (6, "NextMi", TIMESTAMP '2025-03-01 12:19:29'),
    (7, "OGel", TIMESTAMP '2025-01-09 06:43:17'),
    (8, "AAAA", TIMESTAMP '2025-01-09 06:05:29')
""")
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.data_sample_test").show()
# Replace the partitioned table with the non-partitioned one
try:
    spark.sql("""
        CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.data_sample_test
        USING iceberg
        AS SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned
    """)
except Exception as e:
    print(f"Error 111: {e}")
try:
    spark.sql("ALTER TABLE lakekeeper.risk_mon_out.data_sample_test DROP PARTITION FIELD id_bucket")
    spark.sql("ALTER TABLE lakekeeper.risk_mon_out.data_sample_test DROP PARTITION FIELD category")
    print("Drop done")
    spark.sql("""
        CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.data_sample_test
        USING iceberg
        AS SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned
    """)
except Exception as e:
    print("=====")
    print(f"Error 222: {e}")
Expected Behavior
The table should be replaced entirely, both schema and data, with those of the non-partitioned source table.
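In other words, after the replace I would expect something like the following to show the schema of my_data_non_partitioned (vendor_id, product, price, event_time), no partition fields, and the four inserted rows (a sketch of the expected end state, not actual output):
# What I expect to see after a successful CREATE OR REPLACE TABLE AS SELECT
spark.sql("DESCRIBE TABLE lakekeeper.risk_mon_out.data_sample_test").show(truncate=False)
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.data_sample_test").show()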
Environment
Apache Iceberg Version: 1.7.2
Execution Engines Tested: Spark 3.5.2
Catalog Type: REST catalog
Storage: S3-compatible
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time


