Apache Iceberg version
1.8.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
Description
When a sequence of deletes (position deletes, then equality deletes, then a final row-level DELETE) is executed in Copy-on-Write (COW) mode, the equality deletes are not applied to the original data files, so rows that should have been removed remain in the table.
Observed Context:
- The issue does not occur when combining `UPDATE` and `MERGE` operations in COW mode; these work as expected.
- The problem is specific to COW; Merge-on-Read (MOR) mode handles the same scenario correctly.
Steps to Reproduce
1) Data Setup:
- Create two Parquet data files:
  - `data-file-1.parquet`: IDs [1, 2, 3, 4, 5]
  - `data-file-2.parquet`: IDs [6, 7, 8, 9, 10]
- Configure the table with COW semantics (`write.delete.mode = copy-on-write`).
2) Apply Initial Deletes:
- Add a position delete file that removes:
  - row 0 (ID 1) from `data-file-1.parquet`
  - row 0 (ID 6) from `data-file-2.parquet`
- Add an equality delete file targeting IDs [3, 4, 5, 6, 7, 8, 9, 10].
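To make the expected table state after step 2 concrete, here is a small Python model (plain data structures, not Iceberg or Spark APIs) of how a reader applies the position and equality deletes above. It ignores sequence numbers, assuming both delete files were committed after the data files. Before the final DELETE, only ID 2 should be visible.

```python
# Hypothetical in-memory model of the table after steps 1 and 2 (not Iceberg APIs).
data_files = {
    "data-file-1.parquet": [1, 2, 3, 4, 5],
    "data-file-2.parquet": [6, 7, 8, 9, 10],
}
position_deletes = {("data-file-1.parquet", 0), ("data-file-2.parquet", 0)}  # (file, row position)
equality_deleted_ids = {3, 4, 5, 6, 7, 8, 9, 10}  # equality delete on the id column


def live_rows(files, pos_deletes, eq_ids):
    """Return the ids a reader should see after applying both kinds of delete."""
    result = []
    for path, ids in files.items():
        for pos, row_id in enumerate(ids):
            if (path, pos) in pos_deletes or row_id in eq_ids:
                continue  # removed by a position or equality delete
            result.append(row_id)
    return result


print(live_rows(data_files, position_deletes, equality_deleted_ids))  # [2]
```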
3) Execute Final Delete Command:
DELETE FROM table WHERE id = 2; -- Targets remaining ID '2'
Expected Result
After all deletions, `SELECT * FROM table` should return no rows, because:
- Position deletes remove IDs 1 and 6.
- Equality deletes remove IDs 3, 4, 5 (from `data-file-1`) and 7, 8, 9, 10 (from `data-file-2`).
- The final `DELETE WHERE id = 2` removes the last remaining ID (2).
Actual Result
`SELECT * FROM table` returns IDs 3, 4, 5.
Observed Issues:
- The equality deletes targeting 3, 4, 5 (in `data-file-1`) are not applied.
- The `DELETE WHERE id = 2` operation removes only ID 2, leaving 3, 4, 5 intact.
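One mechanism that would produce exactly this output, offered as a hypothesis rather than a confirmed root cause: under the v2 table spec, an equality delete applies only to data files whose data sequence number is strictly less than the delete file's, while a position delete applies when it is less than or equal. If the COW rewrite of `data-file-1` honors the position delete but not the equality delete, the rewritten file receives a newer sequence number and the existing equality delete can no longer remove 3, 4, 5. The Python model below is not Iceberg code; the sequence numbers 1 to 4 are assumed values, and `cow_delete` and `apply_equality_deletes` are hypothetical names used only to contrast the expected and observed outcomes.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    ids: list
    seq: int  # data sequence number (assumed values, see build_table)


@dataclass
class Table:
    data_files: list
    position_deletes: list = field(default_factory=list)  # (path, row position, seq)
    equality_deletes: list = field(default_factory=list)  # (set of ids, seq)


def pos_deleted(table, f, pos):
    # v2 rule: a position delete applies when data seq <= delete seq.
    return any(p == f.path and r == pos and f.seq <= s for p, r, s in table.position_deletes)


def eq_deleted(table, f, row_id):
    # v2 rule: an equality delete applies only when data seq < delete seq.
    return any(row_id in ids and f.seq < s for ids, s in table.equality_deletes)


def live_ids(table, f):
    return [i for pos, i in enumerate(f.ids)
            if not pos_deleted(table, f, pos) and not eq_deleted(table, f, i)]


def cow_delete(table, predicate, commit_seq, apply_equality_deletes=True):
    """Copy-on-write DELETE: rewrite each file that holds a live row matching predicate."""
    new_files = []
    for f in table.data_files:
        if not any(predicate(i) for i in live_ids(table, f)):
            new_files.append(f)  # untouched files stay subject to existing deletes
            continue
        if apply_equality_deletes:
            survivors = live_ids(table, f)
        else:
            # Hypothesized faulty rewrite: only position deletes are honored.
            survivors = [i for pos, i in enumerate(f.ids) if not pos_deleted(table, f, pos)]
        kept = [i for i in survivors if not predicate(i)]
        # The rewritten file gets a new path and the new commit's sequence number.
        new_files.append(DataFile(f.path + ".rewritten", kept, commit_seq))
    return Table(new_files, table.position_deletes, table.equality_deletes)


def scan(table):
    return sorted(i for f in table.data_files for i in live_ids(table, f))


def build_table():
    return Table(
        data_files=[
            DataFile("data-file-1.parquet", [1, 2, 3, 4, 5], seq=1),
            DataFile("data-file-2.parquet", [6, 7, 8, 9, 10], seq=1),
        ],
        position_deletes=[("data-file-1.parquet", 0, 2), ("data-file-2.parquet", 0, 2)],
        equality_deletes=[({3, 4, 5, 6, 7, 8, 9, 10}, 3)],
    )


def is_two(row_id):
    return row_id == 2  # predicate of the final DELETE


expected = scan(cow_delete(build_table(), is_two, commit_seq=4))
observed = scan(cow_delete(build_table(), is_two, commit_seq=4, apply_equality_deletes=False))
print(expected)  # []
print(observed)  # [3, 4, 5]
```

In this model the untouched `data-file-2` is unaffected either way; only the rewritten copy of `data-file-1` escapes the equality delete, which matches the IDs reported above.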
Environment
Apache Iceberg versions: 1.6.1, 1.8.1
Tests
Example tests:
1.8.x...and124578963:iceberg:1.8.x
To run:
./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:test --tests "org.apache.iceberg.spark.TestSparkExecutionWithEqualityAndPositionDeletes"
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time