Data Integrity Issue with DELETE Operation Using Copy-on-Write (COW) and Equality Deletes #12467

Open
@and124578963

Apache Iceberg version

1.8.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Description

When a table has position deletes and then equality deletes applied to it, and a final row-level DELETE is executed in Copy-on-Write (COW) mode, the equality deletes are not applied to the original data files. Rows that should have been removed remain in the table.

Observed Context:

  • The issue does not occur when combining UPDATE and MERGE operations in COW mode – these work as expected.
  • The problem is specific to COW; Merge-on-Read (MOR) mode handles the same scenario correctly.

Steps to Reproduce

1) Data Setup:

  • Create two Parquet data files:

    • data-file-1.parquet: IDs [1, 2, 3, 4, 5]

    • data-file-2.parquet: IDs [6, 7, 8, 9, 10]

  • Configure the table with COW semantics (write.delete.mode = copy-on-write).

2) Apply Initial Deletes:

  • Add a position delete file to remove:

    • Row 0 (ID 1) from data-file-1.parquet

    • Row 0 (ID 6) from data-file-2.parquet

  • Add an equality delete file targeting IDs [3, 4, 5, 6, 7, 8, 9, 10].

3) Execute Final Delete Command:

DELETE FROM table WHERE id = 2; -- Targets remaining ID '2'

Expected Result

After all deletions:

  • SELECT * FROM table should return no rows, as:

    • Position deletes remove IDs 1 and 6.

    • Equality deletes remove IDs 3, 4, 5 (from data-file-1) and 7, 8, 9, 10 (from data-file-2).

    • Final DELETE WHERE id = 2 removes the last remaining ID (2).
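The expected result can be checked with a small toy model. This is only a sketch of the intended read semantics, not Iceberg code: the file names and IDs come from the steps above, while the helper name `live_ids` and the set-based representation of delete files are assumptions made for illustration.

```python
# Toy model of the scenario in this report. Not Iceberg code.
# File names and IDs are taken from the reproduction steps;
# everything else (names, data structures) is hypothetical.
DATA_FILES = {
    "data-file-1.parquet": [1, 2, 3, 4, 5],
    "data-file-2.parquet": [6, 7, 8, 9, 10],
}

# Position deletes: (data file, row position) pairs.
POSITION_DELETES = {("data-file-1.parquet", 0), ("data-file-2.parquet", 0)}

# Equality deletes on the id column.
EQUALITY_DELETES = {3, 4, 5, 6, 7, 8, 9, 10}


def live_ids(data_files, position_deletes, equality_deletes):
    """Return the IDs a reader should see after applying both delete kinds."""
    result = []
    for name, ids in data_files.items():
        for pos, row_id in enumerate(ids):
            if (name, pos) in position_deletes:
                continue  # removed by a position delete
            if row_id in equality_deletes:
                continue  # removed by an equality delete
            result.append(row_id)
    return result


# State before the final DELETE: only ID 2 should be visible.
before_final_delete = live_ids(DATA_FILES, POSITION_DELETES, EQUALITY_DELETES)

# The final DELETE WHERE id = 2 should remove the last visible row.
after_final_delete = [i for i in before_final_delete if i != 2]

print(before_final_delete)  # [2]
print(after_final_delete)   # []
```

Under these assumptions, only ID 2 is visible before the final DELETE, and the table is empty afterwards.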

Actual Result

SELECT * FROM table returns IDs 3, 4, 5.

Observed Issues:

  • The equality deletes targeting IDs 3, 4, 5 (in data-file-1.parquet) are not applied.

  • The DELETE WHERE id = 2 operation removes only ID 2, leaving IDs 3, 4, 5 intact.
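One hypothetical mechanism that would produce exactly this output is sketched below. It is an assumption, not a confirmed root cause: if the COW rewrite of data-file-1.parquet honours position deletes but skips equality deletes, the rewritten file receives a newer data sequence number, and the Iceberg spec applies an equality delete only to data files whose data sequence number is strictly lower than the delete's. The sequence numbers, class, and function names here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    name: str
    ids: list
    seq: int  # data sequence number (values here are assumed)


POSITION_DELETES = {("data-file-1.parquet", 0), ("data-file-2.parquet", 0)}
EQ_DELETE_IDS = {3, 4, 5, 6, 7, 8, 9, 10}
EQ_DELETE_SEQ = 2  # assumed: the equality delete was committed after the data


def scan(files):
    """Read the table, applying deletes the way the spec describes."""
    out = []
    for f in files:
        for pos, row_id in enumerate(f.ids):
            if (f.name, pos) in POSITION_DELETES:
                continue
            # Equality deletes apply only to strictly older data files.
            if f.seq < EQ_DELETE_SEQ and row_id in EQ_DELETE_IDS:
                continue
            out.append(row_id)
    return out


def cow_delete(files, target_id, apply_equality_deletes):
    """Rewrite every file containing target_id into a new, newer file."""
    new_seq = 3  # assumed sequence number of the DELETE commit
    result = []
    for f in files:
        if target_id not in f.ids:
            result.append(f)  # untouched files keep their old sequence number
            continue
        kept = []
        for pos, row_id in enumerate(f.ids):
            if (f.name, pos) in POSITION_DELETES:
                continue
            if apply_equality_deletes and row_id in EQ_DELETE_IDS:
                continue
            if row_id == target_id:
                continue
            kept.append(row_id)
        result.append(DataFile(f.name + ".rewritten", kept, new_seq))
    return result


table = [
    DataFile("data-file-1.parquet", [1, 2, 3, 4, 5], 1),
    DataFile("data-file-2.parquet", [6, 7, 8, 9, 10], 1),
]

buggy = scan(cow_delete(table, 2, apply_equality_deletes=False))
fixed = scan(cow_delete(table, 2, apply_equality_deletes=True))
print(buggy)  # [3, 4, 5]
print(fixed)  # []
```

In this model, data-file-2.parquet is never rewritten, so its deletes continue to apply at read time, which would be consistent with only IDs 3, 4, 5 resurfacing.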

Environment

Apache Iceberg Versions: 1.6.1, 1.8.1

Tests

Example tests are available in this branch comparison:
1.8.x...and124578963:iceberg:1.8.x
To run them:
./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:test --tests "org.apache.iceberg.spark.TestSparkExecutionWithEqualityAndPositionDeletes"

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

Labels

bug (Something isn't working)