
HIVE-29437: Iceberg: Fix concurrency issues between compaction and concurrent write operations #6292

Open
difin wants to merge 2 commits into apache:master from difin:iceberg_compaction_concurrency_fix

Conversation

@difin (Contributor) commented Feb 3, 2026


What changes were proposed in this pull request?

Fixing concurrency issues between compaction and concurrent write operations.

Why are the changes needed?

It was found in downstream testing that when Hive Iceberg compaction runs in parallel with Spark write operations on the same table, compaction sometimes produces wrong results. Before committing, Hive collects the uncompacted data and delete files that the newly compacted files should replace in the table or partition. The issue is that Hive collects those files from the latest Iceberg snapshot instead of from the snapshot the compaction originally read. Because of concurrent write operations, the latest snapshot may contain different files, which can lead to data corruption.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The fix was validated downstream with concurrent Spark write operations and Hive Iceberg compaction.
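The race described above can be modeled with a minimal toy sketch. All names below are hypothetical and the snapshot map is a stand-in, not Iceberg's API; it only illustrates why the files to replace must be collected from the snapshot the compaction read, not from the latest one:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Iceberg's API): compaction must collect the files it
// replaces from the snapshot it originally read, because a concurrent
// writer may have advanced the table to a newer snapshot in the meantime.
public class SnapshotPinningSketch {
    public static void main(String[] args) {
        // snapshotId -> data files visible in that snapshot
        Map<Long, List<String>> snapshots = new HashMap<>();

        // Snapshot 1: the state the compaction job read and rewrote.
        long compactionSnapshotId = 1L;
        snapshots.put(compactionSnapshotId, List.of("a.parquet", "b.parquet"));

        // A concurrent Spark writer commits snapshot 2 with a new file.
        long latestSnapshotId = 2L;
        snapshots.put(latestSnapshotId, List.of("a.parquet", "b.parquet", "c.parquet"));

        // Buggy: collecting from the latest snapshot would also replace
        // c.parquet, whose rows the compacted files do not contain.
        List<String> buggy = snapshots.get(latestSnapshotId);
        // Fixed: pin the collection to the compaction's starting snapshot.
        List<String> fixed = snapshots.get(compactionSnapshotId);

        System.out.println(buggy.contains("c.parquet")); // true: c.parquet would be lost
        System.out.println(fixed.contains("c.parquet")); // false: c.parquet survives
    }
}
```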

```diff
      IcebergCompactionUtil.getDataFiles(table, snapshotId, partitionPath, fileSizeThreshold);
  List<DeleteFile> existingDeleteFiles = fileSizeThreshold == -1 ?
-     IcebergCompactionUtil.getDeleteFiles(table, partitionPath) : Collections.emptyList();
+     IcebergCompactionUtil.getDeleteFiles(table, snapshotId, partitionPath) : Collections.emptyList();
```
@deniskuzZ (Member) commented Feb 3, 2026
Please add a test.
As an example, you could use TestConflictingDataFiles#testConflictingUpdateAndDelete.

@difin (Contributor, Author)

Working on it.

```diff
  Table deletesTable =
      MetadataTableUtils.createMetadataTableInstance(table, MetadataTableType.POSITION_DELETES);
- CloseableIterable<ScanTask> deletesScanTasks = deletesTable.newBatchScan().planFiles();
+ CloseableIterable<ScanTask> deletesScanTasks = deletesTable.newBatchScan().useSnapshot(snapshotId).planFiles();
```
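The pinning above can be illustrated with a toy scan builder. `ToyScan` and its fields are hypothetical names, not Iceberg's API; the sketch only mimics `useSnapshot`-style semantics, where an unpinned `planFiles()` follows whatever the latest snapshot contains:

```java
import java.util.List;
import java.util.Map;

// Toy scan builder (not Iceberg's API) illustrating useSnapshot-style
// pinning: unpinned scans read the latest snapshot, which may have been
// advanced by a concurrent writer.
class ToyScan {
    private final Map<Long, List<String>> snapshots;
    private final long latestId;
    private Long pinnedId; // null means "use the latest snapshot"

    ToyScan(Map<Long, List<String>> snapshots, long latestId) {
        this.snapshots = snapshots;
        this.latestId = latestId;
    }

    // Pin the scan to a specific snapshot, like Iceberg's Scan#useSnapshot.
    ToyScan useSnapshot(long snapshotId) {
        this.pinnedId = snapshotId;
        return this;
    }

    List<String> planFiles() {
        return snapshots.get(pinnedId != null ? pinnedId : latestId);
    }
}

public class UseSnapshotSketch {
    public static void main(String[] args) {
        Map<Long, List<String>> snaps = Map.of(
            1L, List.of("pos-delete-1.parquet"),
            2L, List.of("pos-delete-1.parquet", "pos-delete-2.parquet"));

        // Unpinned: drifts to the latest snapshot (2 delete files).
        System.out.println(new ToyScan(snaps, 2L).planFiles().size());
        // Pinned: stays on the compaction's snapshot (1 delete file).
        System.out.println(new ToyScan(snaps, 2L).useSnapshot(1L).planFiles().size());
    }
}
```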
@deniskuzZ (Member) commented Feb 3, 2026
Why do you use newBatchScan() here but newScan() in getDataFiles? Should we use BatchScan in both places?

@difin (Contributor, Author) commented Feb 4, 2026

It was in the existing code. Changed to use BatchScan in both places.

@difin force-pushed the iceberg_compaction_concurrency_fix branch from b842441 to 5433c78 on February 4, 2026 at 21:04
sonarqubecloud bot commented Feb 4, 2026

