
Core: Validate conflicting delete files in RowDelta and OverwriteFiles #3069

Closed

Conversation

aokolnychyi
Contributor

This PR adds validation for concurrently added delete files in RowDelta and OverwriteFiles. Previously, if we had a conflicting row delta operation, we would ignore it and corrupt the table.
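
A rough sketch of how a writer could opt into these checks (validateNoConflictingDeleteFiles is the method added in this PR; the other calls are existing RowDelta APIs, and the files, filter, and snapshot id are placeholders):

// sketch only, not the exact Spark call site
RowDelta rowDelta = table.newRowDelta()
    .addRows(newDataFile)                                   // rewritten rows
    .addDeletes(newDeleteFile)                              // position deletes for replaced rows
    .validateFromSnapshot(baseSnapshotId)                   // validate changes since the read snapshot
    .validateNoConflictingAppends(conflictDetectionFilter)  // existing check for concurrent appends
    .validateNoConflictingDeleteFiles(conflictDetectionFilter); // new check for concurrent delete files
rowDelta.commit();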

@@ -64,7 +64,7 @@
   private static final Set<String> VALIDATE_DATA_FILES_EXIST_SKIP_DELETE_OPERATIONS =
       ImmutableSet.of(DataOperations.OVERWRITE, DataOperations.REPLACE);
   // delete files can be added in "overwrite" or "delete" operations
-  private static final Set<String> VALIDATE_REPLACED_DATA_FILES_OPERATIONS =
+  private static final Set<String> VALIDATE_ADDED_DELETE_FILES_OPERATIONS =
Contributor Author

I renamed it to match VALIDATE_ADDED_FILES_OPERATIONS.

@aokolnychyi aokolnychyi force-pushed the validate-new-delete-files branch from 81212ce to d34be6f on September 3, 2021 18:50

@rdblue rdblue added this to the Java 0.12.1 Release milestone Sep 3, 2021
@@ -122,27 +122,30 @@
*
* @param conflictDetectionFilter an expression on rows in the table
* @return this for method chaining
* @deprecated this will be removed in 0.14.0;
* use {@link #validateNoConflictingOperations(Expression)} instead
Contributor

Do we also need to update ReplacePartitions?

Contributor Author

Let me check during the weekend

Contributor Author

@aokolnychyi aokolnychyi Sep 6, 2021

I am not sure we would do anything differently in ReplacePartitions if anyone commits delete files concurrently. Did you have any particular thoughts in mind, @rdblue?

Contributor

@szehon-ho is adding support for detecting conflicts in ReplacePartitions. The original behavior was basically snapshot isolation like Hive. But you could argue that it would be valuable to support serializable by ensuring the replaced partitions haven't changed since some starting snapshot. This is the PR: #2925

Contributor Author

I forgot about that PR. I'll take a look on Monday.

    if (conflictDetectionFilter != null) {
      validateAddedDataFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
    }

    if (conflictDetectionFilter != null && validateNoConflictingDeleteFiles) {
      validateAddedDeleteFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
Contributor Author

@aokolnychyi aokolnychyi Sep 6, 2021

I won't be able to use validateNoConflictingAppends here as RowDelta works with paths only.
What about using DataFile in validateDataFilesExist, @rdblue?
Any particular reasons for using paths only?

Contributor Author

I don't think we will be able to use DataFile for RowDelta that easily, as we are using a regular scan and the delete writers only give us referenced data file locations. So I'd say we probably still need the new method here.
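
For reference, the locations come from the delete writers (a sketch; assumes the DeleteWriteResult API, and positionDeleteWriter is a placeholder):

DeleteWriteResult result = positionDeleteWriter.result();
// only the paths of the data files the position deletes reference, not DataFile objects
CharSequenceSet referencedDataFiles = result.referencedDataFiles();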

Contributor

If I understand correctly, the motivation for updating RowDelta is the case where we have two concurrent delta commits? So an UPDATE and a MERGE at the same time might both rewrite a row, which could cause a duplicate:

INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- running these concurrently causes a problem
UPDATE t SET data = 'x' WHERE id = 1;
UPDATE t SET data = 'y' WHERE id = 1;

If I ran the updates concurrently, both would delete id=1 and each would add a new file, one with (1, 'x') and one with (1, 'y'), right?

The validation here is that the file created by the initial insert doesn't have any new delete files written against it. It seems like we want to just call validateNoNewDeletesForDataFiles and pass referencedFiles in, right? Maybe I'm missing something?

We might want to make this a separate issue to keep changes smaller and reviews easier.

Contributor Author

We only know the locations of the referenced files. We need actual DataFile objects to leverage the delete file index. My question is why we went with locations only in RowDelta and whether we should switch to DataFile objects instead.

Contributor Author

Well, I think we could approach it differently. Let me update and then we can discuss more.

Contributor

@rdblue rdblue Sep 12, 2021

The reason why we only have file locations in RowDelta is that we get the set of referenced files from writing position deletes, which are (location, position). We don't require carrying DataFile through delta operations.

We could probably wrap the location in a dummy DataFile, or add a way to query the DeleteFileIndex by location.

Contributor

In fact, if we create a DataFile, we can copy the column stats from the DeleteFile so that we can do stats overlap comparisons.
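
A rough sketch of that idea (spec, location, and metrics are placeholders; the builder calls are the existing DataFiles API):

// wrap a referenced location in a dummy DataFile so it can be checked against the delete file index
DataFile dummy = DataFiles.builder(spec)
    .withPath(referencedDataFileLocation)
    .withFormat(FileFormat.PARQUET)           // placeholder; the format is unknown from the location alone
    .withFileSizeInBytes(0)
    .withRecordCount(0)
    .withMetrics(metricsCopiedFromDeleteFile) // lower/upper bounds enable stats overlap comparisons
    .build();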

Contributor Author

I had a chance to think more about the required validation. While using DataFiles instead of locations would give us min/max filtering, we can probably do better than that if all delete files are position-based.

We can do something like this if we are working with position-based delete files:

  • Read all position deletes we are trying to commit and build a map from file location to a list of deleted positions.
  • Iterate through concurrently added delete files, applying min/max filtering (secondary indexes in the future), and find those that may conflict.
  • Read the files that potentially conflict and verify that the (file, pos) pairs don't overlap.

We may need to add some limit on the overall size of deletes we can scan, but we can figure that out.
Overall, this will allow us to resolve conflicts within the same partition (see the sketch below).
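
A rough sketch of those steps (readPositionDeletes and mayReferenceAny are hypothetical helpers for reading a position delete file and applying file_path min/max filtering; they are not existing APIs):

// 1. read the position deletes we are trying to commit and group deleted positions by data file location
Map<String, Set<Long>> pendingPositions = new HashMap<>();
for (DeleteFile pending : pendingDeleteFiles) {
  for (PositionDelete<?> delete : readPositionDeletes(pending)) {
    pendingPositions
        .computeIfAbsent(delete.path().toString(), path -> new HashSet<>())
        .add(delete.pos());
  }
}

for (DeleteFile concurrent : concurrentlyAddedDeleteFiles) {
  // 2. min/max filtering on file_path (secondary indexes in the future) to skip files that cannot conflict
  if (!mayReferenceAny(concurrent, pendingPositions.keySet())) {
    continue;
  }

  // 3. read the potentially conflicting files and verify that the (file, pos) pairs don't overlap
  for (PositionDelete<?> delete : readPositionDeletes(concurrent)) {
    Set<Long> ourPositions = pendingPositions.get(delete.path().toString());
    ValidationException.check(
        ourPositions == null || !ourPositions.contains(delete.pos()),
        "Found a conflicting delete for %s at position %s", delete.path(), delete.pos());
  }
}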

*/
@Deprecated
OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter);
OverwriteFiles validateNoConflictingOperations(Expression conflictDetectionFilter);
Member

@openinx openinx Sep 6, 2021

Looks like this method is designed to validate whether copy-on-write batch overwrite operations conflict with RowDelta operations? I will need some time to read the whole copy-on-write code path.

Member

Okay, any operation that adds data/delete files matching the files scanned by the copy-on-write operation should conflict.

@@ -129,6 +119,7 @@ protected void validate(TableMetadata base) {

    if (conflictDetectionFilter != null && base.currentSnapshot() != null) {
      validateAddedDataFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
Member

Q: Should OverwriteFiles also conflict with the REPLACE operation when checking the added data files in validateAddedDataFiles? Take the following example:

  1. The table has 2 data files: FILE_A, FILE_B;
  2. Start a copy-on-write operation to delete all rows in FILE_A. Call it txn1; it has started but not committed;
  3. Someone starts a REPLACE txn to rewrite FILE_A + FILE_B into FILE_C. Call it txn2; started but not committed;
  4. txn2 commits;
  5. txn1 commits.

Finally, we will see all the rows from FILE_A again because the REPLACE operation has added them to the table again.

I raise this question because I see that validateAddedDataFiles only checks the APPEND and OVERWRITE operations (it's VALIDATE_ADDED_FILES_OPERATIONS). Should REPLACE operations also be considered?

Contributor Author

This use case is already covered, as we always guarantee that all files we overwrite in OverwriteFiles still exist. It is literally new records (added, not rewritten) that we are trying to catch here.

Contributor Author

In the example above, txn1 will fail as FILE_A is not there.
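
Roughly, the sequence from the example looks like this (a sketch; FILE_A, FILE_B, FILE_C, and scanSnapshotId are the placeholders from the scenario):

// txn2: the REPLACE (compaction) rewrites FILE_A + FILE_B into FILE_C and commits first
table.newRewrite()
    .rewriteFiles(ImmutableSet.of(FILE_A, FILE_B), ImmutableSet.of(FILE_C))
    .commit();

// txn1: the copy-on-write DELETE tries to remove FILE_A; its commit fails with a
// ValidationException because FILE_A no longer exists in the table
table.newOverwrite()
    .deleteFile(FILE_A)
    .validateFromSnapshot(scanSnapshotId)
    .commit();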

* @param conflictDetectionFilter an expression used to find new conflicting delete files
* @param caseSensitive whether expression evaluation should be case sensitive
*/
protected void validateAddedDeleteFiles(TableMetadata base, Long startingSnapshotId,
Contributor

Could you update this to validateNoAddedDeleteFiles or something similar? It isn't clear from the method name that this is checking that there aren't any. We may want to rename validateAddedDataFiles as well.

Contributor Author

I went for validateNoNewDeletes to match validateNoNewDeletesForDataFiles.

@aokolnychyi aokolnychyi force-pushed the validate-new-delete-files branch from d34be6f to fca719f on September 13, 2021 18:50
    }

    if (deleteConflictDetectionFilter != null) {
      validateNoNewDeletes(base, startingSnapshotId, deleteConflictDetectionFilter, caseSensitive);
Contributor Author

This check is quite trivial. For example, we won't be able to resolve conflicts within the same partition. I outlined a way to optimize it in an earlier comment.

Collaborator

Do you mean it cannot resolve within the same data file (I thought we are passing a data filter)? Or within the same partition?

And also, for my learning: you mean it will be over-aggressive and report false positives even if rows do not actually conflict, until we make the optimization?

Contributor Author

you mean it will be over-aggressive and report false positives even if rows do not actually conflict, until we make the optimization?

Yeah, it may report false positives. The data filter is helpful, but I think it won't help much within the same partition. Position deletes are scoped to a partition, so the data filter should help us when there is a concurrent delete in another partition. Within the partition, though, most position deletes will match that row filter since we don't persist the deleted row (by default).

Contributor

@jackye1995 jackye1995 Sep 24, 2021

A bit late to the whole discussion. Regarding the check, I read the outlined way to optimize it and just want to share some thoughts based on what I am doing for position deletes in my internal distribution today.

In my system, each position delete file contains exactly one file_path value, which avoids the spec requirement to sort by file path and also greatly simplifies validation during concurrent commits: each check can easily find all position deletes for each data file and compare just the position min/max to see if there is any potential overlap of the position ranges. Of course, this cannot be applied to the general case; it was implemented just to see what can be achieved with a closed system where all delete writers only write that specific type of position delete file.

When I started to compact position delete files to contain multiple file_path values, it became very easy to get false positives, especially in the object storage mode where the file_path min and max do not really mean anything anymore. So, at least for the object storage use case, a secondary index with much better file-skipping ability is a must-have to make the described strategy work efficiently.

@aokolnychyi aokolnychyi force-pushed the validate-new-delete-files branch from fca719f to c1d0229 on September 13, 2021 19:56
Collaborator

@szehon-ho szehon-ho left a comment

Looks mostly great, just a few questions

* @param conflictDetectionFilter an expression on rows in the table
* @return this for method chaining
*/
RowDelta validateNoConflictingDeleteFiles(Expression conflictDetectionFilter);
Collaborator

Nit: should we make it a bit more consistent with the above (i.e., omit 'files' from the name)?

Contributor Author

I did that in the first place but then started to worry it may be confusing. For example, here we refer to concurrently added delete files vs. concurrent delete operations that removed data files.

I do prefer consistency too, but I am not sure whether the shorter name is confusing. What do you think, @szehon-ho?

Collaborator

Yes, fine with me then, thanks for clarifying.

    long startingSequenceNumber = startingSequenceNumber(base, startingSnapshotId);
    DeleteFileIndex deletes = buildDeleteFileIndex(deleteManifests, startingSequenceNumber, dataFilter, caseSensitive);

    ValidationException.check(deletes.isEmpty(),
Collaborator

Thanks for adding this!


@@ -86,6 +86,20 @@ public boolean isEmpty() {
     return (globalDeletes == null || globalDeletes.length == 0) && sortedDeletesByPartition.isEmpty();
   }

+  public List<DeleteFile> referencedDeleteFiles() {
+    List<DeleteFile> deleteFiles = Lists.newArrayList();
Collaborator

Optional comment: a small optimization can be done by knowing the initial length and checking isEmpty:

if (isEmpty()) {
  return Collections.emptyList();
} else {
  List<DeleteFile> deleteFiles = Lists.newArrayListWithCapacity(globalDeletes.length + sortedDeletesByPartition.size());
  ...
}

Contributor Author

Will do!

Contributor Author

I was about to implement this, but then I realized that sortedDeletesByPartition.size() is not an accurate estimate, as each entry contains an array of delete files. To compute a correct estimate, I'd need to iterate through the map.
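
Something like this would be needed (a sketch; deleteFilesByPartition stands for a hypothetical view of the map values as DeleteFile arrays):

int estimatedSize = globalDeletes == null ? 0 : globalDeletes.length;
for (DeleteFile[] partitionDeletes : deleteFilesByPartition) {
  estimatedSize += partitionDeletes.length;
}
List<DeleteFile> deleteFiles = Lists.newArrayListWithCapacity(estimatedSize);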

      // we must fail this operation as it would undelete rows that were removed concurrently
      if (deletedDataFiles.size() > 0) {
        validateNoNewDeletesForDataFiles(
            base, startingSnapshotId, conflictDetectionFilter, deletedDataFiles, caseSensitive);
Contributor

I'm not sure that I agree that we only need to check the files that were directly deleted.

For example, you can use Spark to overwrite rows matching an expression using DataFrameWriterV2:

df.writeTo(t).overwrite($"date" === "2021-10-01")

Whether the deleted files should be checked for row-level deletes depends on whether the written df is the result of reading the table. If you're performing your own merge, then it should be. I realize that there's currently no way to enable the validation, but we could support write properties for this eventually:

df.writeTo(t)
    .option("validate-snapshot-id", 1234L) // use serializable isolation
    .overwrite($"date" === "2021-10-01")

Is this something that we don't need to support right now because no one is calling it? I think we should still consider adding the validation for later.

Contributor

By the way, this is a way to add serializable isolation for replacing partitions as well.

@szehon-ho ^^

Contributor Author

Okay, if it is something we want to support, it means there are 3 cases:

  • Case 1: copy-on-write MERGE -> we must validate deletes.
  • Case 2: overwrite by filter with serializable isolation -> we must validate deletes.
  • Case 3: overwrite by filter with snapshot isolation -> we must NOT validate deletes.

Since delete validation becomes optional, it probably means we need a separate method to trigger it.
Thoughts, @rdblue @szehon-ho?

Contributor Author

Basically, OverwriteFiles will now match RowDelta, which has a separate method to trigger the validation.

Contributor Author

I have updated the PR to match what I described above.

Contributor

@rdblue rdblue left a comment

I think this is about ready, but I'd like to see an additional validation in OverwriteFiles that can run the new validateNoNewDeletes that is used in RowDelta. I gave an example case and it would be good to hear whether you agree it's valid.

@aokolnychyi aokolnychyi force-pushed the validate-new-delete-files branch from 57d99ba to 078aa70 on September 21, 2021 01:38
@aokolnychyi
Contributor Author

Ready for another round.

@@ -365,6 +365,7 @@ private void commitWithSerializableIsolation(OverwriteFiles overwriteFiles,

     Expression conflictDetectionFilter = conflictDetectionFilter();
     overwriteFiles.validateNoConflictingAppends(conflictDetectionFilter);
+    overwriteFiles.validateNoConflictingDeleteFiles(conflictDetectionFilter);
Contributor Author

@rdblue, could you double check this place?

Contributor

The check looks good to me.

    }

    Expression conflictDetectionFilter = conflictDetectionFilter();
    overwriteFiles.validateNoConflictingDeleteFiles(conflictDetectionFilter);
Collaborator

@szehon-ho szehon-ho Sep 21, 2021

Maybe I'm mistaken, but why are we checking for conflicting delete files in snapshot isolation? (I thought we must not check.)

Contributor Author

Consider a data file A and a copy-on-write operation that overwrites it with a data file B. If a concurrent operation adds a delete file that references records from data file A, committing the original copy-on-write operation (i.e. the overwrite) would undelete the rows that were deleted concurrently.

It seems we always have to validate the delete files whenever we overwrite specific files during DELETE/MERGE.
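
A sketch of that race using the API from this PR (FILE_A, FILE_B, POS_DELETE_A, and the filter/snapshot values are placeholders):

// a concurrent engine commits position deletes that reference data file A
table.newRowDelta()
    .addDeletes(POS_DELETE_A)
    .commit();

// the copy-on-write overwrite of A -> B must now fail; otherwise the rows removed by
// POS_DELETE_A would come back through FILE_B
table.newOverwrite()
    .deleteFile(FILE_A)
    .addFile(FILE_B)
    .validateFromSnapshot(scanSnapshotId)
    .validateNoConflictingDeleteFiles(conflictDetectionFilter) // throws ValidationException
    .commit();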

Collaborator

Got it, yeah. I double-checked the Postgres docs for REPEATABLE READ (snapshot) and it makes sense: deletes and updates to the same rows must not be overridden in this mode.

Contributor

I agree. Deletes should always be validated.

@@ -95,11 +101,18 @@ public OverwriteFiles caseSensitive(boolean isCaseSensitive) {
   @Override
   public OverwriteFiles validateNoConflictingAppends(Expression newConflictDetectionFilter) {
     Preconditions.checkArgument(newConflictDetectionFilter != null, "Conflict detection filter cannot be null");
Contributor

Nit: Should this be "Append conflict detection filter cannot be null" now that we have both appendConflictDetectionFilter and deleteConflictDetectionFilter?

Contributor Author

I'll update if it fits on the same line.

    failMissingDeletePaths();
    return this;
  }

  @Override
  public OverwriteFiles validateNoConflictingDeleteFiles(Expression newConflictDetectionFilter) {
    Preconditions.checkArgument(newConflictDetectionFilter != null, "Conflict detection filter cannot be null");
Contributor

Nit: Same comment about saying "Delete conflict detection filter cannot be null" instead of leaving it unqualified and ambiguous.

@@ -95,11 +101,18 @@ public OverwriteFiles caseSensitive(boolean isCaseSensitive) {
   @Override
   public OverwriteFiles validateNoConflictingAppends(Expression newConflictDetectionFilter) {
     Preconditions.checkArgument(newConflictDetectionFilter != null, "Conflict detection filter cannot be null");
-    this.conflictDetectionFilter = newConflictDetectionFilter;
+    this.appendConflictDetectionFilter = newConflictDetectionFilter;
     failMissingDeletePaths();
Contributor

Question: Does this call to failMissingDeletePaths() still belong here or should it be moved to the validateNoConflictingDeleteFiles call?

Contributor Author

I'll double check.

      Snapshot startingSnapshot = metadata.snapshot(staringSnapshotId);
      return startingSnapshot.sequenceNumber();
    } else {
      return 0;
Contributor

nit: can use TableMetadata.INITIAL_SEQUENCE_NUMBER and remove the comment

      overwriteFiles.validateFromSnapshot(scanSnapshotId);
    }

    Expression conflictDetectionFilter = conflictDetectionFilter();
Contributor

nit: can combine L384 and L385, conflictDetectionFilter only used once


* <p>
* This method must be called when the table is queried to produce a row delta for UPDATE and
* MERGE operations independently of the isolation level. Calling this method isn't required
* for DELETE operations as it is OK when a particular record we are trying to delete
Contributor

Nit: use of "we" in javadoc is unnecessary. It is simpler to say "it is OK to delete a record that is also deleted concurrently".

      validateAddedDataFiles(base, startingSnapshotId, appendConflictDetectionFilter, caseSensitive);
    }

    boolean validateNewDeletes = deleteConflictDetectionFilter != null && base.currentSnapshot() != null;
Contributor

I think that the behavior here should be slightly different. There are two concerns: 1) whether to check delete files for snapshot isolation and 2) what conflict detection filter to use. Basing validateNewDeletes on whether the conflict detection filter was set doesn't seem correct to me.

I don't think there is a case where we don't want to validate delete files if we have called validateFromSnapshot to set the base snapshot. I think that we should add this as a boolean field that is set when validateFromSnapshot is called.

Then, if we are validating delete files, we should have two separate checks. First, if there are any files in deletedDataFiles, then we perform the validation below. If the conflict detection filter wasn't set, then we should use Expressions.alwaysTrue to find candidate delete files. Second, if an overwrite filter was set, then we should run validateNoNewDeletes with either the delete filter or the delete conflict detection filter. The conflict detection filter should be an optimization, not a way to turn off delete validations.

I think that makes the API more understandable and consistent.

Contributor

Here's what I changed this to locally while thinking through it:

    // validateDeletes is set to true in validateFromSnapshot. Maybe we should default it if that method isn't called?
    if (validateDeletes) {
      if (deletedDataFiles.size() > 0) {
        validateNoNewDeletesForDataFiles(
            base, startingSnapshotId, deleteConflictDetectionFilter,
            deletedDataFiles, caseSensitive);
      }

      if (rowFilter() != Expressions.alwaysFalse()) {
        if (deleteConflictDetectionFilter != null) {
          validateNoNewDeletes(base, startingSnapshotId, deleteConflictDetectionFilter, caseSensitive);
        } else {
          validateNoNewDeletes(base, startingSnapshotId, rowFilter(), caseSensitive);
        }
      }
    }

Contributor Author

Basing validateNewDeletes on whether the conflict detection filter was set doesn't seem correct to me.

If I got you correctly, you are proposing that validateFromSnapshot will now indicate whether we should validate delete files. I think that is different compared to how RowDelta and OverwriteFiles work right now. I'd actually say calling validateFromSnapshot is an optimization that tells us from which snapshot to start looking. We never validate new appends if the append conflict detection filter is null. Moreover, it is not always possible to set the starting snapshot. If we start on an empty table, we must validate all snapshots. Here is our copy-on-write commit logic.

Long scanSnapshotId = scan.snapshotId();
if (scanSnapshotId != null) {
  overwriteFiles.validateFromSnapshot(scanSnapshotId);
}

Expression conflictDetectionFilter = conflictDetectionFilter();
overwriteFiles.validateNoConflictingAppends(conflictDetectionFilter);

Also, in your snippet, why call validateNoNewDeletesForDataFiles if we already know the overwrite filter is set? I think validateNoNewDeletesForDataFiles is simply a more efficient version of validateNoNewDeletes that can open delete files that match the filter to check their content for conflicts. The problem is that we can use validateNoNewDeletesForDataFiles only if we overwrite specific files.

* @param dataFilter an expression used to find new conflicting delete files
* @param caseSensitive whether expression evaluation should be case-sensitive
*/
protected void validateNoNewDeletes(TableMetadata base, Long startingSnapshotId,
Contributor

I think a slightly more accurate name would be validateNoNewDeleteFiles since this checks that there aren't any new delete files, but data files could have been deleted.

@aokolnychyi
Contributor Author

Closing this as it was split into multiple PRs and merged.

@aokolnychyi aokolnychyi closed this Oct 1, 2021
@rdblue rdblue removed this from the Java 0.12.1 Release milestone Oct 26, 2021