fix(iceberg): add_files correctly check duplicates #1395

Open
wants to merge 1 commit into
base: main
from

Conversation

Erigara
Contributor

@Erigara Erigara commented May 30, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Detect duplicates by loading the manifest files and comparing the file_path of each entry
  • Use direct calls instead of a scan
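
The duplicate check described in the first bullet can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in this PR: `ManifestEntry` and `find_duplicates` are hypothetical names, and the real change reads entries from the table's manifest files rather than from in-memory vectors.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a manifest entry; only the data file path matters here.
struct ManifestEntry {
    file_path: String,
}

/// Returns the paths in `new_files` that are already referenced by any manifest.
/// Each inner Vec stands in for the entries of one loaded manifest file.
fn find_duplicates(manifests: &[Vec<ManifestEntry>], new_files: &[&str]) -> Vec<String> {
    // Collect every file_path referenced by the existing manifests.
    let existing: HashSet<&str> = manifests
        .iter()
        .flatten()
        .map(|entry| entry.file_path.as_str())
        .collect();

    // Keep only the candidate paths that are already present, in input order.
    new_files
        .iter()
        .filter(|path| existing.contains(**path))
        .map(|path| path.to_string())
        .collect()
}
```

An empty result means none of the candidate files is referenced by the table yet. Comparing on the full path string assumes that paths are written in one canonical form.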

Are these changes tested?

  • Works in my local experiments
  • Fixed existing tests
  • Added a new test to showcase the behavior

@Erigara Erigara force-pushed the fix/append_files_check_duplicates branch 2 times, most recently from 9813024 to 4dd9e78 Compare May 30, 2025 18:36
@@ -379,6 +364,7 @@ mod tests {

// Attempt to add the existing Parquet files with fast append.
let new_tx = fast_append_action
    .with_check_duplicate(false)
I'm not sure what the initial purpose of this test was.
It looks like it checks the ability to add existing Parquet files to the table (files that physically exist but are not in the table).
In that case, adding with_check_duplicate(false) here is fine.
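
To illustrate the effect of such a flag, here is a small self-contained sketch. `AppendSketch` is a hypothetical type, not the crate's fast append action. The default of `true` and the error message are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical append action used only for illustration.
struct AppendSketch {
    check_duplicate: bool,
    files: Vec<String>,
}

impl AppendSketch {
    fn new(files: Vec<String>) -> Self {
        // Assumption: the duplicate check is on unless the caller opts out.
        Self { check_duplicate: true, files }
    }

    fn with_check_duplicate(mut self, value: bool) -> Self {
        self.check_duplicate = value;
        self
    }

    /// `in_table` holds the paths already referenced by the table's manifests.
    fn apply(self, in_table: &HashSet<String>) -> Result<Vec<String>, String> {
        if self.check_duplicate {
            // Reject the append as soon as one file is already referenced.
            if let Some(dup) = self.files.iter().find(|f| in_table.contains(*f)) {
                return Err(format!("file already referenced by table: {dup}"));
            }
        }
        Ok(self.files)
    }
}
```

With the check enabled, adding a path that the table already references returns an error. With `with_check_duplicate(false)`, the same call succeeds.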

@Erigara Erigara force-pushed the fix/append_files_check_duplicates branch 2 times, most recently from f287120 to 08f5af5 Compare May 30, 2025 19:04
@Erigara Erigara force-pushed the fix/append_files_check_duplicates branch from 08f5af5 to a90174a Compare May 30, 2025 19:07
Development

Successfully merging this pull request may close these issues.

bug(iceberg): add_files doesn't actually check duplicated files