Move field into place when adding during schema evolution #8409

Closed

hiloboy0119
While trying to use schema evolution on write, I noticed that new fields were always added to the end of the struct/table, which caused schema evolution to fail. I added tests to verify this and fixed it by moving the field into place when necessary.

@hiloboy0119
Author

I found a few more cases where this fails, so I'm going to work on an update today. The before/after lookup needs to be a bit more sophisticated.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

@hiloboy0119 thanks for the PR. Could you clarify exactly how you were trying to evolve the table schema via Spark ACCEPT_ANY_SCHEMA, and what the specific failure was? That would help a lot in reviewing the PR.

@hiloboy0119 hiloboy0119 force-pushed the fix/add_column_not_at_end branch from a5685a6 to 2f75c87 Compare March 7, 2024 00:09
@hiloboy0119 hiloboy0119 force-pushed the fix/add_column_not_at_end branch from 2f75c87 to 5e256cc Compare March 7, 2024 00:11
@hiloboy0119
Author

@amogh-jahagirdar sorry for the long delay. I finally had time to write tests for the various edge cases and fix a few things.

The examples of schema evolution are all shown in the test cases. I added fields to the beginning and middle of tables; without these changes, they were always added to the end of the schema.

@@ -90,6 +94,15 @@ public Boolean struct(
Types.NestedField field = fields.get(pos);
if (isMissing) {
addColumn(partnerId, field);
Author

@amogh-jahagirdar This is the core problem with the existing code. Additional columns, no matter where they sit in the incoming schema, are simply added, and that always puts them at the end.
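
For readers following the thread, here is a minimal sketch of the difference being described. It is not the PR's code and does not use Iceberg classes: plain lists of column names stand in for schemas, and the class and method names (`FieldPlacementSketch`, `appendOnly`, `appendThenMove`) are hypothetical. It assumes column names are unique. In Iceberg itself, the corresponding steps would presumably go through `UpdateSchema` calls such as `addColumn` followed by `moveFirst` or `moveAfter`, but how the PR wires that up is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: lists of column names stand in for schemas.
final class FieldPlacementSketch {

  // Behavior described in the thread: every missing field is appended,
  // regardless of where it sits in the target (incoming) schema.
  static List<String> appendOnly(List<String> current, List<String> target) {
    List<String> result = new ArrayList<>(current);
    for (String name : target) {
      if (!result.contains(name)) {
        result.add(name);
      }
    }
    return result;
  }

  // Sketch of the proposed behavior: add the missing field, then move it
  // directly after its predecessor in the target schema, or to the front
  // when it is the first target field.
  static List<String> appendThenMove(List<String> current, List<String> target) {
    List<String> result = new ArrayList<>(current);
    for (int pos = 0; pos < target.size(); pos++) {
      String name = target.get(pos);
      if (result.contains(name)) {
        continue;
      }
      // Earlier target fields have already been placed by this loop, so the
      // immediate predecessor is always present in result when pos > 0.
      int anchor = pos == 0 ? -1 : result.indexOf(target.get(pos - 1));
      result.add(anchor + 1, name);
    }
    return result;
  }
}
```

With a current schema of `id, data` and an incoming schema of `ts, id, category, data`, `appendOnly` yields `id, data, ts, category`, while `appendThenMove` yields `ts, id, category, data`, matching the incoming order. Columns that exist only in the current schema are left where they are.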

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 14, 2024
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 21, 2024