Move field into place when adding during schema evolution #8409

hiloboy0119 · 2023-08-28T04:22:24Z

While trying to use schema evolution on write, I noticed that new fields were always being added to the end of the struct/table, which caused schema evolution to fail. I added tests to verify this and fixed it by moving the new field into place when necessary.
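The sketch below shows the intended ordering. It assumes a flat struct where fields are matched by name only. `FieldOrderSketch`, `appendMissing`, and `placeMissing` are hypothetical names used for illustration and are not code from the PR. `appendMissing` reproduces the append-at-end behavior described above. `placeMissing` inserts each missing field right after the field that precedes it in the incoming schema, or first if it has no predecessor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compute the column order schema evolution should produce.
// Assumption: a flat struct, with fields matched by name only (no nesting, no ids).
class FieldOrderSketch {

  // Behavior described in the PR: every missing field is appended at the end.
  static List<String> appendMissing(List<String> existing, List<String> incoming) {
    List<String> result = new ArrayList<>(existing);
    for (String name : incoming) {
      if (!result.contains(name)) {
        result.add(name);
      }
    }
    return result;
  }

  // Desired behavior: a missing field goes right after the field that precedes
  // it in the incoming schema, or to the front if it has no predecessor.
  static List<String> placeMissing(List<String> existing, List<String> incoming) {
    List<String> result = new ArrayList<>(existing);
    for (int pos = 0; pos < incoming.size(); pos++) {
      String name = incoming.get(pos);
      if (result.contains(name)) {
        continue; // already in the table schema; leave it where it is
      }
      if (pos == 0) {
        result.add(0, name); // comparable to moveFirst
      } else {
        // The predecessor is always present: it either existed already or was
        // placed in an earlier iteration, because fields are processed in order.
        String after = incoming.get(pos - 1);
        result.add(result.indexOf(after) + 1, name); // comparable to moveAfter
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = List.of("id", "data");
    List<String> write = List.of("ts", "id", "category", "data");
    System.out.println(appendMissing(table, write)); // [id, data, ts, category]
    System.out.println(placeMissing(table, write));  // [ts, id, category, data]
  }
}
```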

hiloboy0119 · 2023-08-29T15:12:50Z

I found a few more cases where this fails and am going to work on updating the PR today. The before/after lookup needs to be a bit more sophisticated.
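The thread doesn't say exactly what the more sophisticated lookup became. One plausible approach, sketched here under stated assumptions, anchors each new field on its preceding sibling in the same struct. It also uses full dotted paths such as `meta.location.zip`, which is how Iceberg's `UpdateSchema` identifies nested columns. `MoveLookupSketch.moveFor` is hypothetical and only returns a description of the move.

```java
import java.util.List;

// Hypothetical helper: describe where a newly added field should be moved.
// Assumptions: nested fields are addressed by dotted paths ("meta.location.zip"),
// and the anchor must be a sibling within the same parent struct.
class MoveLookupSketch {

  // Joins a parent path and a field name into a dotted path ("" = top level).
  static String path(String parentPath, String name) {
    return parentPath.isEmpty() ? name : parentPath + "." + name;
  }

  // siblings: field names of one struct, in incoming-schema order.
  // pos: index of the newly added field within that struct.
  static String moveFor(String parentPath, List<String> siblings, int pos) {
    String field = path(parentPath, siblings.get(pos));
    if (pos == 0) {
      // No predecessor: move to the start of the parent struct.
      return "moveFirst(" + field + ")";
    }
    // Anchor on the preceding sibling, also by its full dotted path.
    String after = path(parentPath, siblings.get(pos - 1));
    return "moveAfter(" + field + ", " + after + ")";
  }

  public static void main(String[] args) {
    // Top-level field added in the middle of the table.
    System.out.println(moveFor("", List.of("id", "category", "data"), 1));
    // prints moveAfter(category, id)

    // Field added two levels down, in front of its siblings.
    System.out.println(moveFor("meta.location", List.of("zip", "city"), 0));
    // prints moveFirst(meta.location.zip)
  }
}
```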

amogh-jahagirdar

@hiloboy0119 thanks for the PR. Could you clarify exactly how you were trying to evolve the table schema via Spark ACCEPT_ANY_SCHEMA, and what the specific failure was? That would help a lot in reviewing the PR.

hiloboy0119 · 2024-03-07T00:14:15Z

@amogh-jahagirdar sorry for the long delay. Finally had time to write tests for the various edge cases and fix a few things.

The schema evolution examples are all in the test cases: I added fields to the beginning and middle of tables. Without these changes, those fields were always added to the end of the schema.


hiloboy0119 · 2024-03-07T01:48:15Z

core/src/main/java/org/apache/iceberg/schema/UnionByNameVisitor.java

@@ -90,6 +94,15 @@ public Boolean struct(
              Types.NestedField field = fields.get(pos);
              if (isMissing) {
                addColumn(partnerId, field);


@amogh-jahagirdar This is the core problem with the existing code: missing columns are simply added with addColumn no matter where they appear in the incoming schema, so they always end up at the end.
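Here is a sketch of how the fix at this call site could look. It assumes two things: the visitor knows each missing field's position in the incoming struct, and a column added in an update can be moved within that same update, which the later commit "Make calls directly against the SchemaUpdate API" suggests. `ColumnOps` is a hypothetical stand-in whose method names mirror `addColumn`, `moveFirst`, and `moveAfter` on Iceberg's `UpdateSchema`. `RecordingOps` only records calls, so the sketch runs without Iceberg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class AddThenMoveSketch {

  // Hypothetical stand-in; the method names mirror Iceberg's UpdateSchema.
  interface ColumnOps {
    void addColumn(String name);
    void moveFirst(String name);
    void moveAfter(String name, String afterName);
  }

  // Records each call so the resulting sequence can be inspected.
  static class RecordingOps implements ColumnOps {
    final List<String> calls = new ArrayList<>();
    public void addColumn(String name) { calls.add("addColumn(" + name + ")"); }
    public void moveFirst(String name) { calls.add("moveFirst(" + name + ")"); }
    public void moveAfter(String name, String afterName) {
      calls.add("moveAfter(" + name + ", " + afterName + ")");
    }
  }

  // For each incoming field missing from the table: add it (which appends it),
  // then immediately move it next to its predecessor in the incoming schema.
  static void unionInOrder(Set<String> existing, List<String> incoming, ColumnOps ops) {
    for (int pos = 0; pos < incoming.size(); pos++) {
      String name = incoming.get(pos);
      if (existing.contains(name)) {
        continue; // existing columns keep their current position
      }
      ops.addColumn(name);
      if (pos == 0) {
        ops.moveFirst(name);
      } else {
        ops.moveAfter(name, incoming.get(pos - 1));
      }
    }
  }

  public static void main(String[] args) {
    RecordingOps ops = new RecordingOps();
    unionInOrder(Set.of("id", "data"), List.of("ts", "id", "category", "data"), ops);
    // prints addColumn(ts), moveFirst(ts), addColumn(category),
    // moveAfter(category, id), one call per line
    ops.calls.forEach(System.out::println);
  }
}
```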

github-actions · 2024-09-14T00:14:01Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions · 2024-09-21T00:14:54Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot added spark core labels Aug 28, 2023

amogh-jahagirdar reviewed Sep 1, 2023


hiloboy0119 force-pushed the fix/add_column_not_at_end branch from a5685a6 to 2f75c87 March 7, 2024 00:09

Move field into place when adding during schema evolution

5e256cc

hiloboy0119 force-pushed the fix/add_column_not_at_end branch from 2f75c87 to 5e256cc March 7, 2024 00:11

Drew Goya added 6 commits March 6, 2024 16:25

Removing some unnecessary code and an old comment

7547f05

Spotless styling

d380e0a

Make calls directly against the SchemaUpdate API

031af7c

Fixing problem with nested fields

9dc8b19

Fixing a problem with moving a nested field to the correct location

8f2edbd

Adding test for deeply nested fields (2 layers down) with field re-ordering

1908c79

hiloboy0119 commented Mar 7, 2024


github-actions bot added the stale label Sep 14, 2024

github-actions bot closed this Sep 21, 2024
