-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Ignore partition fields that are dropped from the current-schema #11868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c25ad2a to
3dff527
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
842f32b to
83ac170
Compare
83ac170 to
90f5de3
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
72713a0 to
9e998cc
Compare
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
Outdated
Show resolved
Hide resolved
stevenzwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| } | ||
|
|
||
| @Test | ||
| public void testPartitionTypeDropInactiveFields() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, this is testing "ignoring" inactiveFields, they aren't dropped right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are dropped from the struct that's being returned, but ignore is maybe a bit friendlier :)
|
|
||
| public static PartitionSpec fromJson(Schema schema, JsonNode json) { | ||
| return fromJson(json).bind(schema); | ||
| return fromJson(json).bindUnchecked(schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about changing the behavior of this API to bindUnchecked since that skips over validations that still apply when parsing the schema. If I understand the intent,, it's because we're currently failing here
I think I agree that fundamentally when parsing a spec and binding we don't care if a source field exists or not since we'll replace it with unknown later. However, there are some other checks in checkCompatibility that still seem worth running through at this point when the sourceType is not null like making sure that the transform and type does align, the sourcetype isn't something like a struct etc.
It's unfortunately a bit more complicated but should we have a checkCompatibilityIgnoringMissing and a buildIgnoringMissingFields which just does the same checks as checkCompatibility but essentially ignoring if a ifeld is missing, and if it is defined we still do the existing checks? Then we largely keep the same guarantees on this function while solving the issue we set out to solve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @RussellSpitzer @stevenzwu what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunately a bit more complicated but should we have a checkCompatibilityIgnoringMissing and a buildIgnoringMissingFields which just does the same checks as checkCompatibility but essentially ignoring if a ifeld is missing, and if it is defined we still do the existing checks?
this seems like a reasonable approach. We can also just add a boolean arg to the checkCompatibility method. it is a non-public method.
We still need the strict check of source field for the code path of TableMetadata.addPartitionSpecInternal.
TableMetadataParser is the only place using bindUnchecked. I assume it is for the same reason. Maybe the buildChecked can switch to this trimmed down version of checkCompatibility and ignore inactive source fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to buildUnchecked is required because of a check that was added not long ago: #12887, where someone was previously allowed to create an invalid partition spec.
I've updated the code. Let me know what you think. If we don't want to use buildUnchecked, then we need three variations:
buildStrictwhen a newPartitionSpecis added, we want to make sure that all the fields are therebuildwith allowing missing fields for the inactive specs that contain fields that are dropped.bindUncheckedwhen we don't want to perform checks at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableMetadataParseris the only place usingbindUnchecked. I assume it is for the same reason.
This is correct
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thank you @Fokko I think this is a great fix for a long standing issue that we should get in! Thank you @RussellSpitzer @stevenzwu @HonahX for reviewing!
A second attempt to fix this (the first attempt in #11604 but that one was not safe).
When a field that's dropped that still part of an older partition spec, we currently get an error. See #11842
This will replace it with the
UnknownTypewhich will be read as a null. This will work fine, since when the evaluator encounters anull, it will assume that there is no value:iceberg/api/src/main/java/org/apache/iceberg/expressions/Projections.java
Lines 204 to 207 in dbd7d1c
I misused the
UnknownTypehere, but I think it is better than returning an arbitrary type, like we do in theUnknownTransform:iceberg/api/src/main/java/org/apache/iceberg/transforms/UnknownTransform.java
Lines 63 to 67 in dbd7d1c
Unfortunately, my first approach, where we drop the field, is incorrect, since V1 tables rely on the order.
Fixes #4563
Fixes #12738