Spec: update to reflect lineage is required #12580
+1, pending the vote and rewording that one line.
Any snapshot without the field `first-row-id` does not have any lineage information and values for `_row_id` and `_last_updated_sequence_number` cannot be assigned accurately.
- All files that were added before `row-lineage` was enabled should propagate null for all of the `row-lineage` related
+ All files that were added before upgrading to v3 should propagate null for all of the `row-lineage` related
I think the `should` in `should propagate null` has to be a `must`, if I'm not mistaken, so that there is no room for implementations to propagate non-null values for files that didn't have lineage.
@@ -458,11 +457,11 @@ The snapshot then populates the total number of `added-rows` based on the sum of
When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225:
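As a rough sketch of that arithmetic (the function and variable names are hypothetical and are not part of the spec or of any Iceberg API):

```python
from typing import Dict, Tuple


def advance_next_row_id(next_row_id: int, added_rows_by_manifest: Dict[str, int]) -> Tuple[int, int]:
    """Return (added_rows, new_next_row_id) for a newly committed snapshot.

    Assumption: `added_rows_by_manifest` maps each manifest in the new
    snapshot to the number of rows it adds, as in the example above.
    """
    # The snapshot's `added-rows` is the sum over its manifests.
    added_rows = sum(added_rows_by_manifest.values())
    # The table's `next-row-id` advances by that total, even when the
    # snapshot is not on the main branch.
    return added_rows, next_row_id + added_rows


added_rows, new_next_row_id = advance_next_row_id(
    1000, {"added1": 100, "added2": 0, "added3": 125}
)
print(added_rows, new_next_row_id)  # prints: 225 1225
```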
- ##### Enabling Row Lineage for Non-empty Tables
+ ##### Row Lineage for Non-empty, Upgraded Tables
Minor: I feel like we could just leave it as `Row Lineage for Upgraded Tables`.
1. The row's existing non-null `_row_id` must be copied into the new data file
2. If the write has modified the row, the `_last_updated_sequence_number` field must be set to `null` (so that the modification's sequence number replaces the current value)
3. If the write has not modified the row, the existing non-null `_last_updated_sequence_number` value must be copied to the new data file
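The three rules above can be sketched as follows. This is only an illustration: `RowLineage` and `lineage_for_rewritten_row` are hypothetical names, and the sketch assumes the existing values have already been resolved to non-null values by the reader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowLineage:
    """Hypothetical holder for the two lineage fields of one row."""
    row_id: Optional[int]
    last_updated_sequence_number: Optional[int]


def lineage_for_rewritten_row(existing: RowLineage, modified: bool) -> RowLineage:
    """Lineage values to write when an existing row moves to a new data file."""
    if existing.row_id is None:
        # Assumption: only rows that already carry lineage are handled here.
        raise ValueError("existing row has no _row_id to preserve")
    if modified:
        # Rules 1 and 2: keep the row id, write null so the sequence
        # number of the modifying commit is inherited on read.
        return RowLineage(existing.row_id, None)
    # Rules 1 and 3: keep both existing values unchanged.
    return RowLineage(existing.row_id, existing.last_updated_sequence_number)


print(lineage_for_rewritten_row(RowLineage(1005, 7), modified=True))
print(lineage_for_rewritten_row(RowLineage(1005, 7), modified=False))
```

In the first call the row id is kept and the sequence number is written as null; in the second call both values are copied unchanged.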
Engines may model operations as deleting rows and inserting rows or as modifications to rows that preserve row ids. |
I think we'd still need to call out somewhere that `_last_updated_sequence_number` should be preserved if a row is unmodified, in the case where an operation is not modeled as a delete + insert and an existing row is moved to a different file.
The section above on line 391 used to say that, but I think it's now removed.
Isn't that what line 413 states?
Ah, sorry about that. I was only looking at the changed lines and missed that this is already there. Cool!
I think we need a semicolon or colon before (or after) "or as modifications".
I am not sure, because Maggie is at AWP and she usually answers all my grammar questions, but I think we need something that differentiates "Deleting and Inserting" from "Modifications to Rows".
Now that equality deletes can co-exist with row lineage, update the spec to require lineage for v3.
This PR includes: