Spec: update to reflect lineage is required #12580

Open
wants to merge 1 commit into main


danielcweeks
Contributor

Now that equality deletes can co-exist with row lineage, update the spec to require lineage for v3.

This PR includes:

  • removal of the flag to enable row lineage
  • adjustment of the language to remove references to lineage being optional in v3
  • clarification of the spec language around what writers should or must handle

github-actions bot added the "Specification" label (Issues that may introduce spec changes.) on Mar 19, 2025
danielcweeks force-pushed the spec/v3-row-lineage-required branch 2 times, most recently from 98c7bd0 to 9f613ef, March 24, 2025 16:59
Member

RussellSpitzer left a comment

+1, pending the vote and the rewording of that one line

danielcweeks force-pushed the spec/v3-row-lineage-required branch 3 times, most recently from 7e0024a to 1664e44, March 25, 2025 22:04
danielcweeks force-pushed the spec/v3-row-lineage-required branch from 1664e44 to 421d61e, March 25, 2025 22:08
danielcweeks force-pushed the spec/v3-row-lineage-required branch from 421d61e to 6fa2e82, March 25, 2025 22:09

Any snapshot without the field `first-row-id` does not have any lineage information and values for `_row_id` and `_last_updated_sequence_number` cannot be assigned accurately.

- All files that were added before `row-lineage` was enabled should propagate null for all of the `row-lineage` related
+ All files that were added before upgrading to v3 should propagate null for all of the `row-lineage` related
Contributor

I think the "should" in "should propagate null" has to be a "must", if I'm not mistaken? That would leave no room for implementations to propagate non-null values for files that didn't have lineage.
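A minimal Python sketch of the null-propagation rule under discussion. It assumes the read-time inheritance rule described elsewhere in the v3 row lineage section: a null stored `_row_id` resolves to the file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` resolves to the file's data sequence number. The `DataFile` class and `resolve_lineage` function are hypothetical names for illustration only. A file added before the upgrade has no `first_row_id`, so under a "must" there is no code path that yields a non-null value for it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataFile:
    # Hypothetical model of a data file's lineage metadata.
    # first_row_id is None when the file was added before the table was upgraded to v3.
    first_row_id: Optional[int]
    data_sequence_number: int


def resolve_lineage(
    f: DataFile,
    pos: int,
    stored_row_id: Optional[int],
    stored_last_updated: Optional[int],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (_row_id, _last_updated_sequence_number) for the row at position `pos`."""
    if f.first_row_id is None:
        # No lineage information exists for this file: propagate null for both fields.
        return None, None
    # Assumed inheritance rule: a null stored value falls back to file-level metadata.
    row_id = stored_row_id if stored_row_id is not None else f.first_row_id + pos
    last_updated = (
        stored_last_updated
        if stored_last_updated is not None
        else f.data_sequence_number
    )
    return row_id, last_updated
```

For example, `resolve_lineage(DataFile(None, 3), 5, None, None)` returns `(None, None)`, while the same row in a file with `first_row_id=1000` and data sequence number 7 resolves to `(1005, 7)`.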

@@ -458,11 +457,11 @@ The snapshot then populates the total number of `added-rows` based on the sum of
When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225:


- ##### Enabling Row Lineage for Non-empty Tables
+ ##### Row Lineage for Non-empty, Upgraded Tables
Contributor

Minor: I feel like we could just leave it as "Row Lineage for Upgraded Tables".
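The `next-row-id` arithmetic in the hunk above (1,000 + 225 = 1,225) can be sketched in Python as follows. The function name `assign_row_ids` is hypothetical, and the per-manifest starting ids assume that row id ranges are handed out consecutively in manifest-list order and cover only each manifest's added rows, which matches the example's arithmetic but is an assumption here.

```python
from typing import Dict, List, Tuple


def assign_row_ids(
    next_row_id: int, added_rows: List[Tuple[str, int]]
) -> Tuple[Dict[str, int], int, int]:
    """Return (first row id per manifest, snapshot added-rows, new table next-row-id)."""
    first_ids: Dict[str, int] = {}
    current = next_row_id
    for manifest, count in added_rows:
        # Assumption: ranges are handed out consecutively in manifest-list order.
        first_ids[manifest] = current
        current += count
    added = current - next_row_id
    return first_ids, added, current


ids, added, new_next = assign_row_ids(
    1000, [("added1", 100), ("added2", 0), ("added3", 125)]
)
print(ids, added, new_next)
# {'added1': 1000, 'added2': 1100, 'added3': 1100} 225 1225
```

A manifest with zero added rows (`added2`) receives the same starting id as the next manifest, since its range is empty.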


1. The row's existing non-null `_row_id` must be copied into the new data file
2. If the write has modified the row, the `_last_updated_sequence_number` field must be set to `null` (so that the modification's sequence number replaces the current value)
3. If the write has not modified the row, the existing non-null `_last_updated_sequence_number` value must be copied to the new data file

Engines may model operations as deleting rows and inserting rows or as modifications to rows that preserve row ids.
Contributor

I think we'd still need to call out somewhere that `_last_updated_sequence_number` should be preserved for an unmodified row when an operation is not modeled as a delete + insert and an existing row is moved to a different file.

The section above on line 391 used to say that, but I think now it's removed.

Contributor Author

Isn't that what line 413 states?

Contributor

Ah sorry about that, I was only looking at the changed lines and missed that this already is there. Cool!
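The three numbered rules in the hunk above (copy `_row_id`; null `_last_updated_sequence_number` for modified rows; copy it for unmodified rows) can be sketched in Python. `Row` and `copy_to_new_file` are hypothetical names, and the sketch assumes the row's lineage values have already been resolved to non-null values before the rewrite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    data: dict
    row_id: Optional[int]  # existing _row_id (non-null once lineage is assigned)
    last_updated: Optional[int]  # existing _last_updated_sequence_number


def copy_to_new_file(row: Row, new_data: Optional[dict] = None) -> Row:
    """Lineage values written when an existing row lands in a new data file.

    `new_data` is None when the write did not modify the row.
    """
    if new_data is None:
        # Rules 1 and 3: unmodified row, copy both existing values unchanged.
        return Row(row.data, row.row_id, row.last_updated)
    # Rules 1 and 2: keep the row id, write null so that the modification's
    # sequence number replaces the old value when the file is read.
    return Row(new_data, row.row_id, None)
```

Moving an unmodified row to a different file therefore keeps its original sequence number, which is the case raised in the comment above.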

Member

I think we need a semicolon or colon before (or after) "or as modifications".

I am not sure, because Maggie is at AWP and she usually answers all my grammar questions, but I think we need something that differentiates "Deleting and Inserting" from "Modifications to Rows".
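The two modeling choices in that sentence differ in whether row identity survives an update. A minimal Python sketch, assuming the same read-time inheritance rule as above (null `_row_id` becomes the new file's `first_row_id` plus position; null sequence number becomes the file's data sequence number). `StoredRow`, `write_update`, `read_back`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StoredRow:
    row_id: Optional[int]  # _row_id as written (None means "inherit")
    last_updated: Optional[int]  # _last_updated_sequence_number as written


def write_update(old: StoredRow, preserve_identity: bool) -> StoredRow:
    if preserve_identity:
        # Modeled as a modification: keep the row id, null the sequence number.
        return StoredRow(old.row_id, None)
    # Modeled as delete + insert: the replacement is a new row with no lineage.
    return StoredRow(None, None)


def read_back(
    row: StoredRow, first_row_id: int, pos: int, data_seq: int
) -> Tuple[int, int]:
    # Assumed inheritance: nulls fall back to the new file's metadata.
    row_id = row.row_id if row.row_id is not None else first_row_id + pos
    seq = row.last_updated if row.last_updated is not None else data_seq
    return row_id, seq


# Hypothetical values: an existing row with id 1005 last updated at sequence 7,
# rewritten into a new file with first_row_id 1225 at position 0, committed at sequence 8.
old = StoredRow(1005, 7)
print(read_back(write_update(old, True), 1225, 0, 8))   # (1005, 8)
print(read_back(write_update(old, False), 1225, 0, 8))  # (1225, 8)
```

Both models record the update's sequence number; only the modification model keeps the original `_row_id`.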

4 participants