-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add position in manifest to DataFile and DeleteFile #1723
Conversation
FYI @jerryshao and @HeartSaVioR. Since we can project the position of a record in a file, I added this to metadata so that we can use it in streaming offsets. Please have a look, I think it should avoid needing an extra boolean in the offsets. |
I'm not familiar with the details on the code, but it looks OK in general. This would help us to read a manifest file incrementally across multiple batches without a custom representation. |
Looks like the CI failure is related:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
/** | ||
* Returns the ordinal position of the file in a manifest, or null if it was not read from a manifest. | ||
*/ | ||
Long pos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: do we want to include anything in the name to indicate that it is a position in the manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be open to that, but I couldn't come up with a very good name for this field so I just used "pos" because we use that elsewhere to indicate position in a file.
Will this field also appear in the metadata table output? |
No, it won't. This change only adds pos to files that are read by ManifestReader, and it doesn't change any of the read or write schemas outside of that. |
Thanks for reviewing, @HeartSaVioR and @aokolnychyi! I'll merge this. |
This updates
ManifestReader
to project the file position metadata column so thatDataFile
andDeleteFile
objects read from manifests include the position within the manifest. This is in support of streaming reads, where the data in a table is turned into a stream of snapshots, then new manifests, then data files. The position of each file in a manifest can be encoded as a recoverable offset.