feat: Include statistics for Reserved Fields#1849
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! This aligns with both pyiceberg and spark behavior
|
manually retriggering ci runs, due to #1838 😞 |
They don't show up in the table schema, but can be important for query optimization
13f25cd to
6e32474
Compare
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM!!
should we update the PR title/description to reflect the new changes?
|
|
||
| /// Reserved field ID for the spec ID (_spec_id) column per Iceberg spec | ||
| pub const RESERVED_FIELD_ID_SPEC_ID: i32 = i32::MAX - 4; | ||
|
|
There was a problem hiding this comment.
nit: add _partition here for completeness?
|
|
||
| /// Reserved field ID for the position in position delete files | ||
| pub const RESERVED_FIELD_ID_DELETE_FILE_POS: i32 = i32::MAX - 102; | ||
|
|
There was a problem hiding this comment.
nit: add row for completeness?
There was a problem hiding this comment.
I left out row on purpose, since it is a struct that corresponds with the table schema. row can be used to decide if a positional delete is relevant for your query (since you collect statistics for the rows that are dropped), but I don't think any engine leverages that today. There was even a thread on the dev-list to deprecate this functionality.
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM!
looks like the ci is stuck, i retriggered it
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @Fokko , LGTM!
|
Thanks @kevinjqliu and @liurenjie1024 for checking 🙌 |
This is a behavioral change.
In Iceberg-Rust we require upper/lower bounds to be part of the schema. But in some cases, this isn't the case, for example when you use reserved fields. In PyIceberg we expect these values in some tests:
This is a positional delete where the field-IDs are constant, but never part of a schema (they are reserved).
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?