-
Notifications
You must be signed in to change notification settings - Fork 1k
[thrift-remodel] Add thrift write support #8237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e10e274
to
c729d22
Compare
I am hoping to review this later today or tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through this and I think it looks really nice 😍
One thing that might be a useful exercise is to simulate someone making a new change to the thrift parquet spec and seeing how easy/hard it is to add support for that new feature with the custom thrift parser.
THe real test will be to see if someone other than @etseidl can accomplish it :)
parquet/src/basic.rs
Outdated
} | ||
} | ||
|
||
// FIXME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove. This is one of those situations where I have to hand-do the serialization code. I could instead use a private intermediate struct and then translate...it's just an enum after all.
another example is LogicalType
. The current implementation uses struct variants, but the thrift macros use tuple variants. I could remove a lot of code if we just change that, but there would be ripples.
} | ||
} | ||
|
||
///////////////////////// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is pretty nice that this is so straightforward
let column_name = col_meta.column_descr().name().to_string(); | ||
let page_locations = offset_index[rg_index][col_idx] | ||
.page_locations().to_vec(); | ||
let page_locations = offset_index[rg_index][col_idx].page_locations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
/// macro to use when decoding struct fields | ||
#[doc(hidden)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be great to get some more doc strings on these macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhorstmann added some to his macros that I will steal and expand upon. Right now I've got six more branches to go that all depend on the preceding one, so I don't want to make too many changes here...I'll instead start a list of all the places I need more documentation and make those changes once the full end-to-end is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me
} | ||
} | ||
|
||
pub(crate) trait WriteThrift { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more docstrings here explaining what this was for and what it did would be super helpful I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there are many more changes to this particular API so I'll take a stab at documentation here.
Thanks for the review @alamb! I'll start on the documentation for the write API, and then leave some TODOs based on your comments. Then on to the next PR 😄 |
It is pretty exciting to see this coming together |
🚀 |
Which issue does this PR close?
Note: this targets a feature branch, not main
Rationale for this change
Begins adding custom thrift write support.
What changes are included in this PR?
Adds traits to aid in writing of thrift and modifies thrift macros to generate writing code.
Are these changes tested?
Yes, adds some roundtrip tests to validate encoded data can be decoded to the same state.
Are there any user-facing changes?
No