-
Notifications
You must be signed in to change notification settings - Fork 1k
Unify API for writing column chunks / row groups in parallel #8582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 -- thank you @adamreeve -- this looks perfect
I have nothing to add
| impl ArrowRowGroupWriterFactory { | ||
| #[cfg(feature = "encryption")] | ||
| fn new<W: Write + Send>( | ||
| /// Create a new [`ArrowRowGroupWriterFactory`] for the provided file writer and Arrow schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
|
||
| /// Converts this writer into a lower-level [`SerializedFileWriter`] and [`ArrowRowGroupWriterFactory`]. | ||
| /// This can be useful to provide more control over how files are written. | ||
| #[deprecated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I suggested deprecating this method 😞 , but after testing downstream with DataFusion I think we should not deprecate this method, for reasons listed on
Which issue does this PR close?
Rationale for this change
Simplify API surface and only provide one way to write column chunks and row groups in parallel.
What changes are included in this PR?
ArrowRowGroupWriterFactoryconstructor public and simplify it to remove arguments that are available from theSerializedFileWriter.ArrowColumnWriterexample and test code to use theArrowRowGroupWriterFactory.parquet::arrow::arrow_writer::get_column_writersandparquet::arrow::arrow_writer::ArrowWriter::into_serialized_writerAre these changes tested?
Yes, covered by existing tests.
Are there any user-facing changes?
Yes, this deprecates existing public methods.