parquet: Add ArrowWriterOptions to skip embedding the arrow metadata #5299
This looks good to me, thank you. I do think we should probably move WriterProperties onto ArrowWriterOptions though, as forcing users to juggle two options collections seems cumbersome. Apologies for the slow review; I've been a bit swamped lately.
pub fn try_new_with_options(
    writer: W,
    arrow_schema: SchemaRef,
    props: Option<WriterProperties>,
I wonder if we could move this on to ArrowWriterOptions?
Sure, I'll try it. The C++ FileWriter API separates these two parameters, so I kept try_new_with_options() similar to theirs.
::arrow::Result<std::unique_ptr<FileWriter>> Open(
const ::arrow::Schema &schema,
MemoryPool *pool,
std::shared_ptr<::arrow::io::OutputStream> sink,
std::shared_ptr<WriterProperties> properties = default_writer_properties(),
std::shared_ptr<ArrowWriterProperties> arrow_properties = default_arrow_writer_properties()
)
I do think we should probably move WriterProperties on to ArrowWriterOptions
This approach looks good to me. We could also consider deprecating the old try_new method, or making a breaking change to remove it, since the two are very similar.
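To illustrate the suggestion above, here is a minimal sketch of how the old constructor could be kept as a thin wrapper once WriterProperties lives on ArrowWriterOptions. Every type here is a hypothetical stand-in rather than the parquet crate's real definition; the sink and schema arguments are omitted and the `compression_enabled` field is invented purely for illustration.

```rust
// Hypothetical stand-ins for illustration only; NOT the parquet crate's real types.
#[derive(Debug, Clone, Default, PartialEq)]
struct WriterProperties {
    compression_enabled: bool, // invented field, just to have something to configure
}

#[derive(Debug, Clone, Default, PartialEq)]
struct ArrowWriterOptions {
    properties: WriterProperties,
    skip_arrow_metadata: bool,
}

#[derive(Debug)]
struct ArrowWriter {
    options: ArrowWriterOptions,
}

impl ArrowWriter {
    /// Old-style constructor: wraps the optional properties in default
    /// options and delegates, so only one code path does the real work.
    fn try_new(props: Option<WriterProperties>) -> Result<Self, String> {
        let options = ArrowWriterOptions {
            properties: props.unwrap_or_default(),
            skip_arrow_metadata: false,
        };
        Self::try_new_with_options(options)
    }

    /// New-style constructor: a single options value carries everything.
    fn try_new_with_options(options: ArrowWriterOptions) -> Result<Self, String> {
        Ok(Self { options })
    }
}

fn main() {
    let old_style = ArrowWriter::try_new(None).unwrap();
    let new_style = ArrowWriter::try_new_with_options(ArrowWriterOptions::default()).unwrap();
    // Both paths end up with the same configuration.
    println!("{}", old_style.options == new_style.options);
}
```

With this shape, deprecating or removing try_new later would not change behaviour for callers who have already moved to try_new_with_options.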
Thank you for this.
Which issue does this PR close?
Closes #5296.
Rationale for this change
What changes are included in this PR?
Adds the following APIs
ArrowWriter::try_new_with_options()
AsyncArrowWriter::try_new_with_options()
ArrowWriterOptions
Are there any user-facing changes?
Users can call:
ArrowWriterOptions::with_skip_arrow_metadata() to set skip_arrow_metadata to true and skip embedding the arrow schema.
ArrowWriterOptions::with_properties() to configure the parquet writer.
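
The builder methods listed above could be exercised roughly as follows. This is a self-contained sketch, not the crate's implementation: the structs are hypothetical stand-ins, `max_row_group_size` is an illustrative field, and `footer_metadata_keys` is an invented helper that models which key-value entries a writer would embed, assuming the arrow schema is stored under the `ARROW:schema` key.

```rust
// Hypothetical stand-ins so the sketch compiles without the parquet crate.
#[derive(Debug, Clone, Default, PartialEq)]
struct WriterProperties {
    max_row_group_size: usize, // illustrative field only
}

#[derive(Debug, Clone, Default)]
struct ArrowWriterOptions {
    properties: WriterProperties,
    skip_arrow_metadata: bool,
}

impl ArrowWriterOptions {
    fn new() -> Self {
        Self::default()
    }

    /// Sets the parquet writer properties, consuming and returning the builder.
    fn with_properties(self, properties: WriterProperties) -> Self {
        Self { properties, ..self }
    }

    /// Controls whether the encoded arrow schema is left out of the file metadata.
    fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
        Self { skip_arrow_metadata, ..self }
    }
}

/// Invented helper: the key-value metadata keys a writer would embed
/// in the file footer for the given options.
fn footer_metadata_keys(options: &ArrowWriterOptions) -> Vec<&'static str> {
    if options.skip_arrow_metadata {
        vec![]
    } else {
        vec!["ARROW:schema"]
    }
}

fn main() {
    let options = ArrowWriterOptions::new()
        .with_properties(WriterProperties { max_row_group_size: 1024 })
        .with_skip_arrow_metadata(true);
    println!("row group size: {}", options.properties.max_row_group_size);
    println!("embedded keys: {:?}", footer_metadata_keys(&options));
}
```

In the real crate the resulting options value would be passed to ArrowWriter::try_new_with_options() or AsyncArrowWriter::try_new_with_options(); check the crate documentation for the exact signatures.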