Enable parallel writing across row groups when writing encrypted parquet (#8162)
- Closes #8115.
- Closes #8260.
- Closes #8259.
# Rationale for this change
#8029 introduced `pub ArrowWriter.get_column_writers` and
`pub ArrowWriter.append_row_group` to enable multi-threaded writing of
encrypted parquet. However, downstream testing showed that this API is
not feasible; see #8115.
# What changes are included in this PR?
This introduces `pub ArrowWriter.into_serialized_writer` and deprecates
`pub ArrowWriter.get_column_writers` and
`pub ArrowWriter.append_row_group`. It also makes
`ArrowRowGroupWriterFactory` public and adds
`pub ArrowRowGroupWriterFactory.create_column_writers`.
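
The sketch below illustrates the general pattern these changes are meant to
support: a factory hands out per-column writers for a given row group, the
columns are encoded on separate threads, and the finished chunks are
collected in row-group order. It uses only the Rust standard library; the
names `EncoderFactory`, `ColumnEncoder`, `EncodedChunk`, and
`write_parallel` are hypothetical stand-ins and are not the parquet crate's
types or signatures.

```rust
use std::thread;

/// Hypothetical stand-in for an encoded column chunk (not a parquet type).
#[derive(Debug, PartialEq)]
struct EncodedChunk {
    row_group: usize,
    column: usize,
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for a column writer bound to one row group.
struct ColumnEncoder {
    row_group: usize,
    column: usize,
}

impl ColumnEncoder {
    /// "Encodes" the values; here this is just a copy, for illustration.
    fn encode(self, values: &[u8]) -> EncodedChunk {
        EncodedChunk {
            row_group: self.row_group,
            column: self.column,
            bytes: values.to_vec(),
        }
    }
}

/// Hypothetical factory that plays the role of a row-group writer factory.
struct EncoderFactory {
    num_columns: usize,
}

impl EncoderFactory {
    /// Creates one encoder per column for the given row group index.
    fn create_column_encoders(&self, row_group: usize) -> Vec<ColumnEncoder> {
        (0..self.num_columns)
            .map(|column| ColumnEncoder { row_group, column })
            .collect()
    }
}

/// Encodes every (row group, column) pair on its own scoped thread and
/// returns the chunks in row-group order, then column order.
fn write_parallel(factory: &EncoderFactory, row_groups: &[Vec<Vec<u8>>]) -> Vec<EncodedChunk> {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for (rg_idx, columns) in row_groups.iter().enumerate() {
            let encoders = factory.create_column_encoders(rg_idx);
            for (encoder, values) in encoders.into_iter().zip(columns) {
                handles.push(s.spawn(move || encoder.encode(values)));
            }
        }
        // Joining in spawn order keeps the output deterministic even though
        // the encoding itself runs concurrently.
        handles
            .into_iter()
            .map(|h| h.join().expect("encoder thread panicked"))
            .collect()
    })
}
```

Calling `write_parallel(&factory, &row_groups)` with two row groups of two
columns each yields four chunks. In this model the expensive encoding step
runs concurrently, while the order in which chunks are handed back (and
would be appended to a file) stays sequential.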
# Are these changes tested?
This includes a DataFusion-inspired test for concurrent writing across
columns and row groups to make sure parallel writing is and remains
possible with `ArrowWriter`'s API. Further, we created a draft PR in
DataFusion, apache/datafusion#16738, to test for multithreaded writing
support.
# Are there any user-facing changes?
See description of changes.
---------
Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>