
Commit 5a384f4

Undeprecate ArrowWriter::into_serialized_writer and add docs (#8621)
# Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

- Related to #7835

# Rationale for this change

While testing the arrow 57 upgrade in DataFusion I found a few things that need to be fixed in parquet-rs.

- apache/datafusion#17888

One was that the method `ArrowWriter::into_serialized_writer` was deprecated (which I know I suggested in #8389 🤦).

However, when testing it turns out that the constructor of `SerializedFileWriter` does a lot of work (like creating the parquet schema from the arrow schema and messing with metadata):

https://github.com/apache/arrow-rs/blob/c4f0fc12199df696620c73d62523c8eef5743bf2/parquet/src/arrow/arrow_writer/mod.rs#L230-L263

Creating a `RowGroupWriterFactory` directly would involve a bunch of code duplication.

# What changes are included in this PR?

So let's not deprecate this method for now and instead add some additional docs to guide people to the right place.

# Are these changes tested?

I tested manually upstream.

# Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out.
1 parent f3baa80 commit 5a384f4


parquet/src/arrow/arrow_writer/mod.rs

Lines changed: 16 additions & 6 deletions
@@ -450,11 +450,11 @@ impl<W: Write + Send> ArrowWriter<W> {
     }
 
     /// Converts this writer into a lower-level [`SerializedFileWriter`] and [`ArrowRowGroupWriterFactory`].
-    /// This can be useful to provide more control over how files are written.
-    #[deprecated(
-        since = "57.0.0",
-        note = "Construct a `SerializedFileWriter` and `ArrowRowGroupWriterFactory` directly instead"
-    )]
+    ///
+    /// Flushes any outstanding data before returning.
+    ///
+    /// This can be useful to provide more control over how files are written, for example
+    /// to write columns in parallel. See the example on [`ArrowColumnWriter`].
     pub fn into_serialized_writer(
         mut self,
     ) -> Result<(SerializedFileWriter<W>, ArrowRowGroupWriterFactory)> {
@@ -872,6 +872,12 @@ impl ArrowColumnWriter {
 }
 
 /// Encodes [`RecordBatch`] to a parquet row group
+///
+/// Note: this structure is created by [`ArrowRowGroupWriterFactory`] internally used to
+/// create [`ArrowRowGroupWriter`]s, but it is not exposed publicly.
+///
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel
+#[derive(Debug)]
 struct ArrowRowGroupWriter {
     writers: Vec<ArrowColumnWriter>,
     schema: SchemaRef,
@@ -907,6 +913,10 @@ impl ArrowRowGroupWriter {
 }
 
 /// Factory that creates new column writers for each row group in the Parquet file.
+///
+/// You can create this structure via an [`ArrowWriter::into_serialized_writer`].
+/// See the example on [`ArrowColumnWriter`] for how to encode columns in parallel
+#[derive(Debug)]
 pub struct ArrowRowGroupWriterFactory {
     schema: SchemaDescPtr,
     arrow_schema: SchemaRef,
@@ -937,7 +947,7 @@ impl ArrowRowGroupWriterFactory {
         Ok(ArrowRowGroupWriter::new(writers, &self.arrow_schema))
     }
 
-    /// Create column writers for a new row group.
+    /// Create column writers for a new row group, with the given row group index
     pub fn create_column_writers(&self, row_group_index: usize) -> Result<Vec<ArrowColumnWriter>> {
         let mut writers = Vec::with_capacity(self.arrow_schema.fields.len());
         let mut leaves = self.schema.columns().iter();
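
The doc comments in this diff point at a pattern: obtain a factory, ask it for one column writer per leaf column of a given row group index, encode each column on its own thread, then collect the finished chunks in column order. The sketch below models only that control flow using the Rust standard library. Every type in it (`RowGroupWriterFactory`, `ColumnWriter`, `EncodedChunk`) and the function `encode_row_group` are hypothetical stand-ins invented for illustration. They are not the parquet crate's types, and the "encoding" is just little-endian bytes. For the real API, see the example on `ArrowColumnWriter` that the new docs reference.

```rust
use std::thread;

/// Hypothetical stand-in for an encoded column chunk (not a parquet type).
#[derive(Debug, PartialEq)]
struct EncodedChunk {
    row_group_index: usize,
    column_index: usize,
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for a per-column writer.
struct ColumnWriter {
    row_group_index: usize,
    column_index: usize,
    buffer: Vec<u8>,
}

impl ColumnWriter {
    /// "Encodes" values by appending their little-endian bytes.
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
    }

    /// Consumes the writer and returns the finished chunk.
    fn close(self) -> EncodedChunk {
        EncodedChunk {
            row_group_index: self.row_group_index,
            column_index: self.column_index,
            bytes: self.buffer,
        }
    }
}

/// Hypothetical stand-in for a factory that hands out column writers.
struct RowGroupWriterFactory {
    num_columns: usize,
}

impl RowGroupWriterFactory {
    /// Creates one writer per column for the given row group index.
    fn create_column_writers(&self, row_group_index: usize) -> Vec<ColumnWriter> {
        (0..self.num_columns)
            .map(|column_index| ColumnWriter {
                row_group_index,
                column_index,
                buffer: Vec::new(),
            })
            .collect()
    }
}

/// Encodes every column of one row group on its own thread and returns
/// the chunks in column order.
fn encode_row_group(
    factory: &RowGroupWriterFactory,
    row_group_index: usize,
    columns: Vec<Vec<i32>>,
) -> Vec<EncodedChunk> {
    let writers = factory.create_column_writers(row_group_index);
    assert_eq!(writers.len(), columns.len(), "one writer per column");

    // Spawn one worker per column; each worker owns its writer and data.
    let handles: Vec<_> = writers
        .into_iter()
        .zip(columns)
        .map(|(mut writer, values)| {
            thread::spawn(move || {
                writer.write(&values);
                writer.close()
            })
        })
        .collect();

    // Joining in spawn order keeps the chunks in column order.
    handles
        .into_iter()
        .map(|h| h.join().expect("column worker panicked"))
        .collect()
}

fn main() {
    let factory = RowGroupWriterFactory { num_columns: 2 };
    let chunks = encode_row_group(&factory, 0, vec![vec![1, 2, 3], vec![4, 5, 6]]);
    for chunk in &chunks {
        println!(
            "row group {} column {}: {} bytes",
            chunk.row_group_index,
            chunk.column_index,
            chunk.bytes.len()
        );
    }
}
```

Joining the handles in the order they were spawned is what keeps the output deterministic even though the workers may finish in any order.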
