Skip to content

Conversation

@adamreeve
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Simplify API surface and only provide one way to write column chunks and row groups in parallel.

What changes are included in this PR?

  • Make ArrowRowGroupWriterFactory constructor public and simplify it to remove arguments that are available from the SerializedFileWriter.
  • Update ArrowColumnWriter example and test code to use the ArrowRowGroupWriterFactory.
  • Deprecate parquet::arrow::arrow_writer::get_column_writers and parquet::arrow::arrow_writer::ArrowWriter::into_serialized_writer

Are these changes tested?

Yes, covered by existing tests.

Are there any user-facing changes?

Yes, this deprecates existing public methods.

@adamreeve adamreeve requested a review from rok October 10, 2025 02:23
@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 10, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 -- thank you @adamreeve -- this looks perfect

I have nothing to add

impl ArrowRowGroupWriterFactory {
#[cfg(feature = "encryption")]
fn new<W: Write + Send>(
/// Create a new [`ArrowRowGroupWriterFactory`] for the provided file writer and Arrow schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb alamb merged commit 1eb62bd into apache:main Oct 14, 2025
16 checks passed
@adamreeve adamreeve deleted the unify-parallel-writing branch October 14, 2025 21:37

/// Converts this writer into a lower-level [`SerializedFileWriter`] and [`ArrowRowGroupWriterFactory`].
/// This can be useful to provide more control over how files are written.
#[deprecated(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I suggested deprecating this method 😞 , but after testing downstream with DataFusion I think we should not deprecate this method, for reasons listed on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify API for writing column chunks / row groups in parallel

3 participants