chore(deps): update parquet/arrow/arrow-csv from 56 to 57 #200

kevinjqliu · 2025-11-01T22:38:45Z

Closes #197 #198 #199

kevinjqliu · 2025-11-01T23:27:41Z

looks like we need to refactor get_column_writers since it's now deprecated

clflushopt · 2025-11-02T03:38:09Z

I am wondering what the impact is of moving to 57 for all three deps

kevinjqliu · 2025-11-03T01:16:20Z

> I am wondering what the impact is of moving to 57 for all three deps

in terms of performance?

alamb

Thanks @kevinjqliu and @clflushopt

I ran some unscientific benchmarks on my laptop and indeed it seems like the upgrade makes writing 10% slower for some reason 🤔

Benchmark

rm -rf lineitem && time tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet

Release	Time
main	0m25.369s
main	0m25.729s
this PR	0m28.516s
this PR	0m28.682s
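
As a rough check on the 10% figure, the sketch below averages the two runs per row of the table above and computes the relative slowdown. It assumes the reported times are wall-clock seconds from `time`; the helper names `mean` and `slowdown_percent` are illustrative and not part of tpchgen-cli.

```rust
/// Arithmetic mean of a set of timings, in seconds.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Relative slowdown of `candidate` versus `baseline`, in percent.
/// A positive value means `candidate` is slower.
fn slowdown_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Timings copied from the table above (0m25.369s -> 25.369, and so on).
    let main_runs = [25.369, 25.729];
    let pr_runs = [28.516, 28.682];

    let slowdown = slowdown_percent(mean(&main_runs), mean(&pr_runs));
    println!("this PR is {slowdown:.1}% slower than main");
}
```

With these four samples the result comes out a little under 12%, which is in line with the rough 10% estimate. Two runs per side is too few to say more than that.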

I'll do some profiling and see if I can see any reason

alamb · 2025-11-03T21:31:01Z

tpchgen-cli/src/parquet.rs

    // Create writers for each of the leaf columns
-    let mut col_writers = get_column_writers(&parquet_schema, &writer_properties, &schema).unwrap();
+    #[allow(deprecated)]
+    let mut col_writers = parquet::arrow::arrow_writer::get_column_writers(


we just need to change this to follow the example here: https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowColumnWriter.html

kevinjqliu · 2025-11-04T00:45:56Z

Not scientific either, but this shows an improvement for this PR over the last release (2.0.1).
The only changes since the 2.0.1 release are related to GitHub Actions (v2.0.1...main).

➜ hyperfine --warmup 5 --runs 30 './target/release/tpchgen-cli -s 10'
Benchmark 1: ./target/release/tpchgen-cli -s 10
  Time (mean ± σ):     729.7 ms ±   7.6 ms    [User: 702.9 ms, System: 20.1 ms]
  Range (min … max):   714.5 ms … 748.4 ms    30 runs
 
➜ hyperfine --warmup 5 --runs 30 'uvx tpchgen-cli -s 10'
Benchmark 1: uvx tpchgen-cli -s 10
  Time (mean ± σ):     825.4 ms ±  12.3 ms    [User: 706.8 ms, System: 29.4 ms]
  Range (min … max):   812.7 ms … 873.6 ms    30 runs

alamb · 2025-11-04T15:49:03Z

> Not scientific either, this shows improvements for this PR vs last release (2.0.1) only changes since 2.0.1 release are related to github actions, v2.0.1...main

I think by default tpchgen-cli makes TBL files (not parquet) so this command is likely not testing any changes related to arrow/parquet

alamb · 2025-11-04T16:52:14Z

I filed a ticket upstream to investigate and will post my findings there

[Parquet] Writing in 57.0.0 seems 10% slower than 56.0.0 apache/arrow-rs#8783

kevinjqliu · 2025-11-04T18:29:59Z

> I think by default tpchgen-cli makes TBL files (not parquet) so this command is likely not testing any changes related to arrow/parquet

Oh yeah, good point. I ran it again with the same options you used above. This PR is faster than v2.0.1.

➜ hyperfine --warmup 5 --runs 30 './target/release/tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet'
Benchmark 1: ./target/release/tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet
  Time (mean ± σ):     784.5 ms ±   3.4 ms    [User: 757.3 ms, System: 20.9 ms]
  Range (min … max):   777.0 ms … 791.4 ms    30 runs

➜ hyperfine --warmup 5 --runs 30 'uvx tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet'
Benchmark 1: uvx tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet
  Time (mean ± σ):     869.3 ms ±   5.3 ms    [User: 753.4 ms, System: 29.8 ms]
  Range (min … max):   860.0 ms … 882.8 ms    30 runs

alamb · 2025-11-04T20:29:02Z

I think we have figured out what was going on

[Parquet] Writing in 57.0.0 seems 10% slower than 56.0.0 apache/arrow-rs#8783

clflushopt · 2025-11-04T22:47:19Z

I wanted to confirm similar numbers, I spent a little while thinking I was doing something wrong 😮‍💨

alamb · 2025-11-05T14:44:09Z

> I wanted to confirm similar numbers, I spent a little while thinking I was doing something wrong 😮‍💨

The good (amazing ❤️ ) news is that @etseidl has a fix already (will be released in 57.1.0):

perf: Speed up Parquet file writing (10%, back to speed of 56) apache/arrow-rs#8786

I verified that, with the code in apache/arrow-rs#8786, this PR now has roughly the same performance as with 56.

It still seems a few percent slower (0m25.957s vs 0m25.378s) but I will file some issues upstream to further optimize things

parquet/arrow/arrow-csv, 56 -> 57

ab9b8c0

kevinjqliu added 2 commits November 1, 2025 17:45

allow deprecated for now

2a7f307

fmt

c38f69b

alamb approved these changes Nov 3, 2025

alamb mentioned this pull request Nov 4, 2025

[Parquet] Writing in 57.0.0 seems 10% slower than 56.0.0 apache/arrow-rs#8783

Closed

alamb mentioned this pull request Nov 5, 2025

perf: Speed up Parquet file writing (10%, back to speed of 56) apache/arrow-rs#8786

Merged
