
[SPARK-55713][PYTHON][TESTS] Add benchmark for long type conversions #54513

Open
zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:update_benchmark_null_int


@zhengruifeng
Contributor

What changes were proposed in this pull request?

Add ASV benchmarks for long (int64) Arrow-to-pandas conversions, both without nulls and with nulls.

Why are the changes needed?

To check the performance of a critical code path.

Does this PR introduce any user-facing change?

No, test-only.

How was this patch tested?

Checked manually for now, since ASV is not yet set up in CI:

(spark-dev-313) ➜  benchmarks git:(update_benchmark_null_int) asv run --python=same --quick -b 'bench_arrow.LongArrowToPandasBenchmark'
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_Users_ruifeng.zheng_.dev_miniconda3_envs_spark-dev-313_bin_python3.13
[25.00%] ··· bench_arrow.LongArrowToPandasBenchmark.peakmem_long_to_pandas                                                 ok
[25.00%] ··· ========= ======== ==================== ===========
             --                          method                 
             --------- -----------------------------------------
               n_rows   simple   arrow_types_mapper   pd.Series 
             ========= ======== ==================== ===========
               10000     109M           110M             111M   
               100000    117M           115M             110M   
              1000000    164M           165M             165M   
             ========= ======== ==================== ===========

[50.00%] ··· bench_arrow.LongArrowToPandasBenchmark.time_long_to_pandas                                                    ok
[50.00%] ··· ========= ========= ==================== ===========
             --                          method                  
             --------- ------------------------------------------
               n_rows    simple   arrow_types_mapper   pd.Series 
             ========= ========= ==================== ===========
               10000    131±0μs        310±0μs          162±0μs  
               100000   134±0μs        482±0μs          173±0μs  
              1000000   155±0μs        1.35±0ms         273±0μs  
             ========= ========= ==================== ===========

(spark-dev-313) ➜  benchmarks git:(update_benchmark_null_int) asv run --python=same --quick -b 'bench_arrow.NullableLongArrowToPandasBenchmark'
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_Users_ruifeng.zheng_.dev_miniconda3_envs_spark-dev-313_bin_python3.13
[25.00%] ··· bench_arrow.NullableLongArrowToPandasBenchmark.peakmem_long_with_nulls_to_pandas_ext                          ok
[25.00%] ··· ========= ====================== ==================== ===========
             --                                 method                        
             --------- -------------------------------------------------------
               n_rows   integer_object_nulls   arrow_types_mapper   pd.Series 
             ========= ====================== ==================== ===========
               10000            110M                  110M             108M   
               100000           132M                  115M             113M   
              1000000           246M                  201M             201M   
             ========= ====================== ==================== ===========

[50.00%] ··· bench_arrow.NullableLongArrowToPandasBenchmark.time_long_with_nulls_to_pandas_ext                             ok
[50.00%] ··· ========= ====================== ==================== ===========
             --                                 method                        
             --------- -------------------------------------------------------
               n_rows   integer_object_nulls   arrow_types_mapper   pd.Series 
             ========= ====================== ==================== ===========
               10000          1.49±0ms              1.81±0ms         3.19±0ms 
               100000         13.2±0ms              12.2±0ms         30.0±0ms 
              1000000         158±0ms               123±0ms          296±0ms  
             ========= ====================== ==================== ===========

Was this patch authored or co-authored using generative AI tooling?

No.

@zhengruifeng
Contributor Author

Also cc @fangchenli
