Improve performance of reading int8/int16 Parquet data #7055

Merged
merged 16 commits into from
May 9, 2025

Conversation

etseidl
Contributor

@etseidl etseidl commented Jan 31, 2025

Which issue does this PR close?

Rationale for this change

While investigating #7040 it was found that reading Parquet files with int8/int16 columns was slower than expected. Avoiding the use of arrow_cast::cast for these columns and instead directly casting using PrimitiveArray::unary is much faster.

What changes are included in this PR?

Modifies PrimitiveArrayReader to explicitly handle conversion of Parquet physical type INT32 to Arrow (u)int8/(u)int16.

This PR also includes additions to the arrow_reader benchmark.
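As a rough illustration of the unary-style conversion described above, the sketch below applies an infallible `as` cast to every slot of a values buffer and carries the validity flags over unchanged. `Column`, `unary`, and `int32_to_int16` are hypothetical stand-ins written in plain Rust for this example; they are not the arrow-rs types or the code in this PR.

```rust
/// Hypothetical stand-in for a primitive array: a values buffer plus
/// per-slot validity flags (false marks a null slot).
struct Column<T> {
    values: Vec<T>,
    valid: Vec<bool>,
}

/// Apply an infallible function to every value, including null slots,
/// and reuse the validity flags as-is. No per-element branch is needed
/// because the operation cannot fail.
fn unary<T: Copy, U>(input: &Column<T>, op: impl Fn(T) -> U) -> Column<U> {
    Column {
        values: input.values.iter().map(|&v| op(v)).collect(),
        valid: input.valid.clone(),
    }
}

/// Narrow a column of 32-bit physical values to 16 bits with a plain
/// truncating cast (assumed here to mirror the approach in the PR text).
fn int32_to_int16(input: &Column<i32>) -> Column<i16> {
    unary(input, |v| v as i16)
}
```

Because the closure is a single cast with no failure path, the compiler is free to vectorize the loop, which is one plausible reason a direct cast outperforms a checked, general-purpose cast.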

Are there any user-facing changes?

No API changes, but there is a change in behavior. Before, improperly encoded columns would return nulls upon being read, whereas now the columns will be read and truncated to the proper bit width. For example, 238u8 might be encoded as 0xffffffee rather than 0x000000ee. arrow_cast::cast returns None for this conversion; this PR instead returns 238u8.
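The behavior change can be reproduced with scalar casts. This is a minimal sketch using only the Rust standard library; `checked_u8` and `truncating_u8` are hypothetical helper names, and the checked variant only approximates what a range-checked cast does for one value.

```rust
/// Range-checked narrowing: a value outside 0..=255 yields None,
/// analogous to the null produced before this change.
fn checked_u8(physical: i32) -> Option<u8> {
    u8::try_from(physical).ok()
}

/// Truncating narrowing: keeps only the low 8 bits of the 32-bit value.
fn truncating_u8(physical: i32) -> u8 {
    physical as u8
}
```

With the sign-extended encoding `0xffffffee` (which is -18 as an `i32`), `checked_u8` returns `None` while `truncating_u8` returns 238. With the expected encoding `0x000000ee`, both agree on 238.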

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jan 31, 2025
@etseidl etseidl changed the title allow for reading improperly encode UINT_8 and UINT_16 parquet data Allow for reading improperly encoded UINT_8 and UINT_16 Parquet data Jan 31, 2025
@etseidl
Contributor Author

etseidl commented Apr 10, 2025

Benchmarks for integer conversion.

int8 details
group                                                                             55_0                                   cast_int
-----                                                                             ----                                   --------
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs              1.13     47.2±0.59µs        ? ?/sec    1.00     41.6±0.60µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs             1.44     78.1±1.06µs        ? ?/sec    1.00     54.1±0.72µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs               1.11     48.5±0.64µs        ? ?/sec    1.00     43.8±0.61µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                   1.18     66.2±0.75µs        ? ?/sec    1.00     56.3±0.84µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                  1.51    138.1±1.33µs        ? ?/sec    1.00     91.5±1.38µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                    1.15     69.7±0.87µs        ? ?/sec    1.00     60.7±1.14µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs       1.42     29.8±0.42µs        ? ?/sec    1.00     20.9±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs      1.61    118.0±1.28µs        ? ?/sec    1.00     73.4±1.38µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs        1.35     32.6±0.38µs        ? ?/sec    1.00     24.1±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs              1.19     59.1±0.72µs        ? ?/sec    1.00     49.8±0.48µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs             1.50    135.4±1.44µs        ? ?/sec    1.00     90.4±1.41µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs               1.17     62.4±0.62µs        ? ?/sec    1.00     53.2±0.92µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                   1.49     28.6±0.35µs        ? ?/sec    1.00     19.2±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                  1.65    117.7±1.18µs        ? ?/sec    1.00     71.3±0.92µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                    1.43     32.3±0.86µs        ? ?/sec    1.00     22.6±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs             1.10     49.4±0.55µs        ? ?/sec    1.00     45.0±0.85µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs            1.39     78.2±0.80µs        ? ?/sec    1.00     56.1±0.61µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs              1.06     51.6±0.66µs        ? ?/sec    1.00     48.6±6.94µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                  1.15     70.2±0.81µs        ? ?/sec    1.00     61.2±0.85µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                 1.48    138.2±2.01µs        ? ?/sec    1.00     93.1±1.14µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                   1.14     73.5±1.20µs        ? ?/sec    1.00     64.7±1.06µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs      1.52     30.7±0.36µs        ? ?/sec    1.00     20.2±0.25µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs     1.61    116.4±1.57µs        ? ?/sec    1.00     72.3±1.13µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs       1.35     33.6±0.80µs        ? ?/sec    1.00     24.8±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs             1.14     59.1±0.53µs        ? ?/sec    1.00     52.0±7.15µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs            1.49    134.2±2.16µs        ? ?/sec    1.00     89.9±1.02µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs              1.17     62.4±0.79µs        ? ?/sec    1.00     53.2±0.78µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                  1.51     28.7±0.48µs        ? ?/sec    1.00     19.0±0.24µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                 1.66    117.0±2.70µs        ? ?/sec    1.00     70.5±0.78µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                   1.43     31.9±0.58µs        ? ?/sec    1.00     22.3±0.35µs        ? ?/sec
int16 details
group                                                                             55_0                                   cast_int
-----                                                                             ----                                   --------
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs             1.14     47.2±0.54µs        ? ?/sec    1.00     41.5±0.78µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs            1.44     77.6±1.00µs        ? ?/sec    1.00     54.0±0.77µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs              1.13     49.0±0.50µs        ? ?/sec    1.00     43.3±0.59µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                  1.20     67.5±0.79µs        ? ?/sec    1.00     56.0±0.59µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                 1.51    136.1±2.29µs        ? ?/sec    1.00     90.1±1.33µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                   1.17     70.7±0.92µs        ? ?/sec    1.00     60.6±0.92µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs      1.53     30.7±0.54µs        ? ?/sec    1.00     20.1±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs     1.65    116.6±2.29µs        ? ?/sec    1.00     70.9±0.88µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs       1.44     33.5±0.41µs        ? ?/sec    1.00     23.2±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs             1.25     61.0±1.06µs        ? ?/sec    1.00     48.7±0.93µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs            1.52    134.2±2.09µs        ? ?/sec    1.00     88.5±1.44µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs              1.23     64.3±1.10µs        ? ?/sec    1.00     52.1±0.64µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                  1.62     30.2±0.61µs        ? ?/sec    1.00     18.7±0.23µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                 1.65    116.7±2.04µs        ? ?/sec    1.00     70.9±1.20µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                   1.50     32.8±0.42µs        ? ?/sec    1.00     21.9±0.31µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs            1.13     51.2±0.57µs        ? ?/sec    1.00     45.3±0.47µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs           1.42     80.1±1.23µs        ? ?/sec    1.00     56.4±0.83µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs             1.13     53.1±0.77µs        ? ?/sec    1.00     47.0±0.54µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                 1.18     72.7±0.75µs        ? ?/sec    1.00     61.7±0.66µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                1.49    139.9±2.34µs        ? ?/sec    1.00     94.2±1.57µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                  1.17     76.6±3.02µs        ? ?/sec    1.00     65.3±0.82µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs     1.58     30.3±0.39µs        ? ?/sec    1.00     19.1±0.39µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs    1.64    116.5±1.78µs        ? ?/sec    1.00     71.1±1.25µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs      1.55     34.3±0.40µs        ? ?/sec    1.00     22.1±0.28µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs            1.24     60.9±0.62µs        ? ?/sec    1.00     48.9±0.63µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs           1.52    134.7±1.79µs        ? ?/sec    1.00     88.8±1.25µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs             1.22     64.2±0.70µs        ? ?/sec    1.00     52.6±0.57µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                 1.61     29.7±0.46µs        ? ?/sec    1.00     18.5±0.22µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                1.64    116.8±1.86µs        ? ?/sec    1.00     71.1±1.08µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                  1.51     32.8±0.42µs        ? ?/sec    1.00     21.7±0.38µs        ? ?/sec
full details
group                                                                             55_0                                   cast_int
-----                                                                             ----                                   --------
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs             1.14     47.2±0.54µs        ? ?/sec    1.00     41.5±0.78µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs            1.44     77.6±1.00µs        ? ?/sec    1.00     54.0±0.77µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs              1.13     49.0±0.50µs        ? ?/sec    1.00     43.3±0.59µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                  1.20     67.5±0.79µs        ? ?/sec    1.00     56.0±0.59µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                 1.51    136.1±2.29µs        ? ?/sec    1.00     90.1±1.33µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                   1.17     70.7±0.92µs        ? ?/sec    1.00     60.6±0.92µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs      1.53     30.7±0.54µs        ? ?/sec    1.00     20.1±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs     1.65    116.6±2.29µs        ? ?/sec    1.00     70.9±0.88µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs       1.44     33.5±0.41µs        ? ?/sec    1.00     23.2±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs             1.25     61.0±1.06µs        ? ?/sec    1.00     48.7±0.93µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs            1.52    134.2±2.09µs        ? ?/sec    1.00     88.5±1.44µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs              1.23     64.3±1.10µs        ? ?/sec    1.00     52.1±0.64µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                  1.62     30.2±0.61µs        ? ?/sec    1.00     18.7±0.23µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                 1.65    116.7±2.04µs        ? ?/sec    1.00     70.9±1.20µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                   1.50     32.8±0.42µs        ? ?/sec    1.00     21.9±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs             1.00     40.9±0.72µs        ? ?/sec    1.00     40.9±0.61µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs            1.01     53.1±0.71µs        ? ?/sec    1.00     52.7±0.66µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs              1.00     42.9±0.54µs        ? ?/sec    1.00     42.8±0.53µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                  1.00     53.1±1.13µs        ? ?/sec    1.00     53.1±0.60µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                 1.00     86.7±1.05µs        ? ?/sec    1.00     87.1±0.91µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                   1.00     56.9±0.69µs        ? ?/sec    1.00     57.0±0.91µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs      1.00     13.7±0.18µs        ? ?/sec    1.01     13.8±0.21µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs     1.03     66.4±0.87µs        ? ?/sec    1.00     64.6±0.77µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs       1.00     17.1±0.19µs        ? ?/sec    1.00     17.0±0.28µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs             1.01     43.1±0.93µs        ? ?/sec    1.00     42.8±0.43µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs            1.00     83.2±1.10µs        ? ?/sec    1.00     83.5±1.05µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs              1.00     46.9±0.50µs        ? ?/sec    1.00     46.9±0.70µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                  1.00      9.6±0.11µs        ? ?/sec    1.05     10.1±0.14µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                 1.02     64.0±0.92µs        ? ?/sec    1.00     63.0±0.62µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                   1.00     13.5±0.19µs        ? ?/sec    1.01     13.7±0.18µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs             1.00     44.7±0.56µs        ? ?/sec    1.01     45.1±0.71µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs            1.01     55.4±0.61µs        ? ?/sec    1.00     54.5±0.59µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs              1.01     46.7±0.87µs        ? ?/sec    1.00     46.1±0.71µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                  1.00     61.0±0.61µs        ? ?/sec    1.02     62.3±0.85µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                 1.00     91.1±0.98µs        ? ?/sec    1.06     96.6±1.43µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                   1.00     65.1±0.99µs        ? ?/sec    1.01     65.6±0.93µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs      1.00     64.2±0.65µs        ? ?/sec    1.00     64.0±0.72µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs     1.00     95.3±1.26µs        ? ?/sec    1.00     95.3±1.16µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs       1.00     67.3±0.70µs        ? ?/sec    1.00     67.5±0.77µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs             1.01     48.4±0.59µs        ? ?/sec    1.00     47.9±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs            1.00     85.6±0.95µs        ? ?/sec    1.00     85.2±1.05µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs              1.00     51.7±0.83µs        ? ?/sec    1.00     51.6±0.94µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                  1.01     25.9±0.61µs        ? ?/sec    1.00     25.6±0.46µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                 1.01     74.4±0.94µs        ? ?/sec    1.00     73.4±0.91µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                   1.00     29.2±0.66µs        ? ?/sec    1.00     29.2±0.43µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs              1.13     47.2±0.59µs        ? ?/sec    1.00     41.6±0.60µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs             1.44     78.1±1.06µs        ? ?/sec    1.00     54.1±0.72µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs               1.11     48.5±0.64µs        ? ?/sec    1.00     43.8±0.61µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                   1.18     66.2±0.75µs        ? ?/sec    1.00     56.3±0.84µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                  1.51    138.1±1.33µs        ? ?/sec    1.00     91.5±1.38µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                    1.15     69.7±0.87µs        ? ?/sec    1.00     60.7±1.14µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs       1.42     29.8±0.42µs        ? ?/sec    1.00     20.9±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs      1.61    118.0±1.28µs        ? ?/sec    1.00     73.4±1.38µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs        1.35     32.6±0.38µs        ? ?/sec    1.00     24.1±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs              1.19     59.1±0.72µs        ? ?/sec    1.00     49.8±0.48µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs             1.50    135.4±1.44µs        ? ?/sec    1.00     90.4±1.41µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs               1.17     62.4±0.62µs        ? ?/sec    1.00     53.2±0.92µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                   1.49     28.6±0.35µs        ? ?/sec    1.00     19.2±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                  1.65    117.7±1.18µs        ? ?/sec    1.00     71.3±0.92µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                    1.43     32.3±0.86µs        ? ?/sec    1.00     22.6±0.35µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs            1.13     51.2±0.57µs        ? ?/sec    1.00     45.3±0.47µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs           1.42     80.1±1.23µs        ? ?/sec    1.00     56.4±0.83µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs             1.13     53.1±0.77µs        ? ?/sec    1.00     47.0±0.54µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                 1.18     72.7±0.75µs        ? ?/sec    1.00     61.7±0.66µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                1.49    139.9±2.34µs        ? ?/sec    1.00     94.2±1.57µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                  1.17     76.6±3.02µs        ? ?/sec    1.00     65.3±0.82µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs     1.58     30.3±0.39µs        ? ?/sec    1.00     19.1±0.39µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs    1.64    116.5±1.78µs        ? ?/sec    1.00     71.1±1.25µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs      1.55     34.3±0.40µs        ? ?/sec    1.00     22.1±0.28µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs            1.24     60.9±0.62µs        ? ?/sec    1.00     48.9±0.63µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs           1.52    134.7±1.79µs        ? ?/sec    1.00     88.8±1.25µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs             1.22     64.2±0.70µs        ? ?/sec    1.00     52.6±0.57µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                 1.61     29.7±0.46µs        ? ?/sec    1.00     18.5±0.22µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                1.64    116.8±1.86µs        ? ?/sec    1.00     71.1±1.08µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                  1.51     32.8±0.42µs        ? ?/sec    1.00     21.7±0.38µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs            1.00     40.8±0.58µs        ? ?/sec    1.01     41.1±0.45µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs           1.02     52.7±1.04µs        ? ?/sec    1.00     51.9±0.54µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs             1.01     42.7±0.46µs        ? ?/sec    1.00     42.4±0.59µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                 1.00     53.5±0.76µs        ? ?/sec    1.02     54.4±0.99µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                1.00     86.5±1.22µs        ? ?/sec    1.00     86.4±1.51µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                  1.00     56.7±0.70µs        ? ?/sec    1.00     56.7±0.85µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, mandatory, no NULLs     1.00     12.5±0.14µs        ? ?/sec    1.01     12.6±0.23µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, half NULLs    1.02     65.0±0.65µs        ? ?/sec    1.00     64.1±0.83µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, no NULLs      1.01     15.8±0.20µs        ? ?/sec    1.00     15.7±0.16µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, mandatory, no NULLs            1.00     42.9±0.57µs        ? ?/sec    1.00     42.9±0.56µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, half NULLs           1.01     83.8±1.66µs        ? ?/sec    1.00     82.8±0.82µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, no NULLs             1.01     46.6±0.59µs        ? ?/sec    1.00     46.3±0.59µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs                 1.00     11.7±0.15µs        ? ?/sec    1.04     12.1±0.14µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs                1.01     64.4±0.67µs        ? ?/sec    1.00     64.0±0.80µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs                  1.00     14.9±0.18µs        ? ?/sec    1.03     15.3±0.25µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs            1.00     44.7±0.55µs        ? ?/sec    1.02     45.6±0.57µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs           1.00     54.8±0.79µs        ? ?/sec    1.00     54.7±0.61µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs             1.01     46.7±0.74µs        ? ?/sec    1.00     46.3±0.78µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                 1.00     61.0±1.11µs        ? ?/sec    1.02     62.4±0.91µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                1.00     95.7±1.26µs        ? ?/sec    1.01     97.2±1.49µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                  1.00     65.3±0.84µs        ? ?/sec    1.00     65.4±0.95µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, mandatory, no NULLs     1.00     64.1±0.87µs        ? ?/sec    1.00     63.8±0.82µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, half NULLs    1.01     95.4±0.99µs        ? ?/sec    1.00     94.6±1.43µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, no NULLs      1.00     67.2±0.78µs        ? ?/sec    1.00     67.2±0.84µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, mandatory, no NULLs            1.04     49.0±0.65µs        ? ?/sec    1.00     47.1±0.49µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, half NULLs           1.00     85.9±1.29µs        ? ?/sec    1.00     85.5±0.93µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, no NULLs             1.03     51.8±0.57µs        ? ?/sec    1.00     50.2±0.63µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, mandatory, no NULLs                 1.04     26.5±0.34µs        ? ?/sec    1.00     25.4±0.44µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, half NULLs                1.02     74.6±1.43µs        ? ?/sec    1.00     73.4±0.88µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, no NULLs                  1.00     29.0±0.37µs        ? ?/sec    1.00     28.9±0.47µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs             1.10     49.4±0.55µs        ? ?/sec    1.00     45.0±0.85µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs            1.39     78.2±0.80µs        ? ?/sec    1.00     56.1±0.61µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs              1.06     51.6±0.66µs        ? ?/sec    1.00     48.6±6.94µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                  1.15     70.2±0.81µs        ? ?/sec    1.00     61.2±0.85µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                 1.48    138.2±2.01µs        ? ?/sec    1.00     93.1±1.14µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                   1.14     73.5±1.20µs        ? ?/sec    1.00     64.7±1.06µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs      1.52     30.7±0.36µs        ? ?/sec    1.00     20.2±0.25µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs     1.61    116.4±1.57µs        ? ?/sec    1.00     72.3±1.13µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs       1.35     33.6±0.80µs        ? ?/sec    1.00     24.8±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs             1.14     59.1±0.53µs        ? ?/sec    1.00     52.0±7.15µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs            1.49    134.2±2.16µs        ? ?/sec    1.00     89.9±1.02µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs              1.17     62.4±0.79µs        ? ?/sec    1.00     53.2±0.78µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                  1.51     28.7±0.48µs        ? ?/sec    1.00     19.0±0.24µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                 1.66    117.0±2.70µs        ? ?/sec    1.00     70.5±0.78µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                   1.43     31.9±0.58µs        ? ?/sec    1.00     22.3±0.35µs        ? ?/sec
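
For reading the tables above: the leading number in each column appears to be that branch's mean time divided by the faster of the two means, so 1.00 marks the faster branch. The sketch below recomputes that ratio from the printed means. It assumes this interpretation of the output format, and because the printed means are already rounded, a recomputed ratio may differ from the table in the last digit. `relative_ratios` and `round2` are hypothetical helper names.

```rust
/// Given the mean times of two runs (in microseconds), return each run's
/// time relative to the faster one. The faster run gets exactly 1.0.
fn relative_ratios(a_us: f64, b_us: f64) -> (f64, f64) {
    let fastest = a_us.min(b_us);
    (a_us / fastest, b_us / fastest)
}

/// Round to two decimal places, matching the precision shown in the tables.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For the first Int8Array row (47.2µs on 55_0, 41.6µs on cast_int) this gives roughly 1.13 and 1.00, matching the table.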

@etseidl etseidl marked this pull request as ready for review April 10, 2025 22:44
@etseidl etseidl changed the title Allow for reading improperly encoded UINT_8 and UINT_16 Parquet data Improve performance of reading int8/int16 Parquet data Apr 10, 2025
Contributor

@alamb alamb left a comment

Thank you @etseidl -- These are some nice speedups ❤️

I pulled the benchmarks out into a separate PR so that I can re-run them and confirm the results. This PR is looking very nice.

@@ -1280,6 +1292,18 @@ fn add_benches(c: &mut Criterion) {
let string_list_desc = schema.column(14);
let mandatory_binary_column_desc = schema.column(15);
let optional_binary_column_desc = schema.column(16);
let mandatory_uint8_column_desc = schema.column(27);
Contributor

@alamb alamb May 8, 2025

For my own convenience I pulled the benchmark into its own PR to make it easier to compare this branch to main:

alamb commented May 8, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix_uint_cast (b7a9167) to 0e48877 diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --all-features --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=fix_uint_cast
Results will be posted here when complete

alamb commented May 8, 2025

🤖: Benchmark completed

group                                                                                                      fix_uint_cast                          main
-----                                                                                                      -------------                          ----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.00   1294.2±2.96µs        ? ?/sec    1.00   1296.3±3.74µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.00   1370.8±3.50µs        ? ?/sec    1.03   1411.0±9.48µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.00   1301.0±3.16µs        ? ?/sec    1.00   1303.5±2.38µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.05    624.4±3.56µs        ? ?/sec    1.00    592.9±3.45µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.02    794.3±2.53µs        ? ?/sec    1.00    775.2±2.40µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.03    614.7±3.40µs        ? ?/sec    1.00    597.5±3.20µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.00    732.6±2.14µs        ? ?/sec    1.01    739.0±1.95µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.01    891.5±6.82µs        ? ?/sec    1.00    882.7±2.38µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.00    744.5±3.21µs        ? ?/sec    1.00    745.8±2.73µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.04    510.5±6.73µs        ? ?/sec    1.00    490.8±0.90µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.01    639.8±1.87µs        ? ?/sec    1.00    635.5±1.39µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.04    515.4±1.14µs        ? ?/sec    1.00    496.7±5.30µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.01    576.8±1.48µs        ? ?/sec    1.00    570.3±6.20µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.01    500.1±0.69µs        ? ?/sec    1.00    493.4±0.69µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.00    683.1±1.55µs        ? ?/sec    1.00    684.1±1.75µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.01    588.3±5.20µs        ? ?/sec    1.00   583.5±12.18µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.00    943.2±3.22µs        ? ?/sec    1.00    938.7±1.92µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.00    747.4±3.35µs        ? ?/sec    1.02    764.1±2.50µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.00    949.9±2.32µs        ? ?/sec    1.00    946.8±1.46µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.04    278.9±2.92µs        ? ?/sec    1.00    268.0±4.27µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.00    423.5±1.54µs        ? ?/sec    1.03    435.9±2.74µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.03    284.4±2.80µs        ? ?/sec    1.00    276.5±4.28µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    123.1±0.24µs        ? ?/sec    1.00    122.9±0.29µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.02    214.3±2.26µs        ? ?/sec    1.00    210.4±0.98µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    128.3±0.24µs        ? ?/sec    1.00    128.4±0.54µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.04     40.4±0.11µs        ? ?/sec    1.00     38.9±0.11µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.01    170.7±0.34µs        ? ?/sec    1.00    168.4±0.56µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.01     43.8±0.12µs        ? ?/sec    1.00     43.2±0.14µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.00    735.7±1.55µs        ? ?/sec    1.00    733.9±1.99µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.00    551.1±2.14µs        ? ?/sec    1.01    556.9±4.99µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.00    741.9±2.32µs        ? ?/sec    1.00    740.5±1.97µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.15     66.8±2.70µs        ? ?/sec    1.00     58.0±5.78µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.00    213.7±1.79µs        ? ?/sec    1.04    221.7±1.56µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.10     72.4±3.33µs        ? ?/sec    1.00     66.1±7.23µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.01     95.0±0.30µs        ? ?/sec    1.00     94.4±0.25µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.00    184.5±0.44µs        ? ?/sec    1.00    183.6±0.50µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00    100.3±0.27µs        ? ?/sec    1.00    100.1±0.25µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.00      9.5±0.20µs        ? ?/sec    1.02      9.7±0.24µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.02    142.4±1.21µs        ? ?/sec    1.00    140.2±0.57µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.00     14.6±0.19µs        ? ?/sec    1.03     15.1±0.16µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    184.2±0.57µs        ? ?/sec    1.00    184.2±0.42µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.00    279.2±0.60µs        ? ?/sec    1.00    278.8±0.80µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    189.7±0.49µs        ? ?/sec    1.00    190.5±0.60µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.07     14.7±0.39µs        ? ?/sec    1.00     13.8±0.21µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.01    194.4±1.20µs        ? ?/sec    1.00    193.3±0.57µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.03     21.0±0.28µs        ? ?/sec    1.00     20.4±0.39µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    365.4±1.25µs        ? ?/sec    1.00    366.5±1.09µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.00    350.2±0.96µs        ? ?/sec    1.02    357.2±0.80µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    371.7±1.65µs        ? ?/sec    1.00    372.4±0.61µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.11     27.6±0.41µs        ? ?/sec    1.00     25.0±0.40µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.00    183.0±0.59µs        ? ?/sec    1.02    187.4±0.61µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.07     34.4±0.37µs        ? ?/sec    1.00     32.2±0.30µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    118.4±0.22µs        ? ?/sec    1.11    131.9±0.48µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.01    143.5±0.36µs        ? ?/sec    1.00    141.6±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    121.5±0.22µs        ? ?/sec    1.11    134.3±0.36µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    171.5±0.24µs        ? ?/sec    1.08    184.9±1.51µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.02    244.6±0.61µs        ? ?/sec    1.00    240.3±0.79µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.00    176.7±0.65µs        ? ?/sec    1.08    190.9±0.48µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00     76.9±0.18µs        ? ?/sec    1.01     77.7±0.15µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.08    194.4±0.42µs        ? ?/sec    1.00    180.4±0.61µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00     82.5±0.19µs        ? ?/sec    1.01     83.2±0.33µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    140.5±0.34µs        ? ?/sec    1.04    145.8±0.42µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.04    228.7±0.59µs        ? ?/sec    1.00    220.6±2.76µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    146.3±0.48µs        ? ?/sec    1.04    152.9±0.37µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.02     76.1±0.40µs        ? ?/sec    1.00     74.5±0.52µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.07    191.6±0.50µs        ? ?/sec    1.00    179.0±0.60µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.00     78.5±0.45µs        ? ?/sec    1.00     78.8±0.28µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.01    112.9±0.50µs        ? ?/sec    1.00    111.9±0.68µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    134.7±0.22µs        ? ?/sec    1.03    138.7±0.50µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.01    114.7±0.50µs        ? ?/sec    1.00    113.2±0.34µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.04    170.4±0.54µs        ? ?/sec    1.00    163.1±0.53µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.00    239.9±0.69µs        ? ?/sec    1.03    246.4±0.73µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.04    175.6±1.50µs        ? ?/sec    1.00    168.8±0.64µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00    205.2±0.53µs        ? ?/sec    1.00    205.6±0.37µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    254.7±0.77µs        ? ?/sec    1.03    263.4±0.68µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00    212.3±0.89µs        ? ?/sec    1.00    212.1±0.51µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    146.3±0.39µs        ? ?/sec    1.04    151.6±0.36µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    230.0±2.31µs        ? ?/sec    1.05    242.0±0.75µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    154.4±0.49µs        ? ?/sec    1.04    160.5±1.37µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.04    109.0±0.62µs        ? ?/sec    1.00    104.6±1.07µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.00    201.0±0.61µs        ? ?/sec    1.05    210.4±0.62µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    114.9±1.74µs        ? ?/sec    1.00    115.4±1.15µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                                      1.00    101.8±0.25µs        ? ?/sec    1.20    122.0±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs                                     1.00    122.4±0.51µs        ? ?/sec    1.39    169.9±2.06µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                                       1.00    105.0±0.28µs        ? ?/sec    1.19    124.6±0.21µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                                           1.00    139.0±0.42µs        ? ?/sec    1.25    173.7±0.47µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                                          1.00    206.9±0.47µs        ? ?/sec    1.42    293.1±0.58µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                                            1.00    144.5±0.69µs        ? ?/sec    1.23    177.6±0.52µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     45.7±0.14µs        ? ?/sec    1.41     64.2±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs                              1.00    158.8±1.03µs        ? ?/sec    1.50    237.5±0.65µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs                                1.00     50.4±0.17µs        ? ?/sec    1.36     68.7±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs                                      1.00    107.4±0.31µs        ? ?/sec    1.23    132.5±0.30µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs                                     1.00    193.1±0.34µs        ? ?/sec    1.43    275.4±1.08µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs                                       1.00    113.5±0.27µs        ? ?/sec    1.21    137.5±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                                           1.00     39.4±0.13µs        ? ?/sec    1.47     57.8±0.28µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                                          1.00    155.9±0.43µs        ? ?/sec    1.51    234.9±0.48µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                                            1.00     44.9±0.15µs        ? ?/sec    1.41     63.3±0.22µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.00     90.7±0.16µs        ? ?/sec    1.14    103.7±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.00    113.9±0.22µs        ? ?/sec    1.01    114.7±0.34µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.00     93.7±0.13µs        ? ?/sec    1.13    106.2±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.00    119.2±0.23µs        ? ?/sec    1.11    132.0±0.56µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.01    188.1±0.39µs        ? ?/sec    1.00    185.9±0.82µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.00    124.4±0.39µs        ? ?/sec    1.10    136.6±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.02     27.4±0.44µs        ? ?/sec    1.00     26.9±0.26µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.09    141.7±0.32µs        ? ?/sec    1.00    129.4±0.55µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.04     32.3±0.25µs        ? ?/sec    1.00     31.2±0.29µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.00     86.4±0.19µs        ? ?/sec    1.08     93.6±0.27µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.06    176.1±0.66µs        ? ?/sec    1.00    165.8±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.00     94.9±0.24µs        ? ?/sec    1.03     98.0±0.37µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.00     18.6±0.37µs        ? ?/sec    1.00     18.6±0.83µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.10    135.2±0.28µs        ? ?/sec    1.00    123.3±0.25µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.08     26.5±0.72µs        ? ?/sec    1.00     24.5±0.44µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.03     85.0±0.54µs        ? ?/sec    1.00     82.8±0.34µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.00    107.4±0.35µs        ? ?/sec    1.04    111.4±0.43µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.03     88.0±0.41µs        ? ?/sec    1.00     85.6±0.24µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.07    116.9±0.54µs        ? ?/sec    1.00    109.7±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.00    179.2±0.60µs        ? ?/sec    1.03    184.0±0.97µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.07    120.8±0.59µs        ? ?/sec    1.00    112.8±0.59µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.01    151.9±0.34µs        ? ?/sec    1.00    149.7±0.30µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.00    200.2±0.78µs        ? ?/sec    1.04    208.9±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.02    157.7±0.27µs        ? ?/sec    1.00    154.7±0.81µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.00     94.0±0.60µs        ? ?/sec    1.06     99.8±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.00    175.5±0.68µs        ? ?/sec    1.03    180.4±1.47µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.00     99.5±0.55µs        ? ?/sec    1.05    104.4±0.62µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.03     47.9±1.97µs        ? ?/sec    1.00     46.7±2.35µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.00    142.1±0.95µs        ? ?/sec    1.07    152.1±0.96µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.11     57.6±3.16µs        ? ?/sec    1.00     52.1±2.81µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                                       1.00     91.0±0.81µs        ? ?/sec    1.35    122.4±1.31µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                                      1.00    116.4±0.21µs        ? ?/sec    1.46    170.2±1.06µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                                        1.00     93.9±0.18µs        ? ?/sec    1.33    125.0±0.31µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                                            1.00    121.6±0.23µs        ? ?/sec    1.43    174.1±0.42µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                                           1.00    194.6±0.49µs        ? ?/sec    1.52    295.3±0.82µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                                             1.00    126.5±0.27µs        ? ?/sec    1.41    178.6±0.51µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs                                1.00     35.8±0.09µs        ? ?/sec    2.00     71.4±0.22µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs                               1.00    150.6±1.13µs        ? ?/sec    1.61    242.3±0.97µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs                                 1.00     41.0±0.13µs        ? ?/sec    1.87     76.8±0.60µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs                                       1.00     99.6±0.93µs        ? ?/sec    1.41    140.3±0.27µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs                                      1.00    184.1±0.35µs        ? ?/sec    1.52    279.2±1.09µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs                                        1.00    105.7±0.32µs        ? ?/sec    1.37    145.1±0.35µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                                            1.00     31.4±0.12µs        ? ?/sec    2.08     65.5±0.26µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                                           1.00    147.2±0.31µs        ? ?/sec    1.62    238.6±0.61µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                                             1.00     36.6±0.15µs        ? ?/sec    1.91     70.0±0.19µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.03      9.1±0.04ms        ? ?/sec    1.00      8.9±0.11ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.00     17.5±0.17ms        ? ?/sec    1.00     17.5±0.18ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.02    695.0±1.37µs        ? ?/sec    1.00    680.2±1.31µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.04    873.6±4.20µs        ? ?/sec    1.00    837.8±1.24µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.02    701.3±1.43µs        ? ?/sec    1.00    687.0±5.06µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.02    985.6±5.57µs        ? ?/sec    1.00    967.3±3.01µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.03   1058.2±3.19µs        ? ?/sec    1.00   1031.5±4.44µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.02   992.3±14.38µs        ? ?/sec    1.00    975.5±3.87µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.00    435.9±0.97µs        ? ?/sec    1.01    441.6±1.29µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.00    903.5±2.40µs        ? ?/sec    1.02    919.0±6.85µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.00    443.3±1.27µs        ? ?/sec    1.01    448.0±1.54µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.02   1987.9±3.72µs        ? ?/sec    1.00   1954.8±2.64µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.01   1843.7±2.61µs        ? ?/sec    1.00   1821.1±3.08µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.04      2.0±0.00ms        ? ?/sec    1.00   1964.5±4.28µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.02   1440.2±9.68µs        ? ?/sec    1.00   1406.7±4.82µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.01   1626.0±4.57µs        ? ?/sec    1.00   1612.2±4.59µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.02   1438.9±3.46µs        ? ?/sec    1.00   1404.3±4.13µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs                                     1.00    101.6±0.57µs        ? ?/sec    1.20    121.5±0.21µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs                                    1.00    126.2±0.39µs        ? ?/sec    1.37    173.2±0.57µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                                      1.00    104.5±0.23µs        ? ?/sec    1.19    124.1±0.30µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                                          1.00    138.8±0.60µs        ? ?/sec    1.21    168.3±0.23µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                                         1.00    209.2±0.45µs        ? ?/sec    1.43    298.5±0.64µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                                           1.00    143.5±0.35µs        ? ?/sec    1.20    172.4±0.33µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs                              1.00     45.8±0.26µs        ? ?/sec    1.29     59.0±0.24µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs                             1.00    157.0±0.40µs        ? ?/sec    1.52    238.6±2.12µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs                               1.00     50.6±0.22µs        ? ?/sec    1.25     63.4±0.80µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs                                     1.00    107.2±0.29µs        ? ?/sec    1.19    127.5±0.48µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs                                    1.00    191.7±0.47µs        ? ?/sec    1.44    276.5±0.57µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs                                      1.00    113.3±0.30µs        ? ?/sec    1.17    132.4±1.70µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                                          1.00     39.3±0.17µs        ? ?/sec    1.35     52.9±0.34µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                                         1.00    156.4±0.49µs        ? ?/sec    1.50    235.3±1.47µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                                           1.00     44.8±0.14µs        ? ?/sec    1.30     58.2±0.38µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs                                     1.00     92.9±0.39µs        ? ?/sec    1.13    105.4±0.46µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs                                    1.01    116.6±0.24µs        ? ?/sec    1.00    115.8±0.22µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                                      1.00     95.3±0.21µs        ? ?/sec    1.14    108.3±0.43µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                                          1.00    120.9±0.41µs        ? ?/sec    1.11    133.7±0.38µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                                         1.03    192.8±0.51µs        ? ?/sec    1.00    187.0±0.56µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                                           1.00    126.2±0.50µs        ? ?/sec    1.10    138.7±0.61µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, mandatory, no NULLs                              1.10     27.5±0.34µs        ? ?/sec    1.00     25.1±0.22µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, half NULLs                             1.09    142.3±0.87µs        ? ?/sec    1.00    130.9±0.73µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, no NULLs                               1.02     32.3±0.29µs        ? ?/sec    1.00     31.6±0.25µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, mandatory, no NULLs                                     1.00     89.5±0.45µs        ? ?/sec    1.07     95.6±0.34µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, half NULLs                                    1.06    177.3±6.26µs        ? ?/sec    1.00    166.6±0.41µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, no NULLs                                      1.00     95.3±0.48µs        ? ?/sec    1.05    100.2±0.25µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs                                          1.01     21.6±0.42µs        ? ?/sec    1.00     21.3±0.46µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs                                         1.10    138.8±0.51µs        ? ?/sec    1.00    126.4±0.28µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs                                           1.05     27.6±0.75µs        ? ?/sec    1.00     26.3±0.43µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs                                     1.03     85.2±0.27µs        ? ?/sec    1.00     83.2±0.52µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs                                    1.00    107.5±0.37µs        ? ?/sec    1.04    111.7±0.38µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                                      1.03     88.0±0.30µs        ? ?/sec    1.00     85.5±0.39µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                                          1.07    117.6±0.72µs        ? ?/sec    1.00    110.1±0.63µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                                         1.00    179.4±0.73µs        ? ?/sec    1.02    183.4±0.30µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                                           1.08    120.9±0.69µs        ? ?/sec    1.00    112.3±0.59µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, mandatory, no NULLs                              1.00    150.4±0.43µs        ? ?/sec    1.01    152.0±0.55µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, half NULLs                             1.00    200.0±2.28µs        ? ?/sec    1.05    210.1±1.13µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, no NULLs                               1.00    156.0±0.43µs        ? ?/sec    1.01    157.0±0.37µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, mandatory, no NULLs                                     1.00     94.1±0.67µs        ? ?/sec    1.07    100.7±0.57µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, half NULLs                                    1.00    168.2±0.48µs        ? ?/sec    1.12    188.5±0.88µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, no NULLs                                      1.00     99.7±1.19µs        ? ?/sec    1.06    105.2±1.06µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, mandatory, no NULLs                                          1.09     48.2±2.17µs        ? ?/sec    1.00     44.3±1.27µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, half NULLs                                         1.00    142.6±0.36µs        ? ?/sec    1.06    151.8±0.67µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, no NULLs                                           1.03     54.1±2.82µs        ? ?/sec    1.00     52.4±2.11µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                                      1.00    101.1±0.17µs        ? ?/sec    1.21    122.3±0.44µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs                                     1.00    122.2±0.22µs        ? ?/sec    1.42    173.9±0.85µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                                       1.00    104.0±0.20µs        ? ?/sec    1.20    125.1±1.13µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                                           1.00    130.1±0.18µs        ? ?/sec    1.27    165.8±0.47µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                                          1.00    201.9±0.39µs        ? ?/sec    1.47    297.5±0.78µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                                            1.00    136.1±1.37µs        ? ?/sec    1.26    171.0±1.54µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     35.9±0.13µs        ? ?/sec    1.62     58.1±0.17µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs                              1.00    150.6±0.38µs        ? ?/sec    1.61    242.1±0.50µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs                                1.00     41.3±0.16µs        ? ?/sec    1.52     62.5±0.24µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs                                      1.00     99.5±0.31µs        ? ?/sec    1.27    126.0±0.32µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs                                     1.00    185.0±0.42µs        ? ?/sec    1.50    278.2±0.69µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs                                       1.00    105.9±0.21µs        ? ?/sec    1.24    130.9±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                                           1.00     31.5±0.11µs        ? ?/sec    1.65     52.2±0.25µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                                          1.00    147.9±1.94µs        ? ?/sec    1.61    237.7±0.66µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                                            1.00     36.8±0.38µs        ? ?/sec    1.54     56.8±0.23µs        ? ?/sec

@parthchandra
Contributor

Thank you @etseidl. This is a great speedup! Also, the changed behavior matches that of SerializedFileReader.

@alamb
Contributor

alamb commented May 9, 2025

These are some pretty sweet speedups. Thanks again @etseidl

@alamb alamb merged commit 9e91ef4 into apache:main May 9, 2025
16 checks passed
@etseidl
Contributor Author

etseidl commented May 9, 2025

Thanks for the review @alamb!

@@ -261,6 +262,45 @@ where
// - date64: cast int32 to date32, then date32 to date64.
// - decimal: cast int32 to decimal, int64 to decimal
let array = match target_type {
// Using `arrow_cast::cast` has been found to be very slow for converting
Contributor

Are there other conversions that can avoid the type cast? I saw this being expensive in some other benchmarks as well.

Contributor

I briefly looked at the others, and no obvious candidates stood out.

Contributor

Which benchmark, btw? I am curious.

Contributor

Several of the clickbench queries (not sure what data types, but it was spending like 20% of samples in casting during parquet reading).

Contributor Author

I'll try to burrow down into that code again early next week and see if there are any other obvious candidates. I did try signed->unsigned for 32 and 64 bit ints and there was no difference.

Contributor

> Several of the clickbench queries (not sure what data types, but it was spending like 20% of samples in casting during parquet reading).

FWIW many of the clickbench columns are Int16, as I found when working on #7470.

I started running some benchmarks on a draft update to parquet in this PR (hopefully it will show some improvements)

Contributor Author

@etseidl etseidl May 13, 2025

> I did try signed->unsigned for 32 and 64 bit ints and there was no difference.

Ahh, the reason for this is that I32/64->U32/64 is handled above (around L171). I would think anything that falls through and relies on arrow_cast::cast is going to be potentially slow due to its use of unary_opt, but from a quick glance the decimal code appears to figure out which casts are infallible and use unary instead. Perhaps other conversions do a similar optimization.
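To make the fallible vs. infallible distinction concrete, here is a minimal sketch in plain std Rust. It does not use the arrow or parquet APIs; `narrow_checked` and `narrow_truncating` are hypothetical names that only model the per-element work of a checked cast (an `Option` per value, out-of-range becomes null) and of a truncating cast (the low bits are kept). The input reuses the improperly encoded `0xffffffee` example from the PR description.

```rust
/// Models a checked narrowing: every element goes through a range check,
/// and any value that does not fit in `u8` becomes `None` (a null).
fn narrow_checked(values: &[i32]) -> Vec<Option<u8>> {
    values.iter().map(|&v| u8::try_from(v).ok()).collect()
}

/// Models an infallible narrowing: `as` keeps the low 8 bits and cannot fail,
/// so there is no per-element branch and no validity to track.
fn narrow_truncating(values: &[i32]) -> Vec<u8> {
    values.iter().map(|&v| v as u8).collect()
}

fn main() {
    // 238 stored properly (0x000000ee), 238 stored improperly (0xffffffee,
    // which is -18 when read back as an i32), and a small in-range value.
    let raw: [i32; 3] = [0x0000_00ee, 0xffff_ffee_u32 as i32, 7];

    // The checked path turns the improperly encoded value into a null.
    assert_eq!(narrow_checked(&raw), vec![Some(238), None, Some(7)]);

    // The truncating path recovers 238 from both encodings.
    assert_eq!(narrow_truncating(&raw), vec![238, 238, 7]);
}
```

The truncating version is a straight map over the values, which is the kind of loop a compiler can typically vectorize; the checked version has to branch on every element and carry the null information along.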

It might be worth enumerating all of the allowed Parquet physical-to-logical type mappings and handling them here, rather than relying on the arrow_cast machinery.

Labels
parquet Changes to the parquet crate
Development

Successfully merging this pull request may close these issues.

Parquet performance: improve performance of reading int8/int16
4 participants