
Add arrow-ipc benchmarks for the IPC reader and writer #6968

Open
@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We are contemplating making the arrow IPC reader/writer faster by allowing the user to opt out of validation, but we currently have no way to measure whether that actually helps.

Describe the solution you'd like
To make sure this actually improves performance, we should have benchmarks for the IPC reader/writer.

Describe alternatives you've considered

For benchmarks, I would recommend adding two new benches:

  • arrow-rs/arrow-ipc/benches/ipc_reader.rs
  • arrow-rs/arrow-ipc/benches/ipc_writer.rs

We can follow the existing example from parquet.

Someone would then run them like this:

cargo bench --bench ipc_reader

For the actual benchmarks, I would recommend starting with two sets of data, for example a record batch with primitive arrays (Int32Array, UInt64Array and Float64Array).
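
To make the input data concrete, here is a minimal std-only sketch of generating the three primitive columns deterministically, so every benchmark run sees the same values. `PrimitiveColumns`, `make_columns` and `next_u64` are hypothetical names for illustration, not part of arrow-rs. In a real bench the vectors would presumably be wrapped in Int32Array, UInt64Array and Float64Array and assembled into a record batch.

```rust
/// Plain-Rust stand-in for the three primitive columns of the benchmark batch.
/// (Hypothetical helper type; a real bench would build Arrow arrays instead.)
#[derive(Debug, Clone, PartialEq)]
pub struct PrimitiveColumns {
    pub int32: Vec<i32>,
    pub uint64: Vec<u64>,
    pub float64: Vec<f64>,
}

/// SplitMix64 step: a tiny deterministic generator, so no external
/// random-number crate is needed for this sketch.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Builds `num_rows` rows of data; the same `seed` always yields the same columns.
pub fn make_columns(num_rows: usize, seed: u64) -> PrimitiveColumns {
    let mut state = seed;
    let mut cols = PrimitiveColumns {
        int32: Vec::with_capacity(num_rows),
        uint64: Vec::with_capacity(num_rows),
        float64: Vec::with_capacity(num_rows),
    };
    for _ in 0..num_rows {
        let r = next_u64(&mut state);
        cols.int32.push(r as i32); // truncating cast keeps the low 32 bits
        cols.uint64.push(r);
        // Top 53 bits scaled into the half-open range [0, 1).
        cols.float64.push((r >> 11) as f64 / (1u64 << 53) as f64);
    }
    cols
}
```

Fixing the seed keeps results comparable between runs, which matters when the goal is to compare reading with and without validation.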

Then add tests for:

  1. StreamWriter (how fast can the data be serialized to a stream)
  2. FileWriter
  3. StreamReader (how fast can serialized data be read back)
  4. FileReader
    With the basic foundation, we can then
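
As a rough illustration of the shape of the writer and reader cases above, here is a std-only sketch that times writing to, and reading back from, an in-memory buffer. The encoder is a stand-in (length-prefixed little-endian values), not Arrow IPC, and `write_column`, `read_column`, `time_per_iter` and `bench_roundtrip` are hypothetical names. A real bench would call StreamWriter / StreamReader (or FileWriter / FileReader) at those points and would likely use a proper harness such as criterion rather than a hand-rolled timer.

```rust
use std::hint::black_box;
use std::io::{Read, Write};
use std::time::{Duration, Instant};

/// Stand-in encoder: NOT Arrow IPC, just a length prefix plus little-endian values.
fn write_column<W: Write>(mut out: W, values: &[u64]) -> std::io::Result<()> {
    out.write_all(&(values.len() as u64).to_le_bytes())?;
    for v in values {
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Stand-in decoder matching `write_column`.
fn read_column<R: Read>(mut input: R) -> std::io::Result<Vec<u64>> {
    let mut word = [0u8; 8];
    input.read_exact(&mut word)?;
    let len = u64::from_le_bytes(word) as usize;
    let mut values = Vec::with_capacity(len);
    for _ in 0..len {
        input.read_exact(&mut word)?;
        values.push(u64::from_le_bytes(word));
    }
    Ok(values)
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
fn time_per_iter<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// Returns (mean write time, mean read time) for one column.
pub fn bench_roundtrip(values: &[u64], iters: u32) -> (Duration, Duration) {
    // Writer case: reuse one buffer so repeated allocation does not dominate.
    let mut buf = Vec::new();
    let write = time_per_iter(iters, || {
        buf.clear();
        write_column(&mut buf, black_box(values)).unwrap();
        buf.len()
    });
    // Reader case: `buf` was serialized above, so only decoding is timed here.
    let read = time_per_iter(iters, || read_column(buf.as_slice()).unwrap());
    (write, read)
}
```

The main design point is that the reader case serializes its input once, outside the timed closure, so the measurement covers only the read path that validation would affect.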

Additional context
Inspired by @totoroyyb
