Glommio based IO version for Parquet #5240

@ozgrakkurt

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I want to read parquet files with lower latency and better throughput. Glommio is a thread-per-core library that uses io_uring and direct IO. Direct IO is particularly nice for my case, and I imagine for many other parquet use cases, because my use case doesn't benefit at all from caching the parquet files in the Linux page cache; it even suffers from it. I also store the parquet files on fast NVMe disks, so direct IO and io_uring can give a big performance boost.

https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a
https://www.phoronix.com/news/OpenZFS-DirectIO-Performance

Describe the solution you'd like
Have an alternative IO implementation for parquet, similar to the async feature (which uses tokio).

It would implement glommio_reader, glommio_writer, etc., similar to async_reader, async_writer, etc.

Describe alternatives you've considered
One option is to open the file with the O_DIRECT flag and just read with the current async or sync implementation, but then it will crash because of unaligned buffers etc.
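To illustrate the alignment problem: with O_DIRECT, read offsets and lengths generally have to be multiples of the device or filesystem block size, while parquet readers ask for arbitrary byte ranges. Below is a minimal sketch of the rounding that would be needed. The 4096-byte alignment and the `align_range` helper are assumptions for illustration, not part of parquet or glommio.

```rust
use std::ops::Range;

/// Assumed alignment for illustration; the real value depends on the
/// device and filesystem.
const ALIGN: u64 = 4096;

/// Expand `range` outward to `ALIGN` boundaries.
///
/// Returns the aligned range to request from the file, plus the position
/// of the originally requested bytes inside that aligned read.
fn align_range(range: Range<u64>) -> (Range<u64>, Range<usize>) {
    // Round the start down and the end up to the nearest boundary.
    let start = range.start / ALIGN * ALIGN;
    let end = (range.end + ALIGN - 1) / ALIGN * ALIGN;
    let skip = (range.start - start) as usize;
    let len = (range.end - range.start) as usize;
    (start..end, skip..skip + len)
}

fn main() {
    // A reader asks for bytes 5000..9000 (e.g. a column chunk).
    let (aligned, inner) = align_range(5000..9000);
    println!("read {:?} from disk, then slice {:?}", aligned, inner);
}
```

Every read path in the existing sync and async readers would need this kind of wrapping, on top of an aligned destination buffer.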

Just using glommio should be better, since there is no need to implement buffer alignment etc. ourselves. Glommio also already provides an io_uring-based thread-per-core architecture, which is very nice for building database-like systems.
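For a sense of the buffer handling that would otherwise have to be written by hand, here is a rough sketch of an aligned buffer built on the standard allocator. `AlignedBuf` is a hypothetical name and the 4096-byte alignment is an assumption; this is not how glommio implements its buffers, only an example of the work it would save.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// A zero-initialized heap buffer whose start address is a multiple of
/// the requested alignment (hypothetical helper, for illustration only).
struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize, align: usize) -> Self {
        assert!(len > 0, "zero-sized buffers are not supported");
        // `align` must be a power of two, otherwise this returns an error.
        let layout = Layout::from_size_align(len, align).expect("invalid size or alignment");
        // SAFETY: the layout has a non-zero size (checked above).
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).expect("allocation failed");
        Self { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr` points to `layout.size()` zeroed bytes owned by `self`.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

fn main() {
    let mut buf = AlignedBuf::new(8192, 4096);
    let slice = buf.as_mut_slice();
    assert_eq!(slice.as_ptr() as usize % 4096, 0);
    println!("{} bytes at {:p}", slice.len(), slice.as_ptr());
}
```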

Labels: development-process, enhancement
