
Use Standard Library IO Abstractions in Parquet #1163

@tustvold


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The parquet crate has a number of IO abstractions that, at least on the surface, appear very similar to those in the Rust standard library.

The major distinction appears to concern mutability: various components use a mixture of TryClone and RefCell internally to placate the borrow checker. This makes the code fairly hard to reason about, as cloned file descriptors share the same seek position. Additionally, it prevents creating readers from &mut File or similar.
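The shared seek position can be observed with the standard library alone. The following is a minimal sketch, not code from the parquet crate; the helper name position_after_reading_clone and the scratch file name are illustrative assumptions. It reads a few bytes through a File::try_clone handle and then asks the original handle for its position. The std documentation for try_clone states that reads, writes and seeks on either handle affect both.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to the file at `path`, reads `n` bytes
/// through a clone of the handle, and returns the position reported by the
/// *original* handle afterwards.
fn position_after_reading_clone(path: &Path, data: &[u8], n: usize) -> io::Result<u64> {
    // Create (or truncate) a scratch file and fill it with `data`.
    let mut original = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    original.write_all(data)?;
    original.seek(SeekFrom::Start(0))?;

    // `try_clone` duplicates the OS handle; both handles share one file cursor.
    let mut clone = original.try_clone()?;
    let mut buf = vec![0u8; n];
    clone.read_exact(&mut buf)?;

    // The original handle never read anything, yet its position has moved.
    original.seek(SeekFrom::Current(0))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("parquet_try_clone_demo.bin");
    let pos = position_after_reading_clone(&path, b"0123456789", 4)?;
    println!("original handle is at offset {}", pos);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

If the two handles were independent, the original would still report offset 0; because they share one underlying handle, it reports the number of bytes read through the clone.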

At a more holistic level, I'm not really sure of the use case for non-exclusive IO, but I could be missing something here?

A non-exhaustive list of potential candidates for replacement:

  • Position - std::io::Seek::seek(SeekFrom::Current(0))
  • Length - std::io::Seek::seek(SeekFrom::End(0))
  • ChunkReader - std::io::Read
  • SliceableCursor - std::io::Cursor
  • ParquetReader - std::io::Seek + std::io::Read
  • ParquetWriter - std::io::Seek + std::io::Write
  • FileSink - BufWriter<File>
  • FileSource - BufReader<File>
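As a rough illustration of the first two mappings, Position and Length can be written as small generic helpers over std::io::Seek, and a byte-range read of the kind a chunk reader performs can be expressed with Seek plus Read. The helpers position, length and read_range are hypothetical sketches, not parquet APIs. Since finding the length moves the cursor, this version of length restores the previous position afterwards.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// "Position": the current offset of any seekable stream.
/// (std also offers Seek::stream_position for exactly this.)
fn position<S: Seek>(stream: &mut S) -> io::Result<u64> {
    stream.seek(SeekFrom::Current(0))
}

/// "Length": seek to the end to learn the total length, then seek back so the
/// caller's position is unchanged.
fn length<S: Seek>(stream: &mut S) -> io::Result<u64> {
    let current = stream.seek(SeekFrom::Current(0))?;
    let len = stream.seek(SeekFrom::End(0))?;
    stream.seek(SeekFrom::Start(current))?;
    Ok(len)
}

/// Hypothetical chunk read: `len` bytes starting at absolute offset `start`.
/// Fails with UnexpectedEof if the range runs past the end of the stream.
fn read_range<R: Read + Seek>(reader: &mut R, start: u64, len: usize) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut cursor = Cursor::new(b"0123456789".to_vec());
    cursor.seek(SeekFrom::Start(3))?;
    let pos = position(&mut cursor)?;
    let len = length(&mut cursor)?;
    println!("position = {}, length = {}", pos, len);
    let chunk = read_range(&mut cursor, 2, 4)?;
    println!("bytes 2..6 = {:?}", String::from_utf8_lossy(&chunk));
    Ok(())
}
```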

Describe the solution you'd like

I would ideally like to be able to create a parquet reader with anything implementing std::io::Seek and std::io::Read.
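As a sketch of what a reader generic over Read + Seek might look like, the function below locates the footer of a Parquet file. A Parquet file starts with the magic bytes PAR1 and ends with the Thrift-encoded file metadata, a 4-byte little-endian metadata length, and PAR1 again. The name read_footer_metadata is hypothetical, and decoding the Thrift metadata is deliberately left out.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

const MAGIC: &[u8; 4] = b"PAR1";

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Hypothetical entry point: return the raw (still Thrift-encoded) footer
/// metadata bytes from any source implementing Read + Seek.
fn read_footer_metadata<R: Read + Seek>(mut reader: R) -> io::Result<Vec<u8>> {
    // Smallest possible file: 4-byte header magic + 8-byte trailer.
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < 12 {
        return Err(invalid("file too small to be Parquet"));
    }

    // Check the leading magic.
    let mut header = [0u8; 4];
    reader.seek(SeekFrom::Start(0))?;
    reader.read_exact(&mut header)?;
    if header[..] != MAGIC[..] {
        return Err(invalid("missing leading PAR1 magic"));
    }

    // Trailer: <4-byte little-endian metadata length> "PAR1".
    let mut trailer = [0u8; 8];
    reader.seek(SeekFrom::End(-8))?;
    reader.read_exact(&mut trailer)?;
    if trailer[4..] != MAGIC[..] {
        return Err(invalid("missing trailing PAR1 magic"));
    }
    let metadata_len =
        u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as u64;
    if metadata_len + 12 > file_len {
        return Err(invalid("metadata length exceeds file size"));
    }

    // The metadata sits immediately before the 8-byte trailer.
    reader.seek(SeekFrom::Start(file_len - 8 - metadata_len))?;
    let mut metadata = vec![0u8; metadata_len as usize];
    reader.read_exact(&mut metadata)?;
    Ok(metadata)
}

fn main() -> io::Result<()> {
    // A tiny fake "file" in memory: header, body, metadata, length, magic.
    let mut bytes = MAGIC.to_vec();
    bytes.extend_from_slice(b"body");
    bytes.extend_from_slice(b"meta");
    bytes.extend_from_slice(&4u32.to_le_bytes());
    bytes.extend_from_slice(MAGIC);
    let metadata = read_footer_metadata(Cursor::new(bytes))?;
    println!("{:?}", String::from_utf8_lossy(&metadata));
    Ok(())
}
```

Because the only bound is Read + Seek, the same function accepts a File, a BufReader<File>, or an in-memory Cursor<Vec<u8>>.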

Similarly, I would like to be able to create a parquet writer from anything implementing std::io::Seek and std::io::Write.
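On the writing side, the framing of a Parquet file only needs Write + Seek. The FramedWriter below is a hypothetical type, not part of the parquet crate: it writes the leading PAR1 magic, appends already-encoded chunks while recording their offsets via Seek::stream_position, and finishes with the metadata, its 4-byte little-endian length, and the trailing magic. Encoding the column chunks and metadata themselves is out of scope for this sketch.

```rust
use std::io::{self, Cursor, Seek, Write};

const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical writer generic over any W: Write + Seek.
struct FramedWriter<W: Write + Seek> {
    sink: W,
}

impl<W: Write + Seek> FramedWriter<W> {
    /// Start a new file by writing the leading magic.
    fn new(mut sink: W) -> io::Result<Self> {
        sink.write_all(MAGIC)?;
        Ok(Self { sink })
    }

    /// Append an encoded chunk and return the offset it was written at,
    /// which a real writer would record in the footer metadata.
    fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<u64> {
        let offset = self.sink.stream_position()?;
        self.sink.write_all(chunk)?;
        Ok(offset)
    }

    /// Write the footer (metadata, 4-byte LE length, magic) and hand back the sink.
    fn finish(mut self, metadata: &[u8]) -> io::Result<W> {
        if metadata.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "metadata too large"));
        }
        let len = metadata.len() as u32;
        self.sink.write_all(metadata)?;
        self.sink.write_all(&len.to_le_bytes())?;
        self.sink.write_all(MAGIC)?;
        self.sink.flush()?;
        Ok(self.sink)
    }
}

fn main() -> io::Result<()> {
    let mut writer = FramedWriter::new(Cursor::new(Vec::<u8>::new()))?;
    let first = writer.write_chunk(b"column chunk 1")?;
    let second = writer.write_chunk(b"column chunk 2")?;
    println!("chunks at offsets {} and {}", first, second);
    let bytes = writer.finish(b"footer")?.into_inner();
    println!("{} bytes written", bytes.len());
    Ok(())
}
```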

These are standard traits within the Rust ecosystem, and supporting them will simplify interoperation with other crates and reduce the cognitive load on users and contributors.

The standard library's blanket implementations of these traits for &mut R will also allow mutable references to be used instead of owned values, for free.
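A small sketch of this point: std implements Read and Seek for &mut R whenever R implements them, so a function taking R: Read + Seek by value also accepts &mut Cursor or &mut File, and the caller keeps ownership. The function read_last_byte is illustrative only.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Illustrative function generic over any R: Read + Seek, taken by value.
fn read_last_byte<R: Read + Seek>(mut reader: R) -> io::Result<u8> {
    reader.seek(SeekFrom::End(-1))?;
    let mut buf = [0u8; 1];
    reader.read_exact(&mut buf)?;
    Ok(buf[0])
}

fn main() -> io::Result<()> {
    let mut cursor = Cursor::new(b"hello".to_vec());
    // `&mut Cursor<Vec<u8>>` is itself Read + Seek via the blanket impls,
    // so a mutable borrow can be passed where an owned reader is expected...
    let last = read_last_byte(&mut cursor)?;
    // ...and the caller still owns the cursor, left where the callee stopped.
    println!("last byte {:?}, cursor now at {}", last as char, cursor.position());
    Ok(())
}
```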

Describe alternatives you've considered

Preserve the current behaviour

Metadata

Labels: enhancement (any new improvement worthy of an entry in the changelog)
