Generify ColumnReaderImpl and RecordReader #1040

@tustvold

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, RecordReader and ColumnReaderImpl hard-code the assumption that they decode to a contiguous array of values or to i16 levels. This complicates implementing #1037 and #171, as well as potential future decode-related optimisations such as decoding directly to StringArray or evaluating predicates directly.

Describe the solution you'd like

Create new GenericColumnReader and GenericRecordReader types, with RecordReader and ColumnReaderImpl becoming type aliases of them. This preserves API compatibility whilst allowing the introduction of new type parameters. Because these type parameters need to be able to influence the buffer types, they aren't object-safe and therefore need to be generics rather than trait objects.
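To illustrate the type-alias idea, here is a minimal sketch. It is not the parquet crate's API: the `ValuesBuffer` trait, the `OffsetBuffer` type, and every method signature below are hypothetical names invented for this example. It only shows how an existing name can stay usable while the buffer type becomes a parameter.

```rust
// Hypothetical sketch: these traits and signatures are assumptions,
// not taken from the parquet crate.

/// A destination that decoded values can be appended to.
pub trait ValuesBuffer: Default {
    type Item;
    fn push(&mut self, value: Self::Item);
    fn len(&self) -> usize;
}

/// Today's behaviour: a contiguous array of values.
impl<T> ValuesBuffer for Vec<T> {
    type Item = T;
    fn push(&mut self, value: T) {
        Vec::push(self, value)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

/// A non-contiguous alternative: string bytes plus end offsets,
/// loosely modelled on how a string array might be laid out.
#[derive(Default)]
pub struct OffsetBuffer {
    pub data: String,
    pub offsets: Vec<usize>,
}

impl ValuesBuffer for OffsetBuffer {
    type Item = String;
    fn push(&mut self, value: String) {
        self.data.push_str(&value);
        self.offsets.push(self.data.len());
    }
    fn len(&self) -> usize {
        self.offsets.len()
    }
}

/// The reader is generic over its buffer; the associated `Item` type is
/// what prevents `ValuesBuffer` from being used as a plain trait object here.
pub struct GenericRecordReader<V: ValuesBuffer> {
    values: V,
}

impl<V: ValuesBuffer> GenericRecordReader<V> {
    pub fn new() -> Self {
        Self { values: V::default() }
    }

    /// Appends every input value to the buffer and returns the new length.
    pub fn read(&mut self, input: impl IntoIterator<Item = V::Item>) -> usize {
        for value in input {
            self.values.push(value);
        }
        self.values.len()
    }

    pub fn into_values(self) -> V {
        self.values
    }
}

/// The old name keeps working as an alias with the old buffer type.
pub type RecordReader<T> = GenericRecordReader<Vec<T>>;
```

Under these assumptions, code written against `RecordReader<i32>` would compile unchanged, while new code could opt in to `GenericRecordReader<OffsetBuffer>`.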

All decoding and buffering would be provided by these generic type parameters, allowing them to be swapped out. This would leave ColumnReaderImpl responsible for muxing the parquet file, i.e. extracting pages from the PageReader and feeding them to the decoders. RecordReader would remain responsible for delimiting semantic records, as it is today.
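The split of responsibilities described above might look roughly like the following. This is a sketch under stated assumptions: `PageSource`, `ValuesDecoder`, `PlainI32Decoder`, and `CountPositive` are hypothetical stand-ins, pages are reduced to raw little-endian `i32` bytes, and levels are omitted entirely. The reader only pulls pages and hands them to whichever decoder it was given.

```rust
// Hypothetical sketch: simplified stand-ins, not the parquet crate's types.

/// Source of encoded pages (stand-in for a page reader).
pub trait PageSource {
    fn next_page(&mut self) -> Option<Vec<u8>>;
}

/// For the example, any owned iterator over byte vectors is a page source.
impl PageSource for std::vec::IntoIter<Vec<u8>> {
    fn next_page(&mut self) -> Option<Vec<u8>> {
        self.next()
    }
}

/// Decodes one page into an output whose type the decoder chooses.
pub trait ValuesDecoder {
    type Output: Default;
    fn decode(&mut self, page: &[u8], out: &mut Self::Output);
}

fn read_i32(chunk: &[u8]) -> i32 {
    i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]])
}

/// Decodes to a contiguous array, as the readers do today.
pub struct PlainI32Decoder;

impl ValuesDecoder for PlainI32Decoder {
    type Output = Vec<i32>;
    fn decode(&mut self, page: &[u8], out: &mut Vec<i32>) {
        out.extend(page.chunks_exact(4).map(read_i32));
    }
}

/// Evaluates a predicate during decode and never materialises the values.
pub struct CountPositive;

impl ValuesDecoder for CountPositive {
    type Output = usize;
    fn decode(&mut self, page: &[u8], out: &mut usize) {
        *out += page.chunks_exact(4).filter(|c| read_i32(c) > 0).count();
    }
}

/// Only muxes: extracts pages and feeds them to the decoder.
pub struct GenericColumnReader<P, D> {
    pages: P,
    decoder: D,
}

impl<P: PageSource, D: ValuesDecoder> GenericColumnReader<P, D> {
    pub fn new(pages: P, decoder: D) -> Self {
        Self { pages, decoder }
    }

    /// Drains every page through the decoder and returns its output.
    pub fn read_all(&mut self) -> D::Output {
        let mut out: D::Output = Default::default();
        while let Some(page) = self.pages.next_page() {
            self.decoder.decode(&page, &mut out);
        }
        out
    }
}
```

Swapping `PlainI32Decoder` for `CountPositive` changes what is produced without touching the page loop, which is the kind of flexibility the proposal is after.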

Describe alternatives you've considered

We could duplicate the logic in ColumnReaderImpl and RecordReader into different reader implementations, but this seems unfortunate.

Additional context

There is likely non-trivial overlap with #384 and #200, which sought to introduce generics at a different level. Unfortunately, that work is still coupled to the notion of contiguous value arrays, and I couldn't see a way to use it to achieve the particular flexibility desired here.

Labels: parquet (Changes to the parquet crate)