Use parquet crate for decoding Parquet data into Arrow arrays #1040

@andygrove

Description

What is the problem the feature request solves?

Comet currently has its own native code for decoding Parquet data into Arrow arrays. This issue is for discussing whether to delegate these operations to the parquet crate instead.

The benefits of this approach include:

  • Support for complex types. The parquet crate already supports reading maps and structs. We could implement the same support in Comet's native code, but that would probably be a lot of work.
  • Support for StringView, along with the related performance optimizations (see [1] and [2] for details).
  • Benefit from the parquet crate's ongoing optimization work and active community.
  • Reduce maintenance effort in Comet.

Possible downsides of this approach:

  • Possibly lose the performance benefit of re-using mutable buffers (although that re-use also comes with a maintenance cost).
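
To make the buffer re-use trade-off concrete, here is a minimal, std-only sketch. It is not Comet's actual code and does not use the parquet or arrow crates: `ReusingDecoder`, `decode_fresh`, and `encode` are hypothetical names, and "decoding" is reduced to reading little-endian `i32` values. It contrasts a decoder that refills one retained buffer per batch with one that allocates a fresh, independently owned vector per batch, which is closer to what handing out immutable arrays implies.

```rust
/// Hypothetical decoder that owns one mutable buffer and refills it per batch.
struct ReusingDecoder {
    buf: Vec<i32>,
}

impl ReusingDecoder {
    fn new(batch_size: usize) -> Self {
        Self { buf: Vec::with_capacity(batch_size) }
    }

    /// Decode little-endian i32 values into the retained buffer.
    /// The returned slice is overwritten by the next call.
    fn decode(&mut self, bytes: &[u8]) -> &[i32] {
        // `clear` keeps the capacity, so no reallocation if the batch fits.
        self.buf.clear();
        self.buf.extend(
            bytes
                .chunks_exact(4)
                .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]])),
        );
        &self.buf
    }
}

/// Allocation-per-batch decoding: every call returns a new, owned Vec.
fn decode_fresh(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Helper that builds test input: i32 values as little-endian bytes.
fn encode(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn main() {
    let batch_a = encode(&[1, 2, 3, 4]);
    let batch_b = encode(&[5, 6, 7, 8]);

    // Re-use: both batches land in the same allocation.
    let mut decoder = ReusingDecoder::new(4);
    let ptr_a = decoder.decode(&batch_a).as_ptr();
    let ptr_b = decoder.decode(&batch_b).as_ptr();
    assert_eq!(ptr_a, ptr_b);

    // Fresh allocation: both results stay valid at the same time.
    let a = decode_fresh(&batch_a);
    let b = decode_fresh(&batch_b);
    assert_eq!(a, [1, 2, 3, 4]);
    assert_eq!(b, [5, 6, 7, 8]);
}
```

The re-using variant avoids an allocation per batch, but its output is only valid until the next `decode` call, so callers must copy or finish with a batch before advancing. That lifetime coupling is one plausible source of the maintenance cost mentioned above.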

[1] https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
[2] https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/

Describe the potential solution

No response

Additional context

No response

Labels: enhancement (New feature or request)