[Parquet] ParquetMetadataReader decodes too much metadata under point-get scenario #8751

@mapleFU

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

When PageIndexPolicy is not Skip, the ParquetMetadataReader reads the whole page index and decodes all of it. The problem is that:

  1. For a large scan over all columns and all row groups, this works well: decoding the metadata is lightweight compared with the scan itself.
  2. For a point-get that touches only a few columns, it currently still reads and decodes the entire page index (the column index and offset index of every column). Decoding all of that can be heavyweight.

Describe the solution you'd like

An interface for row-group selection, so that only the page index of the selected row groups is decoded?
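To make the request concrete, here is a rough sketch of what such a selection could compute. It does not use the parquet crate: `ChunkIndexLocation`, `IndexSelection`, and `ranges_to_decode` are hypothetical names invented for illustration, and the column part of the selection is an assumption suggested by the point-get case rather than something this issue specifies. It relies only on the fact that the Parquet footer records, per column chunk, where that chunk's column index and offset index live in the file.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the index locations a column chunk records in
/// the footer (NOT a parquet crate type).
#[derive(Debug, Clone)]
struct ChunkIndexLocation {
    column_index: Option<Range<u64>>,
    offset_index: Option<Range<u64>>,
}

/// Hypothetical selection: which row groups and which leaf columns the
/// caller actually needs page indexes for.
struct IndexSelection<'a> {
    row_groups: &'a [usize],
    columns: &'a [usize],
}

/// Returns the byte ranges that would need to be fetched and decoded for the
/// selection, sorted by start offset. `locations[rg][col]` is the location
/// for column `col` of row group `rg`; out-of-range selections are skipped.
fn ranges_to_decode(
    locations: &[Vec<ChunkIndexLocation>],
    selection: &IndexSelection,
) -> Vec<Range<u64>> {
    let mut ranges = Vec::new();
    for &rg in selection.row_groups {
        let Some(chunks) = locations.get(rg) else { continue };
        for &col in selection.columns {
            let Some(chunk) = chunks.get(col) else { continue };
            // A chunk may have no column index or offset index at all.
            ranges.extend(chunk.column_index.clone());
            ranges.extend(chunk.offset_index.clone());
        }
    }
    ranges.sort_by_key(|r| r.start);
    ranges
}
```

With a selection of one row group and one column, this yields at most two ranges regardless of how many row groups and columns the file has, which is the saving a point-get would want.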

Describe alternatives you've considered

I'm not sure. Maybe I could do the I/O myself?

Additional context

No

Labels: enhancement, parquet
