Closed
Labels
enhancement: Any new improvement worthy of an entry in the changelog
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
- Related to [Epic] Parquet Reader Improvement Plan / Proposal - July 2025 #8000
- Similar to Decouple IO and CPU operations in the Parquet Reader (push decoder) #7983, but for the metadata reader
The current `ParquetMetadataReader` intermixes three things:
- The state machine for decoding Parquet metadata (footer, then metadata, then, optionally, the indexes)
- Orchestrating IO (i.e. issuing the `read` calls, etc.)
- Decoding Thrift-encoded bytes into objects
This makes it almost impossible to add features such as "only decode a subset of the columns in the ColumnIndex", along with other advanced use cases.
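To illustrate why separating the steps helps with that example, here is a minimal sketch, assuming the decoder already knows where each column chunk's ColumnIndex lives (the Parquet format records this in the `column_index_offset` and `column_index_length` fields of the column chunk metadata). `ColumnIndexLocation` and `column_index_ranges` are hypothetical names, not part of the `parquet` crate. The point is only that, once IO and Thrift decoding are separate steps, restricting work to a subset of columns reduces to filtering byte ranges before anything is fetched or decoded.

```rust
use std::ops::Range;

/// Hypothetical location of one column chunk's ColumnIndex, mirroring the
/// `column_index_offset` / `column_index_length` fields in the Parquet metadata.
#[derive(Debug)]
struct ColumnIndexLocation {
    column: usize,
    offset: u64,
    length: u64,
}

/// Keep only the byte ranges of the projected columns, so the ColumnIndex of
/// every other column is neither fetched nor decoded.
fn column_index_ranges(locations: &[ColumnIndexLocation], projection: &[usize]) -> Vec<Range<u64>> {
    locations
        .iter()
        .filter(|loc| projection.contains(&loc.column))
        .map(|loc| loc.offset..loc.offset + loc.length)
        .collect()
}

fn main() {
    let locations = vec![
        ColumnIndexLocation { column: 0, offset: 100, length: 20 },
        ColumnIndexLocation { column: 1, offset: 120, length: 35 },
        ColumnIndexLocation { column: 2, offset: 155, length: 10 },
    ];
    // Only column 2 is needed, e.g. because a filter predicate references it.
    let ranges = column_index_ranges(&locations, &[2]);
    println!("{ranges:?}"); // prints [155..165]
}
```

The resulting ranges could then be handed to whatever IO layer the caller uses, and only those bytes passed on to the Thrift decoder.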
Describe the solution you'd like
Now that we have a "push"-style API for metadata decoding that performs no IO itself, I would like to separate these three parts so that features like the one above become straightforward to add.
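A minimal sketch of what this separation could look like, using only the standard library. `MetadataPushDecoder`, `DecodeResult`, `poll`, and `push` are hypothetical names for illustration, not the `parquet` crate's actual API. The decoder owns the state machine (footer, then metadata; the optional indexes are omitted for brevity), the caller owns all IO, and Thrift decoding is left as a stub that simply returns the raw bytes. The only Parquet-specific fact it relies on is the file layout: a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`.

```rust
use std::ops::Range;

/// Size of the Parquet footer: 4-byte little-endian metadata length + b"PAR1".
const FOOTER_LEN: u64 = 8;

/// What the hypothetical decoder asks of its caller.
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The caller must fetch exactly these bytes and `push` them.
    NeedsData(Range<u64>),
    /// Decoding finished; holds the raw (still Thrift-encoded) metadata bytes.
    Finished(Vec<u8>),
}

/// Internal states: footer, then metadata. Optional indexes are omitted here.
enum State {
    Footer,
    Metadata(Range<u64>),
    Done(Vec<u8>),
}

/// Hypothetical push decoder: it tracks state but never performs IO itself.
struct MetadataPushDecoder {
    file_len: u64,
    state: State,
}

impl MetadataPushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, state: State::Footer }
    }

    /// Report which bytes are needed next, or the finished result.
    fn poll(&self) -> Result<DecodeResult, String> {
        match &self.state {
            State::Footer => {
                let start = self.file_len.checked_sub(FOOTER_LEN).ok_or("file too small")?;
                Ok(DecodeResult::NeedsData(start..self.file_len))
            }
            State::Metadata(range) => Ok(DecodeResult::NeedsData(range.clone())),
            State::Done(bytes) => Ok(DecodeResult::Finished(bytes.clone())),
        }
    }

    /// Feed the bytes for the range most recently requested by `poll`.
    fn push(&mut self, data: &[u8]) -> Result<(), String> {
        let next = match &self.state {
            State::Footer => {
                if data.len() as u64 != FOOTER_LEN || &data[4..] != b"PAR1" {
                    return Err("invalid footer".to_string());
                }
                let len = u32::from_le_bytes(data[..4].try_into().unwrap()) as u64;
                let end = self.file_len.checked_sub(FOOTER_LEN).ok_or("file too small")?;
                let start = end.checked_sub(len).ok_or("metadata length exceeds file size")?;
                State::Metadata(start..end)
            }
            State::Metadata(range) => {
                if data.len() as u64 != range.end - range.start {
                    return Err("wrong number of metadata bytes".to_string());
                }
                // Thrift decoding would go here, as a separate step that does no IO.
                State::Done(data.to_vec())
            }
            State::Done(_) => return Err("decoder already finished".to_string()),
        };
        self.state = next;
        Ok(())
    }
}

fn main() -> Result<(), String> {
    // Fake in-memory "file": leading magic, some data, stand-in metadata, then the footer.
    let metadata = b"thrift"; // placeholder bytes, not real Thrift
    let mut file = b"PAR1data".to_vec();
    file.extend_from_slice(metadata);
    file.extend_from_slice(&(metadata.len() as u32).to_le_bytes());
    file.extend_from_slice(b"PAR1");

    let mut decoder = MetadataPushDecoder::new(file.len() as u64);
    // All IO lives in this loop; here it is just slicing a buffer.
    loop {
        match decoder.poll()? {
            DecodeResult::NeedsData(range) => {
                decoder.push(&file[range.start as usize..range.end as usize])?;
            }
            DecodeResult::Finished(raw) => {
                println!("decoded {} metadata bytes", raw.len()); // prints "decoded 6 metadata bytes"
                break;
            }
        }
    }
    Ok(())
}
```

Because the decoder only reports byte ranges, the same state machine could in principle be driven by blocking reads, async object-store requests, or prefetched buffers, and the stubbed Thrift step could be swapped for selective decoding without touching the IO loop.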
Describe alternatives you've considered
Additional context