Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current ParquetMetaDataReader is a wonder of software engineering thanks to @etseidl. However, it is somewhat complicated to use: it has both async and sync methods, and it keeps state internally in a non-obvious way. For example, do you call try_parse or parse_and_finish? And how is load_via_suffix_and_finish related?
Compared to what came before it, ParquetMetaDataReader is an amazing improvement, but I think we could do better.
I ran into this when I discovered that the metadata is needed when implementing a push decoder for Parquet.
Basically, I want a way to parse the metadata without also doing the IO at the same time.
Describe the solution you'd like
If we want to truly separate IO and CPU, we also need a way to decode the metadata without explicit IO. Hence this PR, which provides a way to decode metadata "push style": the decoder tells you which bytes it needs, and you supply them. It follows the same API as the Parquet push decoder.
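To make the "push style" idea concrete, here is a minimal sketch using only the standard library. It is not the API proposed in the PR: `FooterPushDecoder`, `DecodeResult`, `push_range` and `try_decode` are hypothetical names chosen for illustration. It relies only on the layout of an unencrypted Parquet footer (a 4-byte little-endian metadata length followed by the magic `PAR1` at the end of the file), and it stops at locating the raw metadata bytes rather than Thrift-decoding them.

```rust
use std::ops::Range;

/// Hypothetical result type: either the decoder needs more bytes,
/// or it has located the raw (still Thrift-encoded) metadata.
#[derive(Debug, PartialEq)]
enum DecodeResult {
    /// The caller should fetch this byte range and push it.
    NeedsData(Range<u64>),
    /// Raw metadata bytes, ready for a Thrift decoder.
    Finished(Vec<u8>),
}

/// Hypothetical push-style decoder: it never performs IO itself.
struct FooterPushDecoder {
    file_len: u64,
    /// Byte ranges the caller has pushed so far.
    buffers: Vec<(Range<u64>, Vec<u8>)>,
}

impl FooterPushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, buffers: Vec::new() }
    }

    /// Hand the decoder bytes that the caller fetched however it likes
    /// (sync read, async object store request, cache, ...).
    fn push_range(&mut self, range: Range<u64>, data: Vec<u8>) {
        self.buffers.push((range, data));
    }

    /// Return the bytes for `range` if a single pushed buffer covers it.
    fn get(&self, range: &Range<u64>) -> Option<&[u8]> {
        self.buffers.iter().find_map(|(have, data)| {
            if have.start <= range.start && range.end <= have.end {
                let start = (range.start - have.start) as usize;
                let end = (range.end - have.start) as usize;
                data.get(start..end)
            } else {
                None
            }
        })
    }

    /// Pure CPU work: either report which bytes are missing or finish.
    fn try_decode(&self) -> Result<DecodeResult, String> {
        if self.file_len < 8 {
            return Err("file too small to contain a Parquet footer".to_string());
        }
        // Step 1: the last 8 bytes hold the metadata length and the magic.
        let footer_range = (self.file_len - 8)..self.file_len;
        let footer = match self.get(&footer_range) {
            Some(bytes) => bytes,
            None => return Ok(DecodeResult::NeedsData(footer_range)),
        };
        if &footer[4..8] != b"PAR1" {
            return Err("missing PAR1 magic".to_string());
        }
        let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as u64;
        if len + 8 > self.file_len {
            return Err("metadata length exceeds file size".to_string());
        }
        // Step 2: the metadata sits immediately before the 8-byte footer.
        let meta_range = (self.file_len - 8 - len)..(self.file_len - 8);
        match self.get(&meta_range) {
            Some(bytes) => Ok(DecodeResult::Finished(bytes.to_vec())),
            None => Ok(DecodeResult::NeedsData(meta_range)),
        }
    }
}
```

The caller drives the loop: call `try_decode`, fetch whatever range comes back in `NeedsData` by any means, `push_range` it, and repeat until `Finished`. Because the decoder only ever sees byte slices, the same code would serve both sync and async callers.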
Describe alternatives you've considered
Additional context