Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In high-latency environments, such as WebAssembly running in users' browsers, minimizing the number of sequential requests can significantly improve performance. Now that #5222 has been merged, object_store can fetch a suffix byte range of a file. It would be great to integrate this with parquet to reduce the number of individual requests required.
Describe the solution you'd like
It seems that parquet::arrow::async_reader::MetadataLoader::load can be refactored to use an initial suffix request instead of needing to know the file size. Or, perhaps, there should be a new MetadataLoader::load_suffix method, so that implementations can choose to use load if they already know the file size, and use load_suffix if they don't.
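To illustrate why a suffix request is enough, here is a minimal sketch of the decision a suffix-based loader would make after its first request. It relies only on the Parquet footer layout (a 4-byte little-endian metadata length followed by the `PAR1` magic at the very end of the file). The names `SuffixOutcome` and `inspect_suffix` are hypothetical and are not part of the parquet crate; the sketch also assumes an unencrypted footer.

```rust
/// Size of the Parquet footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Hypothetical result type: what to do after the initial suffix request.
#[derive(Debug, PartialEq)]
enum SuffixOutcome {
    /// The suffix already contains the whole metadata block,
    /// located at `metadata_start..metadata_end` within the suffix buffer.
    Complete { metadata_start: usize, metadata_end: usize },
    /// The metadata is larger than the fetched suffix; one more suffix
    /// request of `bytes` bytes (metadata + footer) is needed.
    NeedSuffix { bytes: usize },
}

/// Inspect the last bytes of a Parquet file without knowing the file size.
/// Hypothetical helper, not an existing parquet API.
fn inspect_suffix(suffix: &[u8]) -> Result<SuffixOutcome, String> {
    if suffix.len() < FOOTER_SIZE {
        return Err(format!(
            "suffix of {} bytes is shorter than the {}-byte footer",
            suffix.len(),
            FOOTER_SIZE
        ));
    }
    let footer_start = suffix.len() - FOOTER_SIZE;
    let footer = &suffix[footer_start..];
    if &footer[4..] != PARQUET_MAGIC {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    // The metadata length is stored little-endian just before the magic.
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    if metadata_len <= footer_start {
        // The metadata sits immediately before the footer.
        Ok(SuffixOutcome::Complete {
            metadata_start: footer_start - metadata_len,
            metadata_end: footer_start,
        })
    } else {
        Ok(SuffixOutcome::NeedSuffix { bytes: metadata_len + FOOTER_SIZE })
    }
}
```

With this shape, a sufficiently large initial suffix resolves the metadata in a single request, and a too-small one costs exactly one follow-up suffix request. Neither path needs the file size.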
Related to this, the ParquetObjectReader::new API requires an ObjectMeta, which requires knowing the file size. It would be great to be able to construct ParquetObjectReader with only the store and the object_store::path::Path. I'm trying to construct ParquetObjectReader with a fake file length and passing my own ArrowReaderMetadata (#5583), but I haven't figured out whether that works yet.
Describe alternatives you've considered
Make an extra HEAD request instead of using suffix requests.
Additional context