Description
Is your feature request related to a problem or challenge?
The ParquetOpener loads all page metadata for a file, in every task concurrently accessing that file. This can be costly for parquet files with a large number of rows, a large number of columns, or both.
In testing at Influx we have seen page metadata load times on the order of tens of milliseconds for some customer scenarios. We have timed this directly on customer parquet files, and we estimate the contribution to query time to be about 83% of those load times.
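For anyone wanting to reproduce the measurement, here is a minimal sketch (the file path and synchronous I/O are placeholders; the real code path is async) that times an eager footer + page index load with the parquet crate:

```rust
use std::{fs::File, time::Instant};

use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};

fn main() -> parquet::errors::Result<()> {
    // "data.parquet" is a placeholder path
    let file = File::open("data.parquet").expect("open parquet file");

    // with_page_index(true) makes load() also decode the column index and
    // offset index for every row group and column, which is the cost being
    // measured here
    let options = ArrowReaderOptions::new().with_page_index(true);

    let start = Instant::now();
    let _metadata = ArrowReaderMetadata::load(&file, options)?;
    println!("page metadata load took {:?}", start.elapsed());
    Ok(())
}
```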
Some individual page metadata load times:
Write Load | File Size | Row Groups | Columns | Rows | Row Group Compression | Rows of Page Metadata | Page Metadata Load Time | Estimated Query Savings |
---|---|---|---|---|---|---|---|---|
Telegraf | 110 MB | 11 | 65 | 10,523,008 | 10.3 / 17.4 MB | 67,862 | 9ms | 6ms / 36ms |
Random Datagen | 283 MB | 5 | 11 | 4,620,000 | 61.1 / 66.7 MB | 5,016 | 0.7ms | nil |
Cust A | 144 MB | 50 | 26 | 51,521,481 | 2.9 / 4.5 MB | 132,864 | 16.9ms | 14.1ms / ? |
Cust B | 104 MB | 70 | 19 | 73,158,554 | 1.2 / 2.7 MB | 137,864 | 23.3ms | 19.4ms / ? |
Cust C | 122 MB | 11 | 199 | 10,530,204 | 10.8 / 40.3 MB | 208,156 | 25.4ms | 21.1ms / ? |
Note: for the Telegraf and Random Datagen datasets we were able to measure query time savings with our prototype. For customer scenarios we can only estimate.
Describe the solution you'd like
Rather than always loading all page metadata, load just the file metadata first, prune as much as possible, and then load only the page metadata needed to execute the query:
- Read the file metadata
- Prune row groups by the range the task is targeting (from the file-group breakdown of the file)
- Prune row groups by testing the predicate against row group statistics
- Read page metadata only for the remaining row groups and columns
- Prune the access plan using the minimally loaded page metadata
Pseudo-code for this flow looks something like this:
```rust
// Load only the file metadata (footer), without any page metadata
let metadata = ArrowReaderMetadata::load_async_no_page_metadata(&mut reader, …)?;

// Build the initial access plan and prune row groups
let access_plan = create_initial_plan( … )?;
let mut row_groups = RowGroupAccessPlanFilter::new(access_plan);
row_groups.prune_by_range(rg_metadata, range);
row_groups.prune_by_statistics( … );

// Determine which row groups and columns survive pruning
let rg_accessed = row_groups.rg_needed();
let cols_accessed = predicate.columns_needed();

// Load page metadata only for those row groups and columns, then prune pages
metadata.load_async_reduced_page_metadata(&mut reader, rg_accessed, cols_accessed, …)?;
let access_plan = p.prune_plan_with_page_index( … );
```
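For context, a minimal sketch of what today's parquet crate API already allows (assuming the crate's `async` feature; the reader and error handling are placeholders). The footer can be loaded without any page index by leaving `with_page_index` off, but there is no public way to then load the column/offset index for only a subset of row groups and columns, which is what `load_async_no_page_metadata` / `load_async_reduced_page_metadata` above would add:

```rust
use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};
use parquet::arrow::async_reader::AsyncFileReader;
use parquet::errors::Result;

// Phase 1 of the proposal is already possible today: load the footer only,
// skipping the column index / offset index entirely
async fn load_footer_only<R: AsyncFileReader>(reader: &mut R) -> Result<ArrowReaderMetadata> {
    let options = ArrowReaderOptions::new().with_page_index(false);
    ArrowReaderMetadata::load_async(reader, options).await
}

// Phase 2 (loading the page index for only the surviving row groups and
// columns) has no equivalent today: with_page_index(true) always decodes
// the index for every row group and column in the file
```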
In our prototype we created a sparse page-metadata array: row-group/column indexes that are not needed are left as `Index::NONE`. Pseudo-code:
```rust
let index = metadata
    .row_groups()
    .iter()
    .map(|x| {
        if self.rg_accessed.as_ref().unwrap()[x.ordinal().unwrap() as usize] {
            // Row group is needed: decode the column index only for the
            // columns the predicate touches
            x.columns()
                .iter()
                .enumerate()
                .map(|(index, c)| {
                    if self.col_accessed.as_ref().unwrap()[index] {
                        match c.column_index_range() {
                            Some(r) => decode_column_index( … ),
                            None => Ok(Index::NONE),
                        }
                    } else {
                        Ok(Index::NONE)
                    }
                })
                .collect::<Result<Vec<_>>>()
        } else {
            // Row group was pruned: leave every column as Index::NONE
            x.columns()
                .iter()
                .map(|_| Ok(Index::NONE))
                .collect::<Result<Vec<_>>>()
        }
    })
    .collect::<Result<Vec<_>>>()?;
```
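A note on this design choice: because the sparse array keeps one entry per row group and per column, it has the same shape as a fully loaded column index, so downstream page pruning does not need to know which entries were skipped; an `Index::NONE` entry simply means no page statistics are available and that column's pages are all read. A small sketch (the `ParquetColumnIndex` alias and `Index` enum are from the parquet crate; `page_pruning_possible` is a hypothetical helper for illustration):

```rust
use parquet::file::metadata::ParquetColumnIndex; // alias for Vec<Vec<Index>>
use parquet::file::page_index::index::Index;

/// Hypothetical helper: page-level pruning is only possible for entries that
/// were actually decoded; pruned row groups and unused columns stay Index::NONE
fn page_pruning_possible(column_index: &ParquetColumnIndex, rg: usize, col: usize) -> bool {
    !matches!(&column_index[rg][col], Index::NONE)
}
```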
Describe alternatives you've considered
No response
Additional context
No response