
Reduce page metadata loading to only what is necessary for query execution in ParquetOpen #16200

Open
@adrian-thurston

Description


Is your feature request related to a problem or challenge?

The ParquetOpen loads all page metadata for a file, in every task concurrently accessing that file. This can be costly for parquet files with a large number of rows, a large number of columns, or both.

In testing at Influx we have observed page metadata load times on the order of tens of milliseconds for some customer scenarios, measured directly on customer parquet files. We estimate that about 83% of that load time contributes directly to query time.

Some individual page metadata load times:

| Write Load | File Size | Row Groups | Columns | Rows | Row Group Compression | Rows of Page Metadata | Page Metadata Load Time | Estimated Query Savings |
|---|---|---|---|---|---|---|---|---|
| Telegraf | 110 MB | 11 | 65 | 10,523,008 | 10.3 / 17.4 MB | 67,862 | 9ms | 6ms / 36ms |
| Random Datagen | 283 MB | 5 | 11 | 4,620,000 | 61.1 / 66.7 MB | 5,016 | 0.7ms | nil |
| Cust A | 144 MB | 50 | 26 | 51,521,481 | 2.9 / 4.5 MB | 132,864 | 16.9ms | 14.1ms / ? |
| Cust B | 104 MB | 70 | 19 | 73,158,554 | 1.2 / 2.7 MB | 137,864 | 23.3ms | 19.4ms / ? |
| Cust C | 122 MB | 11 | 199 | 10,530,204 | 10.8 / 40.3 MB | 208,156 | 25.4ms | 21.1ms / ? |

Note: for the Telegraf and Random Datagen datasets we were able to measure query time savings with our prototype. For customer scenarios we can only estimate.

Describe the solution you'd like

Rather than always loading all page metadata, instead load just file metadata, prune as much as we can, then load only the page metadata needed to execute the query.

  1. Read file metadata only.
  2. Prune row groups by the byte range the task is targeting (the file-group breakdown of the file).
  3. Prune row groups by testing the predicate against row-group statistics.
  4. Read page metadata only for the surviving row groups and columns.
  5. Prune the access plan using the minimally loaded page metadata.

Pseudo-code looks something like this:

```rust
// 1. Read file metadata only (skip the page index)
let metadata = ArrowReaderMetadata::load_async_no_page_metadata(&mut reader)?;

// 2. Prune row groups by the byte range this task is targeting
let access_plan = create_initial_plan()?;
let mut row_groups = RowGroupAccessPlanFilter::new(access_plan);
row_groups.prune_by_range(rg_metadata, range);

// 3. Prune row groups by testing the predicate against row-group stats
row_groups.prune_by_statistics();

// 4. Read page metadata only for the surviving row groups and columns
let rg_accessed = row_groups.rg_needed();
let cols_accessed = predicate.columns_needed();
metadata.load_async_reduced_page_metadata(&mut reader, rg_accessed, cols_accessed)?;

// 5. Prune the access plan using the minimally loaded page metadata
access_plan = p.prune_plan_with_page_index();
```
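To make the pruning steps concrete, here is a minimal, self-contained sketch of steps 2–3 using toy types. `RowGroupStats`, `prune_row_groups`, and the single-column min/max predicate are illustrative stand-ins, not DataFusion or parquet-rs APIs.

```rust
// Hypothetical, simplified model of row-group pruning.
#[derive(Debug)]
struct RowGroupStats {
    byte_offset: u64, // start of the row group in the file
    min: i64,         // min value of the predicate column in this row group
    max: i64,         // max value of the predicate column in this row group
}

/// Return indexes of row groups that (a) start inside the byte range this
/// task is targeting and (b) could contain rows matching
/// `pred_min <= v <= pred_max`. Only these need page metadata loaded.
fn prune_row_groups(
    stats: &[RowGroupStats],
    task_range: std::ops::Range<u64>,
    pred_min: i64,
    pred_max: i64,
) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, rg)| task_range.contains(&rg.byte_offset))        // step 2: range prune
        .filter(|(_, rg)| rg.max >= pred_min && rg.min <= pred_max)    // step 3: stats prune
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stats = vec![
        RowGroupStats { byte_offset: 0,    min: 0,   max: 99  },
        RowGroupStats { byte_offset: 1000, min: 100, max: 199 },
        RowGroupStats { byte_offset: 2000, min: 200, max: 299 },
    ];
    // This task owns bytes 0..1500; the predicate is `v BETWEEN 150 AND 250`.
    let needed = prune_row_groups(&stats, 0..1500, 150, 250);
    println!("{:?}", needed); // [1]: row group 0 fails the stats prune,
                              // row group 2 fails the range prune
}
```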

In our prototype we created a sparse page-metadata array. Row-group/column indexes that we don't need were left as Index::NONE. Pseudo-code:

```rust
let index = metadata.row_groups().iter()
    .map(|x| {
        if self.rg_accessed.as_ref().unwrap()[x.ordinal().unwrap() as usize] {
            // Row group is needed: decode only the columns the query touches
            x.columns().iter().enumerate()
                .map(|(index, c)| {
                    if self.col_accessed.as_ref().unwrap()[index] {
                        match c.column_index_range() {
                            Some(r) => decode_column_index(),
                            None => Ok(Index::NONE),
                        }
                    } else {
                        Ok(Index::NONE)
                    }
                })
                .collect::<Result<Vec<_>>>()
        } else {
            // Row group was pruned entirely: leave every column as Index::NONE
            x.columns().iter()
                .map(|_| Ok(Index::NONE))
                .collect::<Result<Vec<_>>>()
        }
    })
    .collect::<Result<Vec<_>>>()?;
```
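The sparse-array shape can be sketched as a small runnable example. `PageIndex`, `build_sparse_index`, and the `decode` closure below are illustrative stand-ins for the parquet crate's `Index` type and column-index decoding; only the structure (row-group-major, `None` for unneeded cells) mirrors the prototype.

```rust
// Toy model of the sparse page-metadata array described above.
#[derive(Debug, Clone, PartialEq)]
enum PageIndex {
    None,             // metadata was never loaded for this row-group/column cell
    Loaded(Vec<i64>), // decoded page-level stats (simplified to min values)
}

/// Build a row-group-major array of page indexes, decoding only the
/// (row group, column) cells the query actually needs.
fn build_sparse_index(
    rg_accessed: &[bool],
    col_accessed: &[bool],
    decode: impl Fn(usize, usize) -> Vec<i64>,
) -> Vec<Vec<PageIndex>> {
    rg_accessed
        .iter()
        .enumerate()
        .map(|(rg, &rg_needed)| {
            col_accessed
                .iter()
                .enumerate()
                .map(|(col, &col_needed)| {
                    if rg_needed && col_needed {
                        PageIndex::Loaded(decode(rg, col)) // decode only what is needed
                    } else {
                        PageIndex::None // sparse: pruned row group or unused column
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    // 2 row groups x 3 columns; only row group 1 and columns 0 and 2 are needed.
    let index = build_sparse_index(
        &[false, true],
        &[true, false, true],
        |rg, col| vec![(rg * 10 + col) as i64], // fake "decoded" page stats
    );
    println!("{:?}", index);
}
```

The array keeps the same dimensions as a fully loaded page index, so downstream pruning code can stay index-compatible while skipping the decode cost for unneeded cells.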

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement (New feature or request)