Skip to content

Sort pushdown: reorder row groups by statistics within each file #21317

@zhuqi-lucas

Description

@zhuqi-lucas

Is your feature request related to a problem or challenge?

Currently sort pushdown reorders files by min/max statistics to achieve sort elimination. But within each file, row groups are read in their original order (or reversed via reverse_row_groups). If row groups within a file have non-overlapping ranges, reordering them by statistics could further optimize scan order.

Describe the solution you'd like

Pass the desired sort order into the FileOpener and have it re-sort row groups based on their min/max statistics to match the scan's desired ordering.

For example, a file with 4 row groups:

RG1: min=100, max=200
RG2: min=1,   max=50
RG3: min=300, max=400
RG4: min=51,  max=99

For ORDER BY col ASC LIMIT 10, reordering to [RG2, RG4, RG1, RG3] lets TopK find the smallest values first, which provides two levels of optimization:

  1. Row-level filtering (immediate benefit): TopK sets a tight dynamic filter threshold after reading RG2. For subsequent row groups, the dynamic filter acts as a row-level filter — the parquet reader uses page index to skip non-matching pages and avoids decoding non-sort columns for filtered rows. This significantly reduces I/O for wide tables.

  2. Row-group-level skipping (follow-up): Currently row group selection is done upfront before any data is read, so the dynamic filter from TopK cannot prune already-selected row groups. A follow-up optimization could re-evaluate the dynamic filter between row groups, skipping entire row groups when their min statistics exceed the TopK threshold. This would be especially effective with morselized scans where TopK could terminate after a single row group.

Additional context

This was suggested by @adriangb in #21182 (comment):

I do think there's one more trick we could have up our sleeves: instead of only reversing row group orders we could pass the desired sort order into the opener and have it re-sort the row groups based on stats to try to match the scan's desired ordering. This might be especially effective once we have morselized scans since we could terminate after a single row group for TopK queries.

Related code: ParquetSource::try_pushdown_sort — currently only supports reverse scan (reverse_row_groups=true), could be extended to do statistics-based row group reordering.

Parent issue: #17348

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions