Skip to content

Parallel merge in SortPreservingMergeExec after sort elimination #21381

@zhuqi-lucas

Description

@zhuqi-lucas

Is your feature request related to a problem or challenge?

When SortExec is eliminated via sort pushdown (#21182, statistics-based file reordering), SortPreservingMergeExec becomes the only merge operator, reading directly from I/O-bound DataSourceExec partitions instead of from SortExec's in-memory buffer.

Currently we insert a BufferExec to compensate, but with many partitions the single-threaded K-way merge in SPM can still become a bottleneck — it alternates between partitions, and each partition switch may stall on I/O even with buffering.

Describe the solution you'd like

When SPM reads from I/O-bound sources (no buffering SortExec), split the N input streams into groups and merge each group in parallel, then merge the intermediate results. This creates a tree of merges:

Level 1 (parallel):  merge(s1,s2), merge(s3,s4), merge(s5,s6), merge(s7,s8)
Level 2 (parallel):  merge(m1,m2), merge(m3,m4)
Level 3:             merge(m5,m6) → final output

This would be especially beneficial when:

  • Sort elimination removes the buffering SortExec, making SPM I/O-bound
  • Many partitions with I/O-bound sources
  • Large datasets where single-threaded merge becomes the bottleneck

Additional context

Suggested by @Dandandan in #21182 (comment)

Related: DuckDB's parallel k-way merge

Parent issue: #17348

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions