Refactor: extract sort pushdown logic from FileScanConfig into separate module #21433

@zhuqi-lucas

Is your feature request related to a problem or challenge?

FileScanConfig in datafusion/datasource/src/file_scan_config.rs has grown large after the sort pushdown optimization (#21182) added statistics-based file sorting, non-overlapping validation, and NULL handling logic.

As noted by @alamb in #21182 (comment):

As a follow on PR it might be nice to figure out how to move some of this code out of FileScanConfig and into some other smaller module

Describe the solution you'd like

Extract the sort-pushdown code from FileScanConfig into a dedicated module, e.g. datafusion/datasource/src/sort_pushdown.rs:

  • try_pushdown_sort()
  • rebuild_with_source()
  • try_sort_file_groups_by_statistics()
  • sort_files_within_groups_by_statistics()
  • any_file_has_nulls_in_sort_columns()
  • Related helper functions and types (SortedFileGroups, etc.)

This is a pure refactor — no behavior changes.
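
To make the intended shape concrete, here is a rough sketch of what the split could look like. Everything in it is a simplified, hypothetical stand-in: `FileStats`, its single `i64` sort column, `try_sort_files`, and the bail-out-on-NULLs rule are assumptions for illustration, not the real DataFusion types, signatures, or semantics. The point is only that the logic lives in free functions inside a `sort_pushdown` module, and `FileScanConfig` keeps a thin method that delegates to it, so callers see no behavior change.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion types or signatures.
mod sort_pushdown {
    /// Per-file statistics for a single sort column (assumed to be i64 here).
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileStats {
        pub path: String,
        pub min: i64,
        pub max: i64,
        pub null_count: usize,
    }

    /// Order files by the minimum value of the sort column.
    pub fn sort_files_by_statistics(files: &mut [FileStats]) {
        files.sort_by_key(|f| f.min);
    }

    /// True when every file's range ends before the next file's range begins.
    /// Expects `files` to be sorted by `min` already.
    pub fn is_non_overlapping(files: &[FileStats]) -> bool {
        files.windows(2).all(|w| w[0].max < w[1].min)
    }

    /// True when any file reports NULLs in the sort column.
    pub fn any_file_has_nulls(files: &[FileStats]) -> bool {
        files.iter().any(|f| f.null_count > 0)
    }

    /// Returns the reordered files only when the ordering is provably safe.
    /// Assumption for this sketch: any NULLs or overlapping ranges mean "give up".
    pub fn try_sort_files(files: &[FileStats]) -> Option<Vec<FileStats>> {
        if any_file_has_nulls(files) {
            return None;
        }
        let mut sorted = files.to_vec();
        sort_files_by_statistics(&mut sorted);
        is_non_overlapping(&sorted).then_some(sorted)
    }
}

/// Stand-in for the config type; after the refactor it only delegates.
pub struct FileScanConfig {
    pub files: Vec<sort_pushdown::FileStats>,
}

impl FileScanConfig {
    /// Public entry point stays on the config, so existing callers are unaffected.
    pub fn try_pushdown_sort(&self) -> Option<FileScanConfig> {
        sort_pushdown::try_sort_files(&self.files).map(|files| FileScanConfig { files })
    }
}
```

Keeping the helpers as free functions that take slices, rather than methods on `FileScanConfig`, would also make them testable without constructing a full config.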

Related issues: