Skip to content

Conversation

@destrex271
Copy link

@destrex271 destrex271 commented Sep 5, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Updated the type of output_ordering in FileScanConfig to Vec<Option<LexOrdering>>
  • Updated the get_projected_output_ordering function in file_scan_config.rs to return Vec<Option>
  • Handle output generated by get_projected_output_ordering in various places by flattening the result and reconstructing Vec<LexOrdering> from Vec<Option<LexOrdering>> (This might need some discussion/additional reviews)

Are these changes tested?

Changes were mostly minor refactors.

@github-actions github-actions bot added proto Related to proto crate datasource Changes to the datasource crate labels Sep 5, 2025
@destrex271 destrex271 marked this pull request as ready for review September 5, 2025 10:02
@alamb alamb requested a review from crepererum September 5, 2025 18:21
@alamb
Copy link
Contributor

alamb commented Sep 5, 2025

@crepererum can you please review this PR (as you filed the request for #17354)

@alamb alamb added the api change Changes the API exposed to users of the crate label Sep 5, 2025
@destrex271
Copy link
Author

@alamb , can you please trigger the test workflows again? Just updated the branch

Copy link
Contributor

@crepererum crepererum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a start, but the actual bug of "we loose partitions during projections" was now moved out of get_projected_output_ordering into all these flatten calls. To actually fix this, I think we need to fix the return type of project as well.

}

/// Set the output ordering of the files
pub fn with_output_ordering(mut self, output_ordering: Vec<LexOrdering>) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we eventually fix this API as well so that the API setter is the same as the internal state? If you wanna make the transition for existing users smoother, deprecate this method here and add a new one a la:

pub fn with_output_ordering_opt(mut self, output_ordering: Vec<LexOrdering>) -> Self { ... }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense, I'll refactor this function and related references.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also modify the type of projected_output_orderings here?

projected_output_ordering: Vec<LexOrdering>,

This is a result of modifying the return type of the following functions to Result<Vec<Option<LexOrdering>>

  1. fn try_create_output_ordering(&self) -> Result<Vec<LexOrdering>> {

This is required since in

let output_ordering = self.try_create_output_ordering()?;
we are creating the output orderings using the try_create_output_ordering function.

There are two solutions:

  1. Either modify the return types of create_output_ordering and other functions that are affected by it
  2. Preserve the current return type as Vec<LexOrdering> and just wrap each item in the vec into Some() before sending as args to with_output_ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the projected version needs to be adjusted too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood! I'll make the necessary changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crepererum

I guess we should skip modifying the type of expr to Option since at this stage having None seems to be a little useless.

#[derive(Debug, Clone)]
pub struct SortExec {
    /// Input schema
    pub(crate) input: Arc<dyn ExecutionPlan>,
    /// Sort expressions
    expr: LexOrdering,
    /// Containing all metrics set created during sort
    metrics_set: ExecutionPlanMetricsSet,
    /// Preserve partitions of input plan. If false, the input partitions
    /// will be sorted and merged into a single output partition.
    preserve_partitioning: bool,
    /// Fetch highest/lowest n results
    fetch: Option<usize>,
    /// Normalized common sort prefix between the input and the sort expressions (only used with fetch)
    common_sort_prefix: Vec<PhysicalSortExpr>,
    /// Cache holding plan properties like equivalences, output partitioning etc.
    cache: PlanProperties,
    /// Filter matching the state of the sort for dynamic filter pushdown.
    /// If `fetch` is `Some`, this will also be set and a TopK operator may be used.
    /// If `fetch` is `None`, this will be `None`.
    filter: Option<Arc<RwLock<TopKDynamicFilters>>>,
}

DisplayFormatType::Default | DisplayFormatType::Verbose => {
let schema = self.projected_schema();
let orderings = get_projected_output_ordering(self, &schema);
// Remove None from projected output ordering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should display the None cases too, otherwise this is going to be pretty confusing. Imagine a projection that leads to Some, None, Some and you now gonna display 2 orderings (for 3 partitions). From that you cannot tell if it was None, Some, Some, Some, None, Some, or Some, Some, None.

/// Project the schema, constraints, and the statistics on the given column indices
pub fn project(&self) -> (SchemaRef, Constraints, Statistics, Vec<LexOrdering>) {
if self.projection.is_none() && self.table_partition_cols.is_empty() {
let output_ordering: Vec<LexOrdering> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that this is correct, with the same argument as for the display code. I think the return type here also needs to be Vec<Option<...>>.

}

display_orderings(f, &orderings)?;
let flattened_orderings: Vec<LexOrdering> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as for the other display code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I'll refactor the display function and its relevant calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate datasource Changes to the datasource crate proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FileScanConfig::output_ordering must be vector of optionals

3 participants