-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: output_ordering converted to Vec<Option<LexOrdering>> #17439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@crepererum can you please review this PR (as you filed the request for #17354) |
|
@alamb , can you please trigger the test workflows again? Just updated the branch |
crepererum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a start, but the actual bug of "we loose partitions during projections" was now moved out of get_projected_output_ordering into all these flatten calls. To actually fix this, I think we need to fix the return type of project as well.
| } | ||
|
|
||
| /// Set the output ordering of the files | ||
| pub fn with_output_ordering(mut self, output_ordering: Vec<LexOrdering>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we eventually fix this API as well so that the API setter is the same as the internal state? If you wanna make the transition for existing users smoother, deprecate this method here and add a new one a la:
pub fn with_output_ordering_opt(mut self, output_ordering: Vec<LexOrdering>) -> Self { ... }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense, I'll refactor this function and related references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also modify the type of projected_output_orderings here?
| projected_output_ordering: Vec<LexOrdering>, |
This is a result of modifying the return type of the following functions to Result<Vec<Option<LexOrdering>>
pub fn create_ordering( fn try_create_output_ordering(&self) -> Result<Vec<LexOrdering>> {
This is required since in
| let output_ordering = self.try_create_output_ordering()?; |
try_create_output_ordering function.
There are two solutions:
- Either modify the return types of
create_output_orderingand other functions that are affected by it - Preserve the current return type as
Vec<LexOrdering>and just wrap each item in the vec into Some() before sending as args to with_output_ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the projected version needs to be adjusted too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood! I'll make the necessary changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should skip modifying the type of expr to Option since at this stage having None seems to be a little useless.
#[derive(Debug, Clone)]
pub struct SortExec {
/// Input schema
pub(crate) input: Arc<dyn ExecutionPlan>,
/// Sort expressions
expr: LexOrdering,
/// Containing all metrics set created during sort
metrics_set: ExecutionPlanMetricsSet,
/// Preserve partitions of input plan. If false, the input partitions
/// will be sorted and merged into a single output partition.
preserve_partitioning: bool,
/// Fetch highest/lowest n results
fetch: Option<usize>,
/// Normalized common sort prefix between the input and the sort expressions (only used with fetch)
common_sort_prefix: Vec<PhysicalSortExpr>,
/// Cache holding plan properties like equivalences, output partitioning etc.
cache: PlanProperties,
/// Filter matching the state of the sort for dynamic filter pushdown.
/// If `fetch` is `Some`, this will also be set and a TopK operator may be used.
/// If `fetch` is `None`, this will be `None`.
filter: Option<Arc<RwLock<TopKDynamicFilters>>>,
}| DisplayFormatType::Default | DisplayFormatType::Verbose => { | ||
| let schema = self.projected_schema(); | ||
| let orderings = get_projected_output_ordering(self, &schema); | ||
| // Remove None from projected output ordering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should display the None cases too, otherwise this is going to be pretty confusing. Imagine a projection that leads to Some, None, Some and you now gonna display 2 orderings (for 3 partitions). From that you cannot tell if it was None, Some, Some, Some, None, Some, or Some, Some, None.
| /// Project the schema, constraints, and the statistics on the given column indices | ||
| pub fn project(&self) -> (SchemaRef, Constraints, Statistics, Vec<LexOrdering>) { | ||
| if self.projection.is_none() && self.table_partition_cols.is_empty() { | ||
| let output_ordering: Vec<LexOrdering> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this is correct, with the same argument as for the display code. I think the return type here also needs to be Vec<Option<...>>.
| } | ||
|
|
||
| display_orderings(f, &orderings)?; | ||
| let flattened_orderings: Vec<LexOrdering> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as for the other display code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I'll refactor the display function and its relevant calls.
Which issue does this PR close?
FileScanConfig::output_orderingmust be vector of optionals #17354What changes are included in this PR?
FileScanConfigtoVec<Option<LexOrdering>>Vec<Option>Vec<LexOrdering>fromVec<Option<LexOrdering>>(This might need some discussion/additional reviews)Are these changes tested?
Changes were mostly minor refactors.