-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max #15857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
min
and max
min
and max
} else if let DataType::List(field) = col_type { | ||
extract_window_frame_target_type(field.data_type()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure what am I doing here. This seems necessary for window functions over min/max aggregations on lists to work, and does not seem to be breaking other things, but maybe more experienced DF folks can throw some light here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be related to converting window frames when they have RANGE
rather than ROWS
-- so maybe we need some more tests. I'll comment below
Is this related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gabotechs -- I think just a few more tests are needed and this one will eb good to go.
FYI @rluvaton
} else if let DataType::List(field) = col_type { | ||
extract_window_frame_target_type(field.data_type()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be related to converting window frames when they have RANGE
rather than ROWS
-- so maybe we need some more tests. I'll comment below
/// Accumulator for min/max of lists | ||
/// this accumulator is using arrow-row as the internal representation and a way for comparing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Accumulator for min/max of lists | |
/// this accumulator is using arrow-row as the internal representation and a way for comparing | |
/// Accumulator for min/max of lists | |
/// | |
/// This accumulator uses arrow-row as the internal representation and a way | |
/// for comparing rows. |
fn value(&self) -> Option<&OwnedRow>; | ||
} | ||
|
||
#[derive(Debug, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Debug, Default)] | |
/// Similarly to [`GenericMinMaxAccumulator`], implements sliding min/max for Lists | |
/// using the Row converter | |
#[derive(Debug, Default)] |
I think the overlap is that the approach in this PR (to use Row format) could have also been used in #15667. As a follow on / if someone cares about performance, we could probably implement a much more efficient (though complex0 list comparison routine that compares element by element rather than copying the entire list element into a Row format |
Here's the new PR: |
Which issue does this PR close?
Rationale for this change
Resume the work started by @rluvaton in #13991, adding some tests and adding a small fix in the middle.
What changes are included in this PR?
Add support for performing
MIN
andMAX
aggregations over lists, building on top of #13991 maintaining the original commit history, and only adding commits on top of it.This PR also includes a commit for fixing ScalarValue::List comparisons that could be shipped independently:
Are these changes tested?
Yes, by new sqllogictests
Are there any user-facing changes?
Yes, users can now perform
MIN
andMAX
operations over lists