-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+1,498
−102
Merged
Changes from all commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
1d3b076
add window expr
a0b7526
fix unused imports
5c4d92d
fix clippy
3ee87aa
fix unit test
f70c739
Update datafusion/src/logical_plan/builder.rs
jimexist 831c069
Update datafusion/src/logical_plan/builder.rs
jimexist 0cbca53
Update datafusion/src/physical_plan/window_functions.rs
jimexist abf08cd
Update datafusion/src/physical_plan/window_functions.rs
jimexist 8b486d5
adding more built-in functions
0d2a214
adding filter by todo
a1eae86
enrich unit test
f5e64de
update
c36c04a
add more tests
4e792e1
fix test
880b94f
fix unit test
bc2271d
fix error
1ecae8f
fix unit test
5d96e52
fix unit test
bb57c76
use upper case
2af2a27
fix unit test
6941151
comment out test
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked whether this makes sense to reuse aggregate functions for window expressions - I think it does! E.g. PostgreSQL also says:
https://www.postgresql.org/docs/9.3/functions-window.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes in general three types of things can be used:
for both 1. and 2. they are not order sensitive, but for 3 we'll have to take sort into account
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO only the second time we'll really need to sort by
c3There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also fun thing to notice:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is very useful for analytics, e.g. if you want to know (in an employee table with name, department, and salary) the list of employees in each department with salary above average.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe
count(..) over order by ..also needs to be sorted, it will do a count over the window, which means a running count (over sorted rows) by default.But yeah very useful for analytics indeed 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment above was more about re-using the same functions over here - as I thought we might not want to support every aggregation function here too. But for me it sounds like a good idea to reuse them. Maybe @alamb has some ideas about it as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SQL is confusing in this area -- as @jimexist says, all "normal" aggregate functions (e.g. sum, count, etc) are also valid window functions, but the reverse is not true. You can't use window functions (e.g. LAG, LEAD, etc) outside of a window clause.
Thus I think representing window functions as a new type of function, as this PR does, makes the most sense. They are different enough (e.g. require information on the incoming windows) that trying to wrangle them into the same structures as normal aggregates seems like it will get messy. Long term I would expect we have a UDWF (user defined window function) api as well.
Ideally the physical implementation for
sum/count/ etc can be mostly reused but in the plans I think they are different enough to warrant different plan structures.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Indeed.