-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Support GroupsAccumulator
for Avg duration
#15748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d833b56
to
2506180
Compare
293be12
to
2f07061
Compare
Ready for review @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shruti2522. I found that the tests will be passed in the current main branch. Could you double-check if we still need this change?
# avg_duration (GroupsAccumulator) | ||
|
||
statement ok | ||
create table duration as values | ||
(arrow_cast(10, 'Duration(Second)'), arrow_cast(100, 'Duration(Millisecond)'), 'a', 1), | ||
(arrow_cast(20, 'Duration(Second)'), arrow_cast(200, 'Duration(Millisecond)'), 'a', 2), | ||
(arrow_cast(30, 'Duration(Second)'), arrow_cast(300, 'Duration(Millisecond)'), 'b', 1), | ||
(arrow_cast(40, 'Duration(Second)'), arrow_cast(400, 'Duration(Millisecond)'), 'b', 2), | ||
(arrow_cast(50, 'Duration(Second)'), arrow_cast(500, 'Duration(Millisecond)'), 'c', 1), | ||
(arrow_cast(60, 'Duration(Second)'), arrow_cast(600, 'Duration(Millisecond)'), 'c', 2); | ||
|
||
query T??I | ||
SELECT column3, avg(column1), avg(column2), column4 FROM duration GROUP BY column3, column4 ORDER BY column3, column4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the following tests will be passed after #15468 in the current main branch.
I think the groups accumulator will result in faster performance, not new functionality. @shruti2522 maybe you can verify this with |
Thanks for the explanation. If it's a performance improvement, it's better to run some benchmark or add the benchmark case in |
Got it @alamb @goldmedal, will test and share results soon. |
I wrote up a ticket about how to make a benchmark: |
Thanks, @alamb. I've been a bit caught up with semester exams lately, but I’ll get it done by tomorrow. |
Thank you so much! I totally re: school (which should take top priority for sure) |
@logan-keede just finished up a benchmark for this PR here; I will merge this PR up from main and run the benchmarks on it to see if we can see the expected improvement |
🤖 |
🤖: Benchmark completed Details
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is now looking good thanks to @shruti2522 @goldmedal and @logan-keede
I merged up from main to resolve a conflict
I also added support for Duration
to the fuzz tester in a separate PR to increase coverage:
is this a cause of concern? |
Given Q4 seems to be unrelated to avg: SELECT COUNT(DISTINCT "UserID") FROM hits; I am assuming this was noise in the measurements |
🚀 |
Thanks again @logan-keede and @shruti2522 -- this is a very nice improvement I think |
Which issue does this PR close?
AVG
for durations #15458 .Rationale for this change
What changes are included in this PR?
Implement
GroupsAccumulator
for avg duration calculationAre these changes tested?
Yes
Are there any user-facing changes?
Yes