Skip to content

Support GroupsAccumulator for Avg duration #15748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

shruti2522
Copy link
Contributor

@shruti2522 shruti2522 commented Apr 17, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement GroupsAccumulator for avg duration calculation

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 17, 2025
@shruti2522 shruti2522 marked this pull request as ready for review April 20, 2025 22:52
@shruti2522
Copy link
Contributor Author

Ready for review @alamb

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shruti2522. I found that the tests will be passed in the current main branch. Could you double-check if we still need this change?

Comment on lines 5039 to 5051
# avg_duration (GroupsAccumulator)

statement ok
create table duration as values
(arrow_cast(10, 'Duration(Second)'), arrow_cast(100, 'Duration(Millisecond)'), 'a', 1),
(arrow_cast(20, 'Duration(Second)'), arrow_cast(200, 'Duration(Millisecond)'), 'a', 2),
(arrow_cast(30, 'Duration(Second)'), arrow_cast(300, 'Duration(Millisecond)'), 'b', 1),
(arrow_cast(40, 'Duration(Second)'), arrow_cast(400, 'Duration(Millisecond)'), 'b', 2),
(arrow_cast(50, 'Duration(Second)'), arrow_cast(500, 'Duration(Millisecond)'), 'c', 1),
(arrow_cast(60, 'Duration(Second)'), arrow_cast(600, 'Duration(Millisecond)'), 'c', 2);

query T??I
SELECT column3, avg(column1), avg(column2), column4 FROM duration GROUP BY column3, column4 ORDER BY column3, column4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the following tests will be passed after #15468 in the current main branch.

@alamb
Copy link
Contributor

alamb commented Apr 28, 2025

I think the groups accumulator will result in faster performance, not new functionality.

@shruti2522 maybe you can verify this with datafusion-cli and using generate_series using a duration column

@goldmedal
Copy link
Contributor

I think the groups accumulator will result in faster performance, not new functionality.

Thanks for the explanation. If it's a performance improvement, it's better to run some benchmark or add the benchmark case in datafusion/functions-aggregate/benches to prove the result.

@shruti2522
Copy link
Contributor Author

Got it @alamb @goldmedal, will test and share results soon.

@alamb
Copy link
Contributor

alamb commented May 5, 2025

I wrote up a ticket about how to make a benchmark:

@shruti2522
Copy link
Contributor Author

I wrote up a ticket about how to make a benchmark:

Thanks, @alamb. I've been a bit caught up with semester exams lately, but I’ll get it done by tomorrow.

@alamb
Copy link
Contributor

alamb commented May 5, 2025

I wrote up a ticket about how to make a benchmark:

Thanks, @alamb. I've been a bit caught up with semester exams lately, but I’ll get it done by tomorrow.

Thank you so much! I totally re: school (which should take top priority for sure)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support AVG for durations
3 participants