Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion does not avg(distinct ..)
:
> select avg(distinct a), avg(distinct b) from values(1, 2) t(a, b);
Execution error: avg(DISTINCT) aggregations are not available
Somewhat confusingly DataFusion does appear to support a single distinct avg
> select avg(distinct a) from values(1, 2) t(a, b);
+-------------------+
| avg(DISTINCT t.a) |
+-------------------+
| 1.0 |
+-------------------+
1 row(s) fetched.
Elapsed 0.015 seconds.
But it does this by rewriting effectively rewriting the query select avg(distinct a)
to select avg(a) from (select distinct a ...)
in the SingleDistinctToGroupBy rule
Describe the solution you'd like
Support multile distinct avg aggregates:
datafusion/datafusion/functions-aggregate/src/average.rs
Lines 116 to 118 in 2d30334
Describe alternatives you've considered
Basically we could follow the model of COUNT(distinct ...)
aggregates:
In this case, there is a separate accumulator type that is implemented for count distinct
Additional context
Add any other context or screenshots about the feature request here.