Open
Description
Is your feature request related to a problem or challenge?
- @goldmedal points out on Support Custom Function Registration with Catalog and Schema #15363 that AVG for duration isn't supported
Specifically given this table
> create or replace table t(user_id int, ts timestamp) as values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04');
0 row(s) fetched.
Elapsed 0.003 seconds.
> select * from t;
+---------+---------------------+
| user_id | ts |
+---------+---------------------+
| 1 | 2021-01-01T00:00:00 |
| 2 | 2021-01-02T00:00:00 |
| 1 | 2021-01-03T00:00:00 |
| 2 | 2021-01-04T00:00:00 |
+---------+---------------------+
4 row(s) fetched.
Elapsed 0.006 seconds.
I would like to be able to compute the average distance for the timestamps. However it doesn't work today:
> select avg('2021-01-01'::timestamp - ts) from t;
Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts.
Candidate functions:
avg(UserDefined)
> select user_id, avg('2021-01-01'::timestamp - ts) from t GROUP BY user_id;
Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts.
Candidate functions:
avg(UserDefined)
Describe the solution you'd like
I would like the two queries above to work
Here is how they work in postgres
postgres=# create table t(user_id int, ts timestamp);
CREATE TABLE
postgres=# insert into t values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04');
INSERT 0 4
postgres=# select avg('2021-01-01'::timestamp - ts) from t;
avg
-------------------
-1 days -12:00:00
(1 row)
postgres=# select user_id, avg('2021-01-01'::timestamp - ts) from t GROUP BY user_id;
user_id | avg
---------+---------
2 | -2 days
1 | -1 days
(2 rows)
Describe alternatives you've considered
I think we can follow the model we did for Min
and Max
- Add
Accumulator
like @svranesevic did in Support Duration in min/max agg functions #15310 - Add
GroupsAccumulator
like @shruti2522 did in Implement GroupsAccumulator for min/max Duration #15322
Additional context
I think this is a good first issue as it should involve instantiating some new types and writing tests. I don't think we'll need to implement any new accumulators