Skip to content

Support AVG for durations #15458

Open
@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

Specifically given this table

> create or replace table t(user_id int, ts timestamp) as values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04');
0 row(s) fetched.
Elapsed 0.003 seconds.

> select * from t;
+---------+---------------------+
| user_id | ts                  |
+---------+---------------------+
| 1       | 2021-01-01T00:00:00 |
| 2       | 2021-01-02T00:00:00 |
| 1       | 2021-01-03T00:00:00 |
| 2       | 2021-01-04T00:00:00 |
+---------+---------------------+
4 row(s) fetched.
Elapsed 0.006 seconds.

I would like to be able to compute the average distance for the timestamps. However it doesn't work today:

> select avg('2021-01-01'::timestamp -  ts) from t;
Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts.
	Candidate functions:
	avg(UserDefined)
> select user_id, avg('2021-01-01'::timestamp -  ts) from t GROUP BY user_id;
Error during planning: Execution error: Function 'avg' user-defined coercion failed with "Error during planning: The function \"avg\" does not support inputs of type Duration(Nanosecond)." No function matches the given name and argument types 'avg(Duration(Nanosecond))'. You might need to add explicit type casts.
	Candidate functions:
	avg(UserDefined)

Describe the solution you'd like

I would like the two queries above to work

Here is how they work in postgres

postgres=# create table t(user_id int, ts timestamp);
CREATE TABLE
postgres=# insert into t  values (1, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (2, '2021-01-04');
INSERT 0 4
postgres=# select avg('2021-01-01'::timestamp -  ts) from t;
        avg
-------------------
 -1 days -12:00:00
(1 row)

postgres=# select user_id, avg('2021-01-01'::timestamp -  ts) from t GROUP BY user_id;
 user_id |   avg
---------+---------
       2 | -2 days
       1 | -1 days
(2 rows)

Describe alternatives you've considered

I think we can follow the model we did for Min and Max

Additional context

I think this is a good first issue as it should involve instantiating some new types and writing tests. I don't think we'll need to implement any new accumulators

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions