-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
The AVG() aggregation function in Multistage engine relied to Calcite's default logic of setting the return type to be same as the input column. For example, if the input column is INT, the return type for AVG() is INT
In our v1 engine, the return type for AVG aggregation function is always set to double. To match behavior, we created PR #10314 to override calcite's deriveAvgAggType to always cast the return type to DOUBLE.
However, instead of casting to DOUBLE always, we'd like to do something smarter where we cast to the correct floating point type depending on input.
Here's what some other systems do:
Postgres
numeric for any integer type argument, double precision for a floating-point argument, otherwise the same as the argument data type
MySQL
The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT or DOUBLE).
Presto
avg(x) → double. (https://prestodb.io/docs/current/functions/aggregate.html)
Dremio
AVG(numeric_expression NUMERIC) → FLOAT. (https://docs.dremio.com/software/sql-reference/sql-functions/functions/AVG/)
Creating this OSS issue to get consensus on the right way to do it in Pinot.