Skip to content

[multistage] Return type for AVG() aggregation function #10318

@vvivekiyer

Description

@vvivekiyer

The AVG() aggregation function in Multistage engine relied to Calcite's default logic of setting the return type to be same as the input column. For example, if the input column is INT, the return type for AVG() is INT

In our v1 engine, the return type for AVG aggregation function is always set to double. To match behavior, we created PR #10314 to override calcite's deriveAvgAggType to always cast the return type to DOUBLE.

However, instead of casting to DOUBLE always, we'd like to do something smarter where we cast to the correct floating point type depending on input.

Here's what some other systems do:

Postgres

numeric for any integer type argument, double precision for a floating-point argument, otherwise the same as the argument data type

MySQL

The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT or DOUBLE).

Presto

avg(x) → double. (https://prestodb.io/docs/current/functions/aggregate.html)

Dremio

AVG(numeric_expression NUMERIC) → FLOAT. (https://docs.dremio.com/software/sql-reference/sql-functions/functions/AVG/)

Creating this OSS issue to get consensus on the right way to do it in Pinot.

cc: @walterddr @siddharthteotia @somandal

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions