Skip to content

[multistage] Fix return type for AVG aggregate function#10314

Merged
siddharthteotia merged 2 commits intoapache:masterfrom
vvivekiyer:multi-stage-avg
Feb 22, 2023
Merged

[multistage] Fix return type for AVG aggregate function#10314
siddharthteotia merged 2 commits intoapache:masterfrom
vvivekiyer:multi-stage-avg

Conversation

@vvivekiyer
Copy link
Contributor

@vvivekiyer vvivekiyer commented Feb 21, 2023

Label = bugfix

Currently, for AVG aggregation function, Calcite determines the return type based on the input argument. This means that if the column if of INT type, the AVG return type would also be an integer.

The fix added in this PR returns a DOUBLE datatype for AVG.

Without Multistage
With Multistage

@codecov-commenter
Copy link

codecov-commenter commented Feb 21, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@c0f6135). Click here to learn what that means.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master   #10314   +/-   ##
=========================================
  Coverage          ?   70.27%           
  Complexity        ?     5867           
=========================================
  Files             ?     2027           
  Lines             ?   110066           
  Branches          ?    16716           
=========================================
  Hits              ?    77349           
  Misses            ?    27283           
  Partials          ?     5434           
Flag Coverage Δ
integration1 24.46% <0.00%> (?)
integration2 24.47% <0.00%> (?)
unittests1 67.68% <100.00%> (?)
unittests2 13.73% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...n/java/org/apache/pinot/query/type/TypeSystem.java 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@vvivekiyer vvivekiyer marked this pull request as ready for review February 21, 2023 16:04
}

@Override
public RelDataType deriveAvgAggType(RelDataTypeFactory typeFactory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! You will need to make some changes to the WindowFunctionsPlans.json to pick up the casting in the plans (I've used AVG quite extensively in that file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved all test issues.

}
},
"queries": [
{
Copy link
Contributor Author

@vvivekiyer vvivekiyer Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query on H2 database is returning an INT. Couldn't find a better workaround other than removing this test for H2. Would appreciate inputs here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we comment out this test for now instead of removing ?

Will SELECT AVG(CAST int_col AS DOUBLE) FROM FOO work ?

@vvivekiyer vvivekiyer requested a review from somandal February 22, 2023 06:19
@siddharthteotia siddharthteotia added the multi-stage Related to the multi-stage query engine label Feb 22, 2023
@siddharthteotia
Copy link
Contributor

Thanks for the fix

@siddharthteotia siddharthteotia merged commit 6aea80c into apache:master Feb 22, 2023
Comment on lines +60 to +61
return typeFactory.createTypeWithNullability(
typeFactory.createSqlType(SqlTypeName.DOUBLE), false);
Copy link
Contributor

@walterddr walterddr Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right way to fix the issue. we should rely on Calcite's derived type system to figure out what the return type should be. this way all AVG results are returning as double which is going to cause more issue than it resolves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on a coarse read, we should check the input decimal precision and try to do upcast to the appropriate floating point type. (e.g. the scale and precision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I resorted to using double datatype for AVG because that's what we do in V1 engine.

We could maybe do what postgres does. Thoughts?

numeric for any integer type argument, double precision for a floating-point argument, otherwise the same as the argument data type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on offline discussion, created oss issue #10318 to determine the logic for AVG return type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants