Skip to content

Handle Agg Functions with Literal Args When Used with a Union#13561

Merged
Jackie-Jiang merged 2 commits intoapache:masterfrom
ankitsultana:percentile-npe-bug
Jul 10, 2024
Merged

Handle Agg Functions with Literal Args When Used with a Union#13561
Jackie-Jiang merged 2 commits intoapache:masterfrom
ankitsultana:percentile-npe-bug

Conversation

@ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Jul 8, 2024

Attempts to fix #13305.

This is definitely not the most scientific fix (it's more like an empirical approach).

The issue was that aggregate node may not necessarily be on top of a project node. In our case, it was on top of a Union node, so this PR extends the aggregate exchange rule to handle that.

Test Plan

This query was failing before with the error below, but it works now:

SET useMultistageEngine=true;

with
  teamOne as (
  	select playerName, teamID, percentile(runs, 50) as sum_of_runs from baseballStats where teamID = 'SFN' group by playerName, teamID
  ),
  teamTwo as (
	select playerName, teamID, percentile(runs, 50) as sum_of_runs from baseballStats where teamID = 'CHN' group by playerName, teamID
  ),
  all as (
	select playerName, teamID, sum_of_runs from teamOne union all select playerName, teamID, sum_of_runs from teamTwo
  )
  select 
    /*+ aggOptions(is_skip_leaf_stage_group_by='true') */
    teamID, 
	-- sum(sum_of_runs) 
	percentile(sum_of_runs, 50) 
  from all
  group by teamID

Error:

Caused by: org.apache.pinot.spi.exception.BadQueryRequestException: Invalid aggregation function: percentile($1,$2); Reason: null
	at org.apache.pinot.core.query.aggregation.function.AggregationFunctionFactory.getAggregationFunction(AggregationFunctionFactory.java:487)
	at org.apache.pinot.query.runtime.operator.AggregateOperator.getAggFunction(AggregateOperator.java:224)
	at org.apache.pinot.query.runtime.operator.AggregateOperator.getAggFunctions(AggregateOperator.java:199)
	at org.apache.pinot.query.runtime.operator.AggregateOperator.<init>(AggregateOperator.java:79)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submit(QueryDispatcher.java:198)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:95)
org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:217)
org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:133)

@codecov-commenter
Copy link

codecov-commenter commented Jul 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.11%. Comparing base (59551e4) to head (d81e59e).
Report is 736 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13561      +/-   ##
============================================
+ Coverage     61.75%   62.11%   +0.36%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2558     +122     
  Lines        133233   140927    +7694     
  Branches      20636    21867    +1231     
============================================
+ Hits          82274    87536    +5262     
- Misses        44911    46771    +1860     
- Partials       6048     6620     +572     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.08% <100.00%> (+0.37%) ⬆️
java-21 61.99% <100.00%> (+0.37%) ⬆️
skip-bytebuffers-false 62.10% <100.00%> (+0.35%) ⬆️
skip-bytebuffers-true 35.23% <100.00%> (+7.50%) ⬆️
temurin 62.11% <100.00%> (+0.36%) ⬆️
unittests 62.11% <100.00%> (+0.36%) ⬆️
unittests1 46.69% <100.00%> (-0.21%) ⬇️
unittests2 27.66% <0.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -220,7 +221,7 @@ private static PinotLogicalAggregate convertAggFromIntermediateInput(RelOptRuleC
PinotLogicalExchange exchange, AggType aggType) {
Aggregate aggRel = call.rel(0);
RelNode input = PinotRuleUtils.unboxRel(aggRel.getInput());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unboxRel is done within findImmediateProjects(), no need to unbox here

@@ -259,7 +260,7 @@ private static PinotLogicalAggregate convertAggFromIntermediateInput(RelOptRuleC

private static List<AggregateCall> buildAggCalls(Aggregate aggRel, AggType aggType) {
RelNode input = PinotRuleUtils.unboxRel(aggRel.getInput());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

relNode = PinotRuleUtils.unboxRel(relNode);
if (relNode instanceof Project) {
return ((Project) relNode).getProjects();
} else if (relNode instanceof Union) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply to other RelNode other than Union?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there could be more. For instance, if we have a Filter or even a Sort, we should ideally be able to hoist a literal up to the aggregate node.

I am a bit concerned if that could have edge-cases, that's why I kept the scope of the change in this PR small.

(digression follows)

Ideally, I think we should do this step in a separate optimization rule (run before agg exchange insert) which is run only for aggregate nodes.

Clubbing this with exchange insertion makes it a bit complicated to reason about.

@Jackie-Jiang Jackie-Jiang merged commit e80d95f into apache:master Jul 10, 2024
@Jackie-Jiang Jackie-Jiang added the multi-stage Related to the multi-stage query engine label Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PERCENTILEEST throws NPE in MultiStage Engine

3 participants