Skip to content

[multistage] bridge v2 query engine for leaf stage v1 multi-value column#11117

Merged
xiangfu0 merged 8 commits intoapache:masterfrom
xiangfu0:unnest-for-v1-groupby
Jul 26, 2023
Merged

[multistage] bridge v2 query engine for leaf stage v1 multi-value column#11117
xiangfu0 merged 8 commits intoapache:masterfrom
xiangfu0:unnest-for-v1-groupby

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Jul 16, 2023

Introduces a new function arrayToMultiValue to bridge v2 query engine for leaf stage v1 group by multi-value column.

For query: SELECT count(*), RandomAirports FROM airlineStats GROUP BY RandomAirports, the plan is

Execution Plan
LogicalProject(EXPR$0=[$1], RandomAirports=[$0])
  LogicalAggregate(group=[{0}], agg#0=[COUNT($1)])
    PinotLogicalExchange(distribution=[hash[0]])
      LogicalAggregate(group=[{71}], agg#0=[COUNT()])
        LogicalTableScan(table=[[airlineStats]])

The stacktrace is:

2023/07/15 20:12:31.275 ERROR [MailboxSendOperator] [query_intermediate_worker_on_52712_port-10-thread-2] Exception while transferring data on opChain: 16918855000000008_1_2
java.lang.IllegalStateException: Incompatible selection result data schema:  Expected: [RandomAirports(STRING_ARRAY),$f1(LONG)]. Actual: [RandomAirports(STRING),count(*)(LONG)]
	at com.google.common.base.Preconditions.checkState(Preconditions.java:512) ~[guava-32.0.1-jre.jar:?]
	at org.apache.pinot.query.runtime.operator.LeafStageTransferableBlockOperator.composeGroupByTransferableBlock(LeafStageTransferableBlockOperator.java:198) ~[classes/:?]
	at org.apache.pinot.query.runtime.operator.LeafStageTransferableBlockOperator.composeTransferableBlock(LeafStageTransferableBlockOperator.java:158) ~[classes/:?]
	at org.apache.pinot.query.runtime.operator.LeafStageTransferableBlockOperator.getNextBlock(LeafStageTransferableBlockOperator.java:114) ~[classes/:?]
	at org.apache.pinot.query.runtime.operator.MultiStageOperator.nextBlock(MultiStageOperator.java:57) ~[classes/:?]
	at org.apache.pinot.query.runtime.operator.MailboxSendOperator.getNextBlock(MailboxSendOperator.java:124) [classes/:?]
	at org.apache.pinot.query.runtime.operator.MultiStageOperator.nextBlock(MultiStageOperator.java:57) [classes/:?]
	at org.apache.pinot.query.runtime.operator.MultiStageOperator.nextBlock(MultiStageOperator.java:33) [classes/:?]
	at org.apache.pinot.query.runtime.executor.OpChainSchedulerService$1.runJob(OpChainSchedulerService.java:94) [classes/:?]
	at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) [classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

After the bridge, query is:

SELECT count(*), arrayToMV(RandomAirports) FROM airlineStats GROUP BY arrayToMV(RandomAirports)

The plan is

Execution Plan
LogicalProject(EXPR$0=[$1], EXPR$1=[$0])
  LogicalAggregate(group=[{0}], agg#0=[COUNT($1)])
    PinotLogicalExchange(distribution=[hash[0]])
      LogicalAggregate(group=[{0}], agg#0=[COUNT()])
        LogicalProject($f0=[ARRAYTOMULTIVALUE($71)])
          LogicalTableScan(table=[[airlineStats]])
image

Also it supports predicate for multi-value in v2:

select RandomAirports from airlineStats WHERE arrayToMultiValue(RandomAirports) ='SEA' limit 10
image

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch from df83612 to 35693aa Compare July 16, 2023 03:19
@xiangfu0 xiangfu0 changed the title [multistage] bridge v2 query engine for leaf stage v1 group by multi-value column [multistage] bridge v2 query engine for leaf stage v1 multi-value column Jul 16, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2023

Codecov Report

Merging #11117 (82d2a56) into master (4d72eb5) will decrease coverage by 0.01%.
Report is 1 commits behind head on master.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #11117      +/-   ##
==========================================
- Coverage    0.11%    0.11%   -0.01%     
==========================================
  Files        2218     2218              
  Lines      119113   119125      +12     
  Branches    18021    18023       +2     
==========================================
  Hits          137      137              
- Misses     118956   118968      +12     
  Partials       20       20              
Flag Coverage Δ
integration1temurin11 0.00% <0.00%> (ø)
integration1temurin17 0.00% <0.00%> (ø)
integration1temurin20 0.00% <0.00%> (ø)
integration2temurin17 0.00% <0.00%> (ø)
integration2temurin20 0.00% <0.00%> (ø)
unittests1temurin11 0.00% <0.00%> (ø)
unittests1temurin17 0.00% <0.00%> (ø)
unittests1temurin20 0.00% <0.00%> (ø)
unittests2temurin11 0.11% <0.00%> (-0.01%) ⬇️
unittests2temurin17 0.11% <0.00%> (-0.01%) ⬇️
unittests2temurin20 0.11% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...apache/pinot/common/function/FunctionRegistry.java 0.00% <0.00%> (ø)
...e/pinot/common/function/TransformFunctionType.java 0.00% <0.00%> (ø)
...r/transform/function/TransformFunctionFactory.java 0.00% <ø> (ø)
...pinot/query/parser/CalciteRexExpressionParser.java 0.00% <0.00%> (ø)
.../query/planner/logical/RelToPlanNodeConverter.java 0.00% <0.00%> (ø)
...ry/runtime/plan/server/ServerPlanRequestUtils.java 0.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kishoreg
Copy link
Member

it feels like the other way around right? multivalue_to_array?

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch 6 times, most recently from 7134ae1 to cc18f23 Compare July 19, 2023 05:54
@Jackie-Jiang Jackie-Jiang added the multi-stage Related to the multi-stage query engine label Jul 20, 2023
@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch 6 times, most recently from a0732bd to bc3883d Compare July 20, 2023 22:55
@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch from bc3883d to ee8c112 Compare July 21, 2023 08:54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is confusing name. the data type is already MV running an ARRAY_TO_MV is a bit weird.
should we named it USE_AS_MV? and we can say that MV columns are by default USE_AS_ARRAY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USE_AS_MV might be confusing, since we repurposed MV as ARRAY in v2, ARRAY_TO_MV might be more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we need to figure out a proper way b/c out put of a select is an array, but the table config / schema will still call this as MV column as convension. both have some confusion, but as long as the document is proper we should be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it work if I run

SELECT count(*), arrayToMV(RandomAirports) 
FROM mytable 
WHERE Dest IN (SELECT Dest FROM myTable GROUP BY Dest HAVING count(*) > 10)

(later when we implemented the scalar function wrapper)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a group by arrayToMV(RandomAirports) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works at leaf stage not intermediate stage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the grand question is whether we continue to support MV as a data type or collapse into ARRAY.
if the answer to the above question is NO, this PR is not needed.
if the answer is YES. there are several concerns. i thought about it a bit last night. the problem is 2 fold.

  1. how do we support it conforming with ANSI syntax
  2. how do we make sure v1 perform the same as before.

for Q1: the answer is probably MULTI_SET

  • it is a SET with no ordering but can contain duplicates, suitable for our use case for GROUP-BY and FILTER, which they are considered as expanded/unnested during eval.
  • but not every use cases is considered as MULTI_SET, for example selection will consider MV as an ARRAY
    therefore,
  1. an explicit USE_AS_MULTISET(mv) is desired to bridge the syntatic gap
  2. explicit USE_AS_ARRAY(mv) is the default (considering select * there's no reason to ask user explicitly put this in

for Q2: the problem is these USE_AS_*** methods are considered transform, which shouldn't be the case there's a simply way to solve this --> in CalciteRexExpressionUtils, we can simply ignored the functionCall and directly return the operand as a reference in V1
b/c V1 is SqlNode and bares no type info, this means directly putting the operand input reference without the USE_AS_*** type conversion will work naturally with the V1 context.

so in short my proposal is

  1. have USE_AS_ARRAY and USE_AS_MULTISET as scalarFunction wrappers (unimplemented) to bridge the gap of the syntactic problem on calcite
  2. in PhysicalPlanner explicitly exclude these syntatitic functions and directly drop to the operand;
  3. in the mid term we will have MV, MULTI_SET, ARRAY as 3 "co-existing" types, but in V1 there's only MV; in V2 MV is considered by default as ARRAY, unless USE_AS_MULTISET is used.
  4. in the long term, once we fully found all alternatives in standard SQL syntax for MV, we can safely consider MV as ARRAY. and only do MULTISET when necessary (i can only see it being useful in MV > 5 situation, where there's no array equivalent, only multiset equivalent)

WDYT? Please let me know

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch 2 times, most recently from bc88977 to 252367f Compare July 21, 2023 19:19
@xiangfu0
Copy link
Contributor Author

the grand question is whether we continue to support MV as a data type or collapse into ARRAY. if the answer to the above question is NO, this PR is not needed. if the answer is YES. there are several concerns. i thought about it a bit last night. the problem is 2 fold.

  1. how do we support it conforming with ANSI syntax
  2. how do we make sure v1 perform the same as before.

for Q1: the answer is probably MULTI_SET

  • it is a SET with no ordering but can contain duplicates, suitable for our use case for GROUP-BY and FILTER, which they are considered as expanded/unnested during eval.
  • but not every use cases is considered as MULTI_SET, for example selection will consider MV as an ARRAY
    therefore,
  1. an explicit USE_AS_MULTISET(mv) is desired to bridge the syntatic gap
  2. explicit USE_AS_ARRAY(mv) is the default (considering select * there's no reason to ask user explicitly put this in

for Q2: the problem is these USE_AS_*** methods are considered transform, which shouldn't be the case there's a simply way to solve this --> in CalciteRexExpressionUtils, we can simply ignored the functionCall and directly return the operand as a reference in V1 b/c V1 is SqlNode and bares no type info, this means directly putting the operand input reference without the USE_AS_*** type conversion will work naturally with the V1 context.

so in short my proposal is

  1. have USE_AS_ARRAY and USE_AS_MULTISET as scalarFunction wrappers (unimplemented) to bridge the gap of the syntactic problem on calcite
  2. in PhysicalPlanner explicitly exclude these syntatitic functions and directly drop to the operand;
  3. in the mid term we will have MV, MULTI_SET, ARRAY as 3 "co-existing" types, but in V1 there's only MV; in V2 MV is considered by default as ARRAY, unless USE_AS_MULTISET is used.
  4. in the long term, once we fully found all alternatives in standard SQL syntax for MV, we can safely consider MV as ARRAY. and only do MULTISET when necessary (i can only see it being useful in MV > 5 situation, where there's no array equivalent, only multiset equivalent)

WDYT? Please let me know

MV column in V2 is modeled as ARRAY by default.
So in terms of supporting v1 format, we need to:

  1. use ARRAY_TO_MV or USE_AS_MV as the bridge to ensure Calcite understand the type consistency/conversion.
  2. (TODO) Implement MV GROUP BY in v2 intermediate stage to complete the story.
  3. (TODO) Implement ARRAY GROUP BY later on

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch 3 times, most recently from 50b400d to 62e5d0c Compare July 21, 2023 20:25
@xiangfu0
Copy link
Contributor Author

Test failures requiring fix by #11151

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we need to figure out a proper way b/c out put of a select is an array, but the table config / schema will still call this as MV column as convension. both have some confusion, but as long as the document is proper we should be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's first create a component return type registry on @ScalarFunction so we dont have to modify the transform function side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually ignored previous comment, did a bit of research and it seems like registering TransformFunctionType without an actual impl is better than having to parse scalar function annotation, which doesn't really allow anything other than primitives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a different pr for this.

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch from 6e7e739 to a16c4f0 Compare July 24, 2023 17:45
@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch from a16c4f0 to 6d65102 Compare July 25, 2023 02:49
@xiangfu0
Copy link
Contributor Author

rebase to master

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is function name always upper case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's canonicalized and just do ARRAY_TO_MV

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's canonicalized and just do ARRAY_TO_MV

@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch 3 times, most recently from 0e81e15 to e33c7a3 Compare July 26, 2023 07:14
@xiangfu0 xiangfu0 force-pushed the unnest-for-v1-groupby branch from e33c7a3 to 82d2a56 Compare July 26, 2023 18:06
@xiangfu0 xiangfu0 merged commit a0ff2e8 into apache:master Jul 26, 2023
@xiangfu0 xiangfu0 deleted the unnest-for-v1-groupby branch July 26, 2023 21:04
@xiangfu0
Copy link
Contributor Author

Part of this #10658

s0nskar pushed a commit to s0nskar/pinot that referenced this pull request Aug 10, 2023
…umn (apache#11117)

* [multistage] bridge v2 query engine for leaf stage v1 group by multi-value column

* use multi-set

* Change multi-value type back to array

* rewrite arrayToMV at leaf stage

* Enable more tests

* fix integration tests with generated queries

* Address comments

* Take out MultiValueBetweenPredicateGenerator from _multistageSingleValuePredicateGenerators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants