[multistage] support inequality JOIN by walterddr · Pull Request #9448 · apache/pinot

walterddr · 2022-09-22T20:20:39Z

Support inequality JOIN using a nested loop.

The nested loop is added directly to the hash join algorithm.

This is not the most efficient way to join: when there is no equality join key, we should not be using a hash join at all.
When there is no join key, use broadcast, assuming the right table is smaller.

TODO

Implement other join algorithms.
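As a rough illustration of the approach described above, here is a minimal sketch of a hash join that falls back to a nested loop when there is no equality key, and that filters candidate pairs with the remaining inequality clauses. The class name, method signature, and row representation are all assumptions made for this example; it is not the code in Pinot's HashJoinOperator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch only; this is not Pinot's HashJoinOperator.
final class InequalityJoinSketch {

  // Inner-joins left and right rows. A negative key index means "no equality key".
  // Note: null keys compare equal here, which differs from SQL semantics.
  static List<Object[]> join(List<Object[]> left, List<Object[]> right, int leftKey, int rightKey,
      List<BiPredicate<Object[], Object[]>> clauses) {
    boolean hasKey = leftKey >= 0 && rightKey >= 0;

    // Build phase: bucket right rows by key. Skipped when there is no key.
    Map<Object, List<Object[]>> buckets = new HashMap<>();
    if (hasKey) {
      for (Object[] rightRow : right) {
        buckets.computeIfAbsent(rightRow[rightKey], k -> new ArrayList<>()).add(rightRow);
      }
    }

    List<Object[]> result = new ArrayList<>();
    for (Object[] leftRow : left) {
      // Probe phase: one bucket when keyed, otherwise the whole right side (nested loop).
      List<Object[]> candidates = right;
      if (hasKey) {
        List<Object[]> bucket = buckets.get(leftRow[leftKey]);
        candidates = bucket != null ? bucket : Collections.<Object[]>emptyList();
      }
      for (Object[] rightRow : candidates) {
        // Keep the pair only if every remaining clause accepts it.
        if (clauses.stream().allMatch(c -> c.test(leftRow, rightRow))) {
          result.add(concat(leftRow, rightRow));
        }
      }
    }
    return result;
  }

  // Concatenates a left row and a right row into one output row.
  static Object[] concat(Object[] a, Object[] b) {
    Object[] out = new Object[a.length + b.length];
    System.arraycopy(a, 0, out, 0, a.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }
}
```

With no key, every left row is compared against every right row, which is why the description calls this less efficient than a dedicated join algorithm.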

codecov-commenter · 2022-09-22T22:02:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.36%. Comparing base (2daa863) to head (022f5ef).
Report is 2961 commits behind head on master.

Additional details and impacted files

@@              Coverage Diff              @@
##             master    #9448       +/-   ##
=============================================
+ Coverage     35.09%   68.36%   +33.27%     
- Complexity      189     5144     +4955     
=============================================
  Files          1915     1915               
  Lines        101994   102056       +62     
  Branches      15468    15481       +13     
=============================================
+ Hits          35791    69768    +33977     
+ Misses        63158    27338    -35820     
- Partials       3045     4950     +1905

Flag	Coverage Δ
integration1	`?`
integration2	`24.68% <0.00%> (?)`
unittests1	`67.11% <100.00%> (?)`
unittests2	`15.53% <100.00%> (+0.03%)`	⬆️


agavra

Just reviewing for my own personal learning and ramp-up :) feel free to disregard any comments that don't make sense.

LGTM though

agavra · 2022-09-28T23:42:28Z

...uery-planner/src/main/java/org/apache/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java

+
+    if (joinInfo.leftKeys.isEmpty()) {
+      // when there's no JOIN key, use broadcast.
+      leftExchange = LogicalExchange.create(leftInput, RelDistributions.SINGLETON);


just to make sure that I'm understanding this correctly - using SINGLETON on the left and BROADCAST_DISTRIBUTED on the right would mean that each row in the right table is broadcast to every node that hosts a segment of the left table, and the left table remains in place (essentially not exchanged at all)?

when there's no join key, use broadcast assuming the right table is smaller.

I noticed that there's a computeSelfCost on RelNode which wires down to basically getting a row count. Is there a way for us to hook that in to make sure that we're actually choosing the smaller table? Obviously this can be left for a follow-up, just wanted to learn :)

walterddr · 2022-09-29

Cost is not enabled; it is either zero or infinite right now. And yes, your understanding is correct.
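To make the exchange behaviour discussed in this thread concrete, here is a small simulation, assuming left partitions stand in for the nodes hosting left-table segments. Each partition stays in place and receives a full copy of the right rows, then joins locally. The class and method names are hypothetical, and the `l > r` predicate is just an example inequality.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only; partitions stand in for servers.
final class BroadcastJoinSketch {

  // Joins one left partition with a full copy of the right rows using l > r.
  static List<int[]> localJoin(List<Integer> leftPartition, List<Integer> rightCopy) {
    List<int[]> out = new ArrayList<>();
    for (int l : leftPartition) {
      for (int r : rightCopy) {
        if (l > r) {
          out.add(new int[] {l, r});
        }
      }
    }
    return out;
  }

  // Left partitions are not moved; the right side is copied to each of them.
  static List<int[]> broadcastJoin(List<List<Integer>> leftPartitions, List<Integer> right) {
    List<int[]> out = new ArrayList<>();
    for (List<Integer> partition : leftPartitions) {
      List<Integer> rightCopy = new ArrayList<>(right); // the "broadcast"
      out.addAll(localJoin(partition, rightCopy));
    }
    return out;
  }
}
```

The right side is copied once per left partition, so the cost grows with the size of the right table. That is the reason for the "assuming the right table is smaller" caveat.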

agavra · 2022-09-28T23:51:24Z

pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotQueryRuleSets.java

-          // push a filter into a join, replaced CoreRules.FILTER_INTO_JOIN with special config
-          PinotFilterIntoJoinRule.INSTANCE,
+          // push a filter into a join
+          CoreRules.FILTER_INTO_JOIN,


an aside, how do we choose which rules to use here? I notice we don't include, for example, SORT_JOIN_TRANSPOSE which can push sorts down in the case of LEFT/RIGHT OUTER JOIN but that isn't here?

walterddr · 2022-09-29

Most of the rules that are available but not used fall into one of two cases:

* they work best with a cost factory that is not a dummy
* they might cause side effects we don't want

So we leave them out, following a "preserve correctness over performance" rule.

agavra · 2022-09-28T23:59:01Z

pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java

        for (Object[] rightRow : hashCollection) {
-          rows.add(joinRow(leftRow, rightRow));
+          Object[] resultRow = joinRow(leftRow, rightRow);
+          if (_joinClauseEvaluators.isEmpty() || _joinClauseEvaluators.stream().allMatch(


it looks like _joinClauseEvaluators.isEmpty() is true if there are only non-equijoin conditions - I didn't fully look into the Calcite code, but how does it handle a condition with both equi and non-equi predicates (e.g. JOIN ON a.col1 = b.col1 AND a.col2 > b.col2)?

Can we add a test for that scenario?

walterddr · 2022-09-29

Calcite automatically splits the condition into join keys and non-equi join conditions, and we already have a test for this; see QueryRunnerTest. (Remember that a WHERE clause predicate is pulled up into the join and applied there if it involves columns from both tables.)
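The split described here can be pictured with a small sketch. This is only the idea, not Calcite's implementation: the `Conjunct` and `Split` types are hypothetical, and the field name `leftKeys` is borrowed from the `joinInfo.leftKeys` reference in the diff above. Each conjunct of the ON clause becomes either a pair of key columns or a leftover non-equi condition.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Calcite's condition-splitting code.
final class JoinConditionSplitSketch {

  // One conjunct of the ON clause: left column <op> right column.
  static final class Conjunct {
    final int leftCol;
    final String op;
    final int rightCol;

    Conjunct(int leftCol, String op, int rightCol) {
      this.leftCol = leftCol;
      this.op = op;
      this.rightCol = rightCol;
    }
  }

  // Result of the split: equality key columns plus leftover conditions.
  static final class Split {
    final List<Integer> leftKeys = new ArrayList<>();
    final List<Integer> rightKeys = new ArrayList<>();
    final List<Conjunct> nonEqui = new ArrayList<>();
  }

  static Split split(List<Conjunct> conjuncts) {
    Split result = new Split();
    for (Conjunct c : conjuncts) {
      if ("=".equals(c.op)) {
        // Equality conjuncts become hash join keys.
        result.leftKeys.add(c.leftCol);
        result.rightKeys.add(c.rightCol);
      } else {
        // Everything else is evaluated per candidate row pair.
        result.nonEqui.add(c);
      }
    }
    return result;
  }
}
```

For `a.col1 = b.col1 AND a.col2 > b.col2`, this yields one key pair and one non-equi condition. A pure inequality join yields empty key lists, which corresponds to the `joinInfo.leftKeys.isEmpty()` branch in the exchange rule.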

* support inequality JOIN
* also support pure inequality join
* address diff comment and add a SEMI join test case for this as well

Co-authored-by: Rong Rong <rongr@startree.ai>

walterddr mentioned this pull request Sep 22, 2022

[multistage] support additional SQL feature tracker #9223

Open

31 tasks

walterddr marked this pull request as ready for review September 22, 2022 21:10

Rong Rong added 2 commits September 28, 2022 16:28

support inequality JOIN

c2f3dba

also support pure inequality join

93553de

walterddr force-pushed the pr_inequality_join branch from b71900d to 93553de Compare September 28, 2022 23:28

agavra reviewed Sep 29, 2022

address diff comment and add a SEMI join test case for this as well

022f5ef

siddharthteotia approved these changes Sep 29, 2022

siddharthteotia merged commit 3057712 into apache:master Sep 29, 2022

walterddr deleted the pr_inequality_join branch December 6, 2023 16:23
