Skip to content

[multistage] UNION/INTERSECT/EXCEPT implementation #10622

Merged
xiangfu0 merged 2 commits intoapache:masterfrom
xiangfu0:v2-union-impl
May 3, 2023
Merged

[multistage] UNION/INTERSECT/EXCEPT implementation #10622
xiangfu0 merged 2 commits intoapache:masterfrom
xiangfu0:v2-union-impl

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Apr 17, 2023

UNION/INTERSECT/EXCEPT(MINUS) implementation based on #10535

For INTERSECT/EXCEPT(MINUS) implementation, the code assumes the right side is SMALLER, so it will go through the entire right side to construct a TreeSet data structure, then go through left side to decide what to put in the final results.

DISCLAIMER: Multi-value distinct is not supported, so UNION won't dedup on array columns. Will have a follow up PR to fix this issue. Follow up issue: #10658

Union All example:
select * from billing where city = 'Palo Alto' UNION ALL select * from billing where city = 'Mountain View'

image

The result matches the two individual results (287 = 140 + 147):

select * from billing where city = 'Palo Alto'
image

select * from billing where city = 'Mountain View'
image

For Union example:

The dedup happens as expected for:
select * from billing where city = 'Palo Alto' UNION select * from billing where city = 'Palo Alto'
Scanned 280 rows and results is 140.
image

The result is the same as select * from billing where city = 'Palo Alto'

For Intersect example:
select * from billing INTERSECT select * from billing where city = 'Mountain View'
image

For Except(Minus) example:
select * from billing EXCEPT select * from billing where city = 'Mountain View'
image

@xiangfu0 xiangfu0 force-pushed the v2-union-impl branch 2 times, most recently from 3a56258 to df51b3a Compare April 17, 2023 10:09
@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2023

Codecov Report

❌ Patch coverage is 70.00000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.28%. Comparing base (cad764d) to head (767b1c1).
⚠️ Report is 3966 commits behind head on master.

Files with missing lines Patch % Lines
...ache/pinot/query/runtime/operator/SetOperator.java 76.08% 9 Missing and 2 partials ⚠️
.../pinot/query/runtime/operator/utils/SortUtils.java 0.00% 11 Missing ⚠️
.../pinot/query/runtime/plan/PhysicalPlanVisitor.java 84.61% 1 Missing and 1 partial ⚠️
...t/query/runtime/plan/ServerRequestPlanVisitor.java 0.00% 2 Missing ⚠️
...he/pinot/query/runtime/operator/UnionOperator.java 90.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #10622      +/-   ##
============================================
- Coverage     70.32%   70.28%   -0.04%     
+ Complexity     6429     5642     -787     
============================================
  Files          2112     2116       +4     
  Lines        114056   114143      +87     
  Branches      17226    17240      +14     
============================================
+ Hits          80213    80230      +17     
- Misses        28244    28310      +66     
- Partials       5599     5603       +4     
Flag Coverage Δ
integration1 24.28% <0.00%> (-0.07%) ⬇️
integration2 23.86% <0.00%> (-0.17%) ⬇️
unittests1 67.84% <70.00%> (+0.01%) ⬆️
unittests2 13.81% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 force-pushed the v2-union-impl branch 3 times, most recently from ca3bf33 to fa52b38 Compare April 20, 2023 06:41
@xiangfu0 xiangfu0 changed the title V2 union impl [multistage] V2 UNION ALL implementation Apr 20, 2023
@xiangfu0 xiangfu0 changed the title [multistage] V2 UNION ALL implementation [multistage] V2 UNION implementation Apr 20, 2023
@xiangfu0 xiangfu0 changed the title [multistage] V2 UNION implementation [multistage] UNION implementation Apr 20, 2023
@xiangfu0 xiangfu0 changed the title [multistage] UNION implementation [multistage] UNION/INTERSECT/EXCEPT implementation Apr 20, 2023
@xiangfu0 xiangfu0 force-pushed the v2-union-impl branch 4 times, most recently from 65348a6 to a8eeb2c Compare April 24, 2023 17:51
@Jackie-Jiang Jackie-Jiang added feature multi-stage Related to the multi-stage query engine labels Apr 25, 2023
@xiangfu0 xiangfu0 force-pushed the v2-union-impl branch 3 times, most recently from ce434b1 to 093b5ff Compare April 28, 2023 08:57
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is basically sending all data to a single server? wouldn't it cause blow up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to hash all the columns.

Comment on lines 120 to 121
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and others below: seems like it is assuming it always gets Union "all=true"?
do we plan to support all=false going forward? if so can we add check on setOp.all ? and also add a ignored test in SetOp.json test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now UNION is parsed to UNION ALL + DISTINCT, so for UNION, all is always true.
For INTERSECT and MINUS, the all is always false.

We can support it the check for all later if parser side can support it.

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall. please fix the explain plan test

@xiangfu0
Copy link
Contributor Author

xiangfu0 commented May 3, 2023

lgtm overall. please fix the explain plan test

done

@xiangfu0 xiangfu0 merged commit 1c4d0fe into apache:master May 3, 2023
@xiangfu0 xiangfu0 deleted the v2-union-impl branch May 4, 2023 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants