[multistage] UNION/INTERSECT/EXCEPT implementation #10622
xiangfu0 merged 2 commits into apache:master
Codecov Report
@@ Coverage Diff @@
## master #10622 +/- ##
============================================
- Coverage 70.32% 70.28% -0.04%
+ Complexity 6429 5642 -787
============================================
Files 2112 2116 +4
Lines 114056 114143 +87
Branches 17226 17240 +14
============================================
+ Hits 80213 80230 +17
- Misses 28244 28310 +66
- Partials 5599 5603 +4
This is basically sending all data to a single server? Wouldn't that cause a blow-up?
Changed to hash on all the columns.
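To illustrate the change discussed in this thread, here is a minimal sketch (not the PR's actual code) of routing rows by a hash over all columns. Identical rows then land on the same worker, so the set operation can run in parallel instead of on a single server. The class `HashAllColumnsSketch`, the method `partitionFor`, and the worker count are hypothetical names chosen for this example.

```java
import java.util.List;

class HashAllColumnsSketch {
  // Hypothetical helper: picks a worker from a hash over every column value.
  // Equal rows always map to the same worker, which lets each worker compare
  // rows locally for UNION/INTERSECT/EXCEPT.
  static int partitionFor(List<Object> row, int numWorkers) {
    // List.hashCode() combines the hash codes of all elements in order;
    // floorMod keeps the result in [0, numWorkers) even for negative hashes.
    return Math.floorMod(row.hashCode(), numWorkers);
  }
}
```

Hashing on every column matters here because set operations compare whole rows; hashing on a subset would still be correct but could skew the distribution.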
This and others below: it seems to assume it always gets a Union with "all=true"?
Do we plan to support all=false going forward? If so, can we add a check on setOp.all, and also add an ignored test in the SetOp.json test cases?
For now, UNION is parsed to UNION ALL + DISTINCT, so for UNION, all is always true.
For INTERSECT and MINUS, all is always false.
We can add the check for all later if the parser side can support it.
walterddr left a comment:
LGTM overall. Please fix the explain plan test.
Done.
UNION/INTERSECT/EXCEPT(MINUS) implementation based on #10535
For the INTERSECT/EXCEPT(MINUS) implementation, the code assumes the right side is smaller: it first reads the entire right side into a TreeSet, then iterates over the left side to decide which rows go into the final result.
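The build-then-probe approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator code from this PR: rows are modeled as `List<String>`, and the class `SetOpSketch`, the comparator `ROW_ORDER`, and the second `emitted` set used for distinct output are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SetOpSketch {
  // Assumed row ordering: compare column by column, shorter row first on a tie.
  static final Comparator<List<String>> ROW_ORDER = (a, b) -> {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int c = a.get(i).compareTo(b.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.size(), b.size());
  };

  // keepIfInRight = true gives INTERSECT, false gives EXCEPT (MINUS).
  static List<List<String>> setOp(List<List<String>> left, List<List<String>> right,
      boolean keepIfInRight) {
    // Build phase: load the (assumed smaller) right side into a TreeSet.
    TreeSet<List<String>> rightRows = new TreeSet<>(ROW_ORDER);
    rightRows.addAll(right);
    // Probe phase: stream the left side and keep matching rows once each,
    // since all=false means the output is distinct.
    TreeSet<List<String>> emitted = new TreeSet<>(ROW_ORDER);
    List<List<String>> out = new ArrayList<>();
    for (List<String> row : left) {
      if (rightRows.contains(row) == keepIfInRight && emitted.add(row)) {
        out.add(row);
      }
    }
    return out;
  }

  static List<List<String>> intersect(List<List<String>> left, List<List<String>> right) {
    return setOp(left, right, true);
  }

  static List<List<String>> except(List<List<String>> left, List<List<String>> right) {
    return setOp(left, right, false);
  }
}
```

Memory use in this sketch is driven by the right side, which is why the smaller input is the better choice for the build phase.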
DISCLAIMER: Multi-value distinct is not supported, so UNION won't dedup on array columns. A follow-up PR will fix this issue. Follow-up issue: #10658

Union All example:
select * from billing where city = 'Palo Alto' UNION ALL select * from billing where city = 'Mountain View'
The result matches the two individual results (287 = 140 + 147):
select * from billing where city = 'Palo Alto'
select * from billing where city = 'Mountain View'

Union example:
The dedup happens as expected for:

select * from billing where city = 'Palo Alto' UNION select * from billing where city = 'Palo Alto'
Scanned 280 rows and the result has 140 rows.
The result is the same as
select * from billing where city = 'Palo Alto'

Intersect example:

select * from billing INTERSECT select * from billing where city = 'Mountain View'

Except (Minus) example:

select * from billing EXCEPT select * from billing where city = 'Mountain View'
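
For reference, the row counts in the Union All and Union examples above follow from treating UNION as UNION ALL followed by a distinct step, as noted in the review thread. The sketch below shows that relationship on in-memory lists; it is not Pinot code, the class `UnionSketch` and its methods are hypothetical, and it assumes the rows within each input are already distinct.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class UnionSketch {
  // UNION ALL: concatenate both sides and keep duplicates.
  static <T> List<T> unionAll(List<T> left, List<T> right) {
    List<T> out = new ArrayList<>(left);
    out.addAll(right);
    return out;
  }

  // UNION: UNION ALL followed by a distinct step (first occurrence wins).
  static <T> List<T> union(List<T> left, List<T> right) {
    return new ArrayList<>(new LinkedHashSet<>(unionAll(left, right)));
  }
}
```

With inputs of 140 and 147 disjoint rows, `unionAll` returns 287 rows; calling `union` on a 140-row input with itself processes 280 rows and returns 140.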