[multistage][bugfix] eliminate multiple exchanges#11882
walterddr merged 2 commits into apache:master
Codecov Report
All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #11882 +/- ##
============================================
- Coverage 61.42% 61.41% -0.02%
+ Complexity 1147 1146 -1
============================================
Files 2375 2375
Lines 128501 128501
Branches 19846 19846
============================================
- Hits 78936 78919 -17
- Misses 43859 43875 +16
- Partials 5706 5707 +1
This is interesting: with two filters, I think the visitFilter calls in ServerPlanRequestVisitor will override each other, so one predicate will be skipped.
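To make the concern concrete, here is a minimal sketch of the two behaviours. It is not Pinot's actual code: `LeafFilterSketch` and its methods are hypothetical, and predicates are modelled as plain strings rather than expression trees. It assumes the visitor sees the stacked filters one after another and stores a single filter for the leaf query.

```java
import java.util.List;

/** Hypothetical model of leaf-query filter handling; not Pinot's ServerPlanRequestVisitor. */
final class LeafFilterSketch {

    /** Overriding: each visited filter replaces the stored one, so only the last survives. */
    static String overriding(List<String> predicates) {
        String filter = null;
        for (String predicate : predicates) {
            filter = predicate;
        }
        return filter;
    }

    /** Combining: each visited filter is AND-ed with whatever was stored before it. */
    static String combining(List<String> predicates) {
        String filter = null;
        for (String predicate : predicates) {
            filter = (filter == null) ? predicate : "(" + filter + ") AND (" + predicate + ")";
        }
        return filter;
    }

    public static void main(String[] args) {
        // The two predicates that end up stacked on table a in the query discussed here.
        List<String> stacked = List.of("col2 NOT IN ('foo', 'bar')", "col3 > 10");
        System.out.println(overriding(stacked)); // col3 > 10
        System.out.println(combining(stacked));  // (col2 NOT IN ('foo', 'bar')) AND (col3 > 10)
    }
}
```

With overriding, the `col2` predicate is silently dropped; with combining, both predicates reach the leaf query. Which predicate is lost in the overriding case depends on visit order, which this sketch simply assumes.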
This is not unexpected (we could potentially make it smarter). The problem is with a query like this:
WITH tmp1
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' )),
     tmp2
     AS (SELECT *
         FROM   b
         WHERE  col1 IN (SELECT col1
                         FROM   tmp1)
                AND col3 < 100)
SELECT *
FROM   tmp2
WHERE  col1 IN (SELECT col1
                FROM   tmp1 -- this is the tmp1 modification into tmp1'
                WHERE  col3 > 10)
At this stage:
- Calcite doesn't know whether you want a table spool of tmp1 with tmp1' derived from it, so col3 > 10 is not pushed down.
- Only once we decide that a table spool is not possible do we copy tmp1 and tmp1' into two separate subqueries; by that time, I think the HEP optimizer has already passed the filter-merging phase.
Rewriting the query this way:
WITH tmp1
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' )),
     tmp2
     AS (SELECT *
         FROM   b
         WHERE  col1 IN (SELECT col1
                         FROM   tmp1)
                AND col3 < 100),
     tmp3 -- here we explicitly state that tmp3 and tmp1 are not related
     AS (SELECT *
         FROM   a
         WHERE  col2 NOT IN ( 'foo', 'bar' ) AND col3 > 10)
SELECT *
FROM   tmp2
WHERE  col1 IN (SELECT col1
                FROM   tmp3)
produces a plan without multiple filters.
If we cannot avoid this, then ServerPlanRequestVisitor should handle the leaf query generation properly.
This SQL is un-plannable both before and after this bug-fix PR. I would suggest deferring the fix to a different PR (I will change the test to make sure it doesn't happen).
This fixes #11881.
This rule will be merged into a more sophisticated rule that will be created in #11831.