[multistage] Fix Predicate Pushdown by Using Rule Collection #10409
Changes from all commits
7880196
390cda5
ca64770
fc34fb7
2ddfec3
986ea6c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,14 +6,13 @@ | |
| "sql": "EXPLAIN PLAN FOR SELECT AVG(a.col4) as avg FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'", | ||
| "output": [ | ||
| "Execution Plan", | ||
| "\nLogicalProject(avg=[/($0, $1)])", | ||
| "\n LogicalProject($f0=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], $f1=[$1])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col4=[$0], col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\nLogicalProject(avg=[/(CASE(=($1, 0), null:DECIMAL(1000, 0), $0), $1)])", | ||
|
Contributor
I wonder what made this merge of the two projects possible after the rule changes.
Contributor
Author
This is because I am now pruning/merging redundant operators at the end, using a RuleCollection (if that's what you meant).
Contributor
Ah, gotcha. This is great! |
||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col4=[$0], col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\n" | ||
| ] | ||
| }, | ||
|
|
@@ -22,14 +21,13 @@ | |
| "sql": "EXPLAIN PLAN FOR SELECT AVG(a.col4) as avg, SUM(a.col4) as sum, MAX(a.col4) as max FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'", | ||
| "output": [ | ||
| "Execution Plan", | ||
| "\nLogicalProject(avg=[/($0, $1)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $2)], max=[$3])", | ||
| "\n LogicalProject($f0=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], $f1=[$1], sum=[$0], max=[$2])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)], max=[MAX($2)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()], max=[MAX($0)])", | ||
| "\n LogicalProject(col4=[$0], col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\nLogicalProject(avg=[/(CASE(=($1, 0), null:DECIMAL(1000, 0), $0), $1)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], max=[$2])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)], max=[MAX($2)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()], max=[MAX($0)])", | ||
| "\n LogicalProject(col4=[$0], col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\n" | ||
| ] | ||
| }, | ||
|
|
@@ -38,14 +36,13 @@ | |
| "sql": "EXPLAIN PLAN FOR SELECT AVG(a.col3) as avg, COUNT(*) as count FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'", | ||
| "output": [ | ||
| "Execution Plan", | ||
| "\nLogicalProject(avg=[/(CAST($0):DOUBLE, $1)], count=[$1])", | ||
| "\n LogicalProject($f0=[CASE(=($1, 0), null:INTEGER, $0)], $f1=[$1])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($1)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\nLogicalProject(avg=[/(CAST(CASE(=($1, 0), null:INTEGER, $0)):DOUBLE, $1)], count=[$1])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($1)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\n" | ||
| ] | ||
| }, | ||
|
|
@@ -97,14 +94,13 @@ | |
| "sql": "EXPLAIN PLAN FOR SELECT /*+ skipLeafStageGroupByAggregation */ AVG(a.col3) as avg, COUNT(*) as count FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'", | ||
| "output": [ | ||
| "Execution Plan", | ||
| "\nLogicalProject(avg=[/(CAST($0):DOUBLE, $1)], count=[$1])", | ||
| "\n LogicalProject($f0=[CASE(=($1, 0), null:INTEGER, $0)], $f1=[$1])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($1)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\nLogicalProject(avg=[/(CAST(CASE(=($1, 0), null:INTEGER, $0)):DOUBLE, $1)], count=[$1])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])", | ||
| "\n LogicalExchange(distribution=[hash])", | ||
| "\n LogicalAggregate(group=[{}], agg#0=[$SUM0($1)], agg#1=[COUNT()])", | ||
| "\n LogicalProject(col2=[$1], col3=[$2])", | ||
| "\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])", | ||
| "\n LogicalTableScan(table=[[a]])", | ||
| "\n" | ||
| ] | ||
| }, | ||
|
|
||
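For readers following the expected plans in this diff: AVG is planned as a two-stage aggregate. The leaf-stage LogicalAggregate computes $SUM0 and COUNT(), the aggregate above the LogicalExchange adds up the partials with $SUM0, and the top LogicalProject divides, with a CASE that yields null when the count is 0. Below is a minimal Java sketch of that arithmetic only. The class and method names are hypothetical (they are not Pinot or Calcite APIs), and SQL NULL is modeled as an empty OptionalDouble.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical illustration of the AVG plan shape; not Pinot or Calcite code.
class TwoStageAvgSketch {
    /** Partial state carried across the exchange: a running sum and a row count. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Leaf stage: agg#0=[$SUM0(col)], agg#1=[COUNT()] over one server's rows. */
    static Partial leafAggregate(List<Double> rows) {
        double sum = 0;
        for (double v : rows) {
            sum += v;
        }
        return new Partial(sum, rows.size());
    }

    /** Final stage: agg#0=[$SUM0($0)], agg#1=[$SUM0($1)] over all partials. */
    static Partial finalAggregate(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Top project: /(CASE(=($1, 0), null, $0), $1); empty stands in for NULL. */
    static OptionalDouble avg(Partial total) {
        if (total.count == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(total.sum / total.count);
    }
}
```

In the new expected plans the CASE expression is inlined into the division, so the sketch's `avg` corresponds to the single remaining top-level LogicalProject.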
TODO: By default, Calcite runs these rules in depth-first order. It also won't do a "fullRestartAfterTransformation" unless we use HepMatchOrder.TOP_DOWN or HepMatchOrder.BOTTOM_UP.
I think depth-first order without full restarts after each transformation should be fine, but it would be good if someone else chimed in as well. Note that the match order can be changed for this collection alone (it's a HepInstruction), so it doesn't need to be a global setting.
I think we should be good for now. The HEP planner is used here to avoid a lengthy Volcano planner run that adds latency to the planning phase. As long as the plan results are deterministic, we can always change how we configure the planner, IMO.
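To make the "rule collection, then prune/merge at the end" idea from this thread concrete, here is a toy model in plain Java. It is not Calcite code and does not reproduce this PR's rule set: the plan is simplified to a top-down list of operator names, and `Rule`, `applyCollection`, and both rules are hypothetical. In Calcite, the corresponding building blocks would be `HepProgramBuilder.addRuleCollection(...)` and, for the ordering discussed above, `HepProgramBuilder.addMatchOrder(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Toy model only: a plan is a linear chain of operator names, top operator first.
class PhasedRulesSketch {
    /** A rule returns a rewritten plan if it fired, or empty if nothing matched. */
    interface Rule extends Function<List<String>, Optional<List<String>>> {}

    /** Pushdown: move a Filter that sits directly above a Project below it. */
    static final Rule FILTER_BELOW_PROJECT = plan -> {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i).equals("Filter") && plan.get(i + 1).equals("Project")) {
                List<String> out = new ArrayList<>(plan);
                out.set(i, "Project");
                out.set(i + 1, "Filter");
                return Optional.of(out);
            }
        }
        return Optional.empty();
    };

    /** Cleanup: collapse two adjacent Projects into one. */
    static final Rule MERGE_PROJECTS = plan -> {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i).equals("Project") && plan.get(i + 1).equals("Project")) {
                List<String> out = new ArrayList<>(plan);
                out.remove(i);
                return Optional.of(out);
            }
        }
        return Optional.empty();
    };

    /** Applies every rule in the collection repeatedly until none of them fires. */
    static List<String> applyCollection(List<String> plan, List<Rule> rules) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                Optional<List<String>> next = rule.apply(plan);
                if (next.isPresent()) {
                    plan = next.get();
                    changed = true;
                }
            }
        }
        return plan;
    }

    /** Phase 1: pushdown collection. Phase 2: merge/prune collection at the end. */
    static List<String> optimize(List<String> plan) {
        List<String> pushed = applyCollection(plan, List.of(FILTER_BELOW_PROJECT));
        return applyCollection(pushed, List.of(MERGE_PROJECTS));
    }
}
```

For example, `optimize(List.of("Filter", "Project", "Project", "TableScan"))` first sinks the Filter below both Projects and only then merges the two Projects, which mirrors the single top-level LogicalProject in the updated expected plans.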