-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13751] [SQL] generate better code for Filter #11585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #52686 has finished for PR 11585 at commit
|
|
@sameeragarwal how does this relate to what you are working on? |
|
@nongli @davies I think part of the rationale for having #11511 was to avoid having to extract the operator's (and its child's) constraints during the physical planning/ codegen phase. Could we not just rewrite the code generated by filter such that all |
|
Also @davies, if you prefer your current approach, we then need to propagate the constraints in the physical plan as well (currently they are only propagated in the logical plan). |
|
After offline discussion with @sameeragarwal , we agreed that 1) the constraints should only used in logical plan 2) we should use the I will update this PR based on these, and revert #11511 |
|
Test build #52757 has finished for PR 11585 at commit
|
|
Test build #52780 has finished for PR 11585 at commit
|
| """.stripMargin | ||
| }.mkString("\n") | ||
|
|
||
| // Reset the isNull to false for the not-null columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment why
This reverts commit b23c3e3.
|
Test build #52858 has finished for PR 11585 at commit
|
|
LGTM |
|
Merged into master |
## What changes were proposed in this pull request? This PR improve the codegen of Filter by: 1. filter out the rows early if it have null value in it that will cause the condition result in null or false. After this, we could simplify the condition, because the input are not nullable anymore. 2. Split the condition as conjunctive predicates, then check them one by one. Here is a piece of generated code for Filter in TPCDS Q55: ```java /* 109 */ /*** CONSUME: Filter ((((isnotnull(d_moy#149) && isnotnull(d_year#147)) && (d_moy#149 = 11)) && (d_year#147 = 1999)) && isnotnull(d_date_sk#141)) */ /* 110 */ /* input[0, int] */ /* 111 */ boolean project_isNull2 = rdd_row.isNullAt(0); /* 112 */ int project_value2 = project_isNull2 ? -1 : (rdd_row.getInt(0)); /* 113 */ /* input[1, int] */ /* 114 */ boolean project_isNull3 = rdd_row.isNullAt(1); /* 115 */ int project_value3 = project_isNull3 ? -1 : (rdd_row.getInt(1)); /* 116 */ /* input[2, int] */ /* 117 */ boolean project_isNull4 = rdd_row.isNullAt(2); /* 118 */ int project_value4 = project_isNull4 ? -1 : (rdd_row.getInt(2)); /* 119 */ /* 120 */ if (project_isNull3) continue; /* 121 */ if (project_isNull4) continue; /* 122 */ if (project_isNull2) continue; /* 123 */ /* 124 */ /* (input[1, int] = 11) */ /* 125 */ boolean filter_value6 = false; /* 126 */ filter_value6 = project_value3 == 11; /* 127 */ if (!filter_value6) continue; /* 128 */ /* 129 */ /* (input[2, int] = 1999) */ /* 130 */ boolean filter_value9 = false; /* 131 */ filter_value9 = project_value4 == 1999; /* 132 */ if (!filter_value9) continue; /* 133 */ /* 134 */ filter_metricValue1.add(1); /* 135 */ /* 136 */ /*** CONSUME: Project [d_date_sk#141] */ /* 137 */ /* 138 */ project_rowWriter1.write(0, project_value2); /* 139 */ append(project_result1.copy()); ``` ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes apache#11585 from davies/gen_filter.
What changes were proposed in this pull request?
This PR improve the codegen of Filter by:
Here is a piece of generated code for Filter in TPCDS Q55:
How was this patch tested?
Existing tests.