-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer) #31931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks @linzebing for working on this. Can you add |
I am looking this, also cc @cloud-fan and @maropu for review if you have time, thanks. |
Also the JIRA should be |
One unit test in SQLMetricsSuite failed. Let me check. |
ok to test |
I'll check it later, too. Thanks for your work, @linzebing ! |
e7f119b
to
08e47d2
Compare
Fixed test. |
|
||
s""" | ||
|boolean $foundMatch = false; | ||
|for (int $arrayIndex = 0; $arrayIndex < $buildRowArrayTerm.length; $arrayIndex++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the broadcast side to be empty? It seems not right here because we still need to output one row for streamed side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. Let me address this.
.join(df3, $"k1" <= $"k3", "left_outer") | ||
hasJoinInCodegen = twoJoinsDF.queryExecution.executedPlan.collect { | ||
case WholeStageCodegenExec(BroadcastNestedLoopJoinExec( | ||
_: BroadcastNestedLoopJoinExec, _, _, _, _)) => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems indentation is off.
|
||
Seq(true, false).foreach { codegenEnabled => | ||
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled.toString) { | ||
// test left outer join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about adding an extra test case for broadcast side being empty?
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136367 has finished for PR 31931 at commit
|
Test build #136374 has finished for PR 31931 at commit
|
thanks, merging to master! |
late lgtm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late LGTM too.
What changes were proposed in this pull request?
This PR is to add code-gen support for left outer (build right) and right outer (build left). Reference:
BroadcastNestedLoopJoinExec.codegenInner()
andBroadcastNestedLoopJoinExec.outerJoin()
Why are the changes needed?
Improve query CPU performance.
Tested with a simple query:
Seeing 2x run time improvement:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Changed existing unit tests in
OuterJoinSuite
to cover codegen use cases.Added unit test in WholeStageCodegenSuite.scala to make sure code-gen for broadcast nested loop join is taking effect, and test for multiple join case as well.
Example query:
Example generated code (
bnlj_doConsume_0
method):