-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Clean up hash_join's ExecutionPlan::execute #15418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think similar changes can be made to cross_join and nested_loop_join if desired. |
This backs out commit 4346cb8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @ctsk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ctsk. I believe we can merge this as is. However I'd like to raise one thing that comes to mind whenever I look this code. I'm not very comfortable with adding a CoalescePartitions
after calling execute()
. Modifying the plan post-execute() feels a bit off to me. Does it seem like a smell to you as well?
🤔
I think coalesce added before However just noticed the execute currently happens in main thread instead of |
This should not trigger for physical plans generated by datafusion, since the EnforceDistribution pass already adds that CoalescePartitionsExec. |
I mean HashJoin::execute() |
You mean |
I wanted to keep the PR simple and just refactor. Here is a follow-up that removes it: #15476. Let's see if the tests pass :) EnforceDistribution adds it here: datafusion/datafusion/physical-optimizer/src/enforce_distribution.rs Lines 940 to 968 in fa452e6
datafusion/datafusion/physical-optimizer/src/enforce_distribution.rs Lines 1256 to 1259 in fa452e6
This apples to CollectLeft HJs left child: datafusion/datafusion/physical-plan/src/joins/hash_join.rs Lines 705 to 710 in fa452e6
|
Closed in favor of #15476 |
Rationale for this change + What changes are included in this PR?
The logic of HashJoin's
execute
implements differs from the other operators - The recursive call to theexecute
step of the build side are delayed untilcollect_left_join
is poll'ed. This PR changes that to make it more alike to the standard implementations ofexecute
.Are these changes tested?
No changes in behaviour are expected.
Are there any user-facing changes?
No