Concatenate inside hash repartition #16223
Conversation
FYI @alamb this relates to your quest to remove
🤖: Benchmark completed
Hmm, interesting. This shows something different.
One commit was missing, but I'm not sure that explains the difference between my result and this one.
Let me try another approach later: buffering inputs for each output partition until they reach the target batch size (just like CoalesceBatchesExec); see the sketch below. Perhaps the extra copy for smaller-sized batches, or the increased size, is hurting in some cases.
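A minimal sketch of that buffering idea, assuming arrow-rs's `concat_batches` kernel. `PartitionBuffer` and its fields are hypothetical names for illustration, not DataFusion's actual implementation:

```rust
// Hypothetical buffer, not DataFusion's actual code: hold small batches
// per output partition and emit one concatenated batch once the
// accumulated row count reaches the target batch size, much like
// CoalesceBatchesExec does downstream today.
use arrow::compute::concat_batches;
use arrow::datatypes::SchemaRef;
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

struct PartitionBuffer {
    schema: SchemaRef,
    batches: Vec<RecordBatch>,
    buffered_rows: usize,
    target_batch_size: usize,
}

impl PartitionBuffer {
    /// Buffer `batch`; return a concatenated batch once enough rows
    /// have accumulated, otherwise keep buffering.
    fn push(&mut self, batch: RecordBatch) -> Result<Option<RecordBatch>> {
        self.buffered_rows += batch.num_rows();
        self.batches.push(batch);
        if self.buffered_rows >= self.target_batch_size {
            let out = concat_batches(&self.schema, &self.batches)?;
            self.batches.clear();
            self.buffered_rows = 0;
            Ok(Some(out))
        } else {
            Ok(None)
        }
    }
}
```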
I got some amazing results (5-20% on total average across benchmarks) with the latter approach yesterday (buffering inside repartition). Will clean it up later this week (currently ill).
I hope you feel better!
Which issue does this PR close?
Rationale for this change
Recently, I found interleave_batches to be faster than the existing code. That actually doesn't have anything to do with interleave being faster (in fact, it is slower), but with the fact that it avoids sending num_partition batches per input batch to the output channels. The existing code takes individual batches and sends them to the output channels directly, blocking progress, since the batches have been sent upstream and the channels may all quickly become "non-empty".
We can fix this by internally concatenating the input arrays inside RepartitionExec.
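For context, a hedged sketch of the per-partition step that produces those small batches, assuming arrow-rs's `take` kernel; `take_partition` is a hypothetical helper, not this PR's code. Hash repartitioning computes one index list per output partition from each input batch, so each resulting batch holds roughly rows / num_partitions rows; these are the small batches worth concatenating before they are sent on:

```rust
// Hypothetical helper, not the PR's code: extract the rows of `batch`
// that hash to one output partition. Each input batch yields one such
// small batch per output partition.
use arrow::array::UInt32Array;
use arrow::compute::take;
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

fn take_partition(batch: &RecordBatch, indices: &UInt32Array) -> Result<RecordBatch> {
    // `indices` are the row positions in `batch` that hash to this
    // output partition.
    let columns = batch
        .columns()
        .iter()
        .map(|c| take(c.as_ref(), indices, None))
        .collect::<Result<Vec<_>>>()?;
    RecordBatch::try_new(batch.schema(), columns)
}
```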
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?