fix: Deduplicate collect_left_input physical expression evaluation#16727
fix: Deduplicate collect_left_input physical expression evaluation#16727jonathanc-n wants to merge 6 commits intoapache:mainfrom
collect_left_input physical expression evaluation#16727Conversation
|
@alamb Would you be able to run some benchmarks on this please? |
| }) | ||
| .await?; | ||
|
|
||
| if batches.len() == 0 { |
There was a problem hiding this comment.
This will cause failures on SymmetricHashJoin as it will compare empty arrays. @nuno-faria If you have the time, would you like to work on this?
There was a problem hiding this comment.
Yeah I can take a look. Could you provide an example that uses a SymmetricHashJoin?
There was a problem hiding this comment.
If you add this code:
if batches.len() == 0 {
return Ok(JoinLeftData::new(
Box::new(JoinHashMapU32::with_capacity(0)),
RecordBatch::new_empty(schema),
Vec::new(),
Mutex::new(BooleanBufferBuilder::new(0)),
AtomicUsize::new(probe_threads_count),
reservation,
));
};and run sqllogictests it should direct you to the queries
There was a problem hiding this comment.
I think I fixed it by updating the equal_rows_arr function. What would be the easiest way to contribute, since I don't have push permissions to jonathanc-n/datafusion?
There was a problem hiding this comment.
I think you can just open a pul request from another branch, or you could include it in your other pull request that you have open, and I can update this branch after it merges
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Maybe we should do this again when tests are passing after #16716 is merged |
|
The duplicate evaluation of the keys on the build side is no big deal, because those expressions are simple column selections. This is done when lowering a logical plan to a physical plan: [0] [1] [0] datafusion/datafusion/core/src/physical_planner.rs Lines 950 to 1058 in eb25e8d [1] datafusion/datafusion/expr/src/logical_plan/builder.rs Lines 1979 to 2039 in eb25e8d |
|
@ctsk Oh i see, I guess this pull request would be to just deduplicate the code |
collect_left_input processingcollect_left_input physical expression evaluation
|
@Dandandan Would you be able to take a look when you have the time? |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Rationale for this change
I tried some optimizations in #16719 but there is a large slow down with concatenating arrays twice
What changes are included in this PR?
Couple of changes:
collect_left_inputupdate_hashto avoid duplicate left value calculationAre these changes tested?
Are there any user-facing changes?