perf: Optimize hash joins with an empty build side #16716
Dandandan
left a comment
Looks nice to me. Maybe we should have some more tests for the correctness of the results.
jonathanc-n
left a comment
Thank you @nuno-faria!
match join_type {
    // these join types only return data if the left side is not empty, so we return an
    // empty RecordBatch
    JoinType::Inner
Cross joins with an empty relation already appear to run well in the CrossJoinExec operator.
Here is the CrossJoinExec operator for SELECT * FROM t1, t2, where t1 has 100M rows and t2 has none:
CrossJoinExec, metrics=[
output_rows=0,
elapsed_compute=351.714µs,
build_input_batches=0,
build_input_rows=0,
input_batches=0,
input_rows=0,
output_batches=0,
build_mem_used=0,
build_time=351.7µs,
join_time=12ns
]
Yes, this makes sense. A cross join is not a join type that would go through creating a hash table.
Thinking about this, I think a more generic version of this would be switching small left sides (e.g., < 10 rows) to using a cross join 🤔
> Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔
Does this include equijoin conditions? When I tried this with the nested loop join, which follows a similar algorithm, performance seemed slow once the right table was larger. It is probably a memory issue due to the cartesian product.
I think it should be relatively fast to do a cross join / NLJ instead of a hash join for those cases, but of course it depends on how the nested loop join is implemented. There is probably more room for optimization in the nested loop join.
I was thinking of opening a proposal to make the nested loop join faster; there are definitely some issues to work on there. I'll try to get to that when I have the time.
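To make the idea in this thread concrete, here is a minimal, self-contained sketch of choosing between a nested loop scan and a hash table based on the build-side row count. It is not DataFusion code: the function names, the single `i64` key column, and the threshold constant are all hypothetical, and the cutoff of 10 only mirrors the "< 10 rows" figure floated above.

```rust
use std::collections::HashMap;

/// Hypothetical cutoff; the thread only suggests "< 10 rows" as an example.
const SMALL_BUILD_THRESHOLD: usize = 10;

/// Equi-join that scans every build row for every probe row (no hash table).
/// Returns (build_index, probe_index) pairs.
fn nested_loop_join(build: &[i64], probe: &[i64]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (p, probe_key) in probe.iter().enumerate() {
        for (b, build_key) in build.iter().enumerate() {
            if build_key == probe_key {
                pairs.push((b, p));
            }
        }
    }
    pairs
}

/// Equi-join through a hash table built on the build side.
fn hash_join(build: &[i64], probe: &[i64]) -> Vec<(usize, usize)> {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (b, key) in build.iter().enumerate() {
        table.entry(*key).or_default().push(b);
    }
    let mut pairs = Vec::new();
    for (p, key) in probe.iter().enumerate() {
        if let Some(rows) = table.get(key) {
            for &b in rows {
                pairs.push((b, p));
            }
        }
    }
    pairs
}

/// Picks the strategy from the build-side row count.
fn inner_join(build: &[i64], probe: &[i64]) -> Vec<(usize, usize)> {
    if build.len() < SMALL_BUILD_THRESHOLD {
        nested_loop_join(build, probe)
    } else {
        hash_join(build, probe)
    }
}
```

Both strategies return the same pairs, so the choice only affects cost. Whether the scan actually wins for small build sides depends on the real nested loop join implementation, as noted in the comments above.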
@Dandandan I've added one more test where both tables are empty. Do you have suggestions for more?
@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an
let timer = self.join_metrics.join_time.timer();

// if the left side is empty, we can skip the (potentially expensive) join operation
If we checked whether the left side is empty before retrieving probe batches, we could also skip the hash repartition 🤔
I think we can do this in a follow-up PR, wdyt @nuno-faria?
I think so. Can you point out where the probe repartition is being triggered? In process_probe_batch itself, I think we can also skip creating the hashes when the build side is empty, but I measured it and it didn't have a big impact on performance.
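As a rough illustration of the fast path being discussed, here is a sketch of a right anti join probe step that returns the probe batch untouched when the build side is empty. This is a simplified stand-in, not the actual process_probe_batch: the function name, the `HashSet<i64>` build side, and the slice-based batch are assumptions made to keep the example self-contained.

```rust
use std::collections::HashSet;

/// Right anti join on a single key column: keep probe rows that have no
/// match in the build side. (Hypothetical helper, not a DataFusion API.)
fn right_anti_probe(build_keys: &HashSet<i64>, probe_batch: &[i64]) -> Vec<i64> {
    // Fast path: with an empty build side nothing can match, so the whole
    // batch passes through without any per-row lookups.
    if build_keys.is_empty() {
        return probe_batch.to_vec();
    }
    // Regular path: look up each probe key and drop the ones that match.
    probe_batch
        .iter()
        .copied()
        .filter(|key| !build_keys.contains(key))
        .collect()
}
```

The fast path returns the same rows the regular path would for an empty build side; it only avoids the per-row work.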
Yes, this looks good.
@nuno-faria We can return early from collect_left_input after intaking batches and checking the number of batches:
    if batches.len() == 0 {
        return Ok(JoinLeftData::new(
            Box::new(JoinHashMapU32::with_capacity(0)),
            RecordBatch::new_empty(schema),
            Vec::new(),
            Mutex::new(BooleanBufferBuilder::new(0)),
            AtomicUsize::new(probe_threads_count),
            reservation,
        ));
    };
Looks like this PR was good to go and had no outstanding todos, so I merged it in.
Which issue does this PR close?
N/A.
Rationale for this change
When executing hash joins, the build side is first built from the left relation and then the right relation is joined with it. However, when the build side has no rows, the join operation can be mostly skipped, improving performance.
For example, take a simple anti join query where t1 has 100M rows and t2 has none. Here is the hash join operator in the current implementation:
HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
output_rows=100000000,
build_input_batches=0,
build_input_rows=0,
input_batches=11733,
input_rows=100000000,
output_batches=23403,
build_mem_used=876,
build_time=2.8693ms,
join_time=216.251396302s
]
And here is the optimized hash join operator:
HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
output_rows=100000000,
build_input_batches=0,
build_input_rows=0,
input_batches=11733,
input_rows=100000000,
output_batches=11733,
build_mem_used=876,
build_time=2.4597ms,
join_time=36.038306ms
]
The total join time went from 216s to just 36ms.
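The reason most of the work can be skipped is that the result for an empty build side is fully determined by the join type. The sketch below spells that out using standard SQL join semantics. It is not the code from this PR: `JoinKind`, `EmptyBuildOutput`, and `output_for_empty_build` are hypothetical names, and only a subset of join types is covered.

```rust
/// Hypothetical stand-in for a join type enum; the build side is the left input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// What a probe batch produces when the build (left) side has no rows.
#[derive(Debug, PartialEq)]
enum EmptyBuildOutput {
    /// No rows at all.
    Empty,
    /// The probe rows, unchanged.
    ProbeRows,
    /// The probe rows, with NULLs in place of the build-side columns.
    ProbeRowsNullPadded,
}

fn output_for_empty_build(kind: JoinKind) -> EmptyBuildOutput {
    match kind {
        // These only return data when a left row exists or matches.
        JoinKind::Inner
        | JoinKind::Left
        | JoinKind::LeftSemi
        | JoinKind::LeftAnti
        | JoinKind::RightSemi => EmptyBuildOutput::Empty,
        // No probe row can have a match, so every probe row survives.
        JoinKind::RightAnti => EmptyBuildOutput::ProbeRows,
        // Unmatched probe rows are kept and padded with NULLs.
        JoinKind::Right | JoinKind::Full => EmptyBuildOutput::ProbeRowsNullPadded,
    }
}
```

The RightAnti case corresponds to the metrics above, where output_rows equals input_rows.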
What changes are included in this PR?
Updated process_probe_batch in physical-plan/hash_join.rs to optimize the join.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.