perf: Optimize hash joins with an empty build side by nuno-faria · Pull Request #16716 · apache/datafusion

nuno-faria · 2025-07-08T18:30:44Z

Which issue does this PR close?

N/A.

Rationale for this change

When executing hash joins, the build side is first built from the left relation and then the right relation is joined with it. However, when the build side has no rows, the join operation can be mostly skipped, improving performance.

For example, here is a simple anti join query, where t1 has 100M rows and t2 has none:

SELECT *
FROM t1
LEFT ANTI JOIN t2 on t1.k = t2.k

Here is the hash join operator in the current implementation:

HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
    output_rows=100000000,
    build_input_batches=0,
    build_input_rows=0,
    input_batches=11733,
    input_rows=100000000,
    output_batches=23403,
    build_mem_used=876,
    build_time=2.8693ms,
    join_time=216.251396302s
]

And here is the optimized hash join operation:

HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
    output_rows=100000000,
    build_input_batches=0,
    build_input_rows=0,
    input_batches=11733,
    input_rows=100000000,
    output_batches=11733,
    build_mem_used=876,
    build_time=2.4597ms,
    join_time=36.038306ms
]

The total join time went from 216s to just 36ms.
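As a rough illustration of why the probe work can be short-circuited, the sketch below maps each join type to what it can emit once the build (left) side is known to be empty. The `JoinType` and `EmptyBuildOutput` enums here are simplified, hypothetical stand-ins rather than DataFusion's actual types, and only a subset of join types is covered.

```rust
/// Simplified subset of join types (hypothetical stand-in, not DataFusion's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// What a probe batch turns into when the build (left) side has no rows.
#[derive(Debug, PartialEq)]
enum EmptyBuildOutput {
    /// No row can be produced, so an empty batch is returned.
    Empty,
    /// Every probe row is unmatched and is emitted with NULL left columns.
    ProbeRowsNullPadded,
    /// Every probe row is emitted unchanged (only right columns are projected).
    ProbeRowsAsIs,
}

fn empty_build_output(join_type: JoinType) -> EmptyBuildOutput {
    match join_type {
        // These types need a left row (as a match or as a preserved row).
        JoinType::Inner
        | JoinType::Left
        | JoinType::LeftSemi
        | JoinType::LeftAnti
        | JoinType::RightSemi => EmptyBuildOutput::Empty,
        // These types preserve unmatched right rows and pad the left side.
        JoinType::Right | JoinType::Full => EmptyBuildOutput::ProbeRowsNullPadded,
        // No right row can have a match, so all of them pass the anti filter.
        JoinType::RightAnti => EmptyBuildOutput::ProbeRowsAsIs,
    }
}
```

In the example query above, the plan uses `join_type=RightAnti`, which falls into the last arm: each probe batch can be forwarded without any hash lookups.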

What changes are included in this PR?

Changed process_probe_batch in physical-plan/hash_join.rs to optimize the join.
Added multiple sqllogictests.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

Dandandan

Looks nice to me. Maybe we should have some more tests for the correctness of the results.

jonathanc-n

Thank you @nuno-faria!

xudong963 · 2025-07-09T08:35:56Z

datafusion/physical-plan/src/joins/utils.rs

+    match join_type {
+        // these join types only return data if the left side is not empty, so we return an
+        // empty RecordBatch
+        JoinType::Inner


LGTM, how about cross join?

Cross joins with an empty relation already appear to run well in the CrossJoinExec operator.

Here is the CrossJoinExec operator for SELECT * FROM t1, t2, where t1 has 100M rows and t2 has none:

CrossJoinExec, metrics=[ output_rows=0, elapsed_compute=351.714µs, build_input_batches=0, build_input_rows=0, input_batches=0, input_rows=0, output_batches=0, build_mem_used=0, build_time=351.7µs, join_time=12ns ]

Yes, this makes sense. Cross join is not a join type that would go through creating a hash table.

Thinking about this, I think a more generic version of this would be switching small left sides (e.g. < 10 rows) to using cross join 🤔

Does this include equijoin conditions? When I tried this with nested loop join, which follows a similar algorithm, performance seemed slow when the right table was larger. It is probably a memory issue due to the cartesian product.

I think it should be relatively fast to do a cross join / NLJ instead of a hash join for those cases, but of course it depends on how the nested loop join is implemented. There is probably more room for optimizing the nested loop join.

I was thinking of opening a proposal to make nested loop join faster; there are definitely some issues to work on there. I'll try to get to that when I have the time.
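The idea discussed in this thread could be sketched roughly as follows: pick a nested loop join when the build side is tiny, and a hash join otherwise. The `SMALL_BUILD_THRESHOLD` constant, the `Strategy` enum, and both functions are hypothetical names for illustration. The value 10 only mirrors the "< 10 rows" example above and is not a measured cutoff.

```rust
/// Hypothetical cutoff taken from the "< 10 rows" example; not a tuned value.
const SMALL_BUILD_THRESHOLD: usize = 10;

#[derive(Debug, PartialEq)]
enum Strategy {
    NestedLoop,
    Hash,
}

/// Choose a join strategy from the number of build-side rows.
fn choose_strategy(build_rows: usize) -> Strategy {
    if build_rows < SMALL_BUILD_THRESHOLD {
        Strategy::NestedLoop
    } else {
        Strategy::Hash
    }
}

/// Inner equijoin by nested loops: compares every probe key with every build key.
/// Returns (build_index, probe_index) pairs, with no hashing involved.
fn nested_loop_join(build: &[i64], probe: &[i64]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (probe_idx, probe_key) in probe.iter().enumerate() {
        for (build_idx, build_key) in build.iter().enumerate() {
            if build_key == probe_key {
                pairs.push((build_idx, probe_idx));
            }
        }
    }
    pairs
}
```

The cost is proportional to build rows times probe rows, which is why this only looks attractive when the build side is very small.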

nuno-faria · 2025-07-09T11:14:36Z

@Dandandan I've added one more test where both tables are empty. Do you have suggestions for more?

nuno-faria · 2025-07-09T11:18:53Z

@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an is_empty method to JoinHashMapType. Please check if this is the preferred approach.
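For readers unfamiliar with the pattern, here is a minimal sketch of exposing an emptiness check through a trait when the underlying map is hidden from callers. `JoinMap`, `SimpleJoinMap`, and `can_skip_probe` are hypothetical names used only for this illustration and do not reproduce the actual `JoinHashMapType` definition.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a join hash map abstraction.
trait JoinMap {
    /// True when no build-side row was inserted.
    fn is_empty(&self) -> bool;
    /// Build-side row indices stored under `hash`.
    fn matches(&self, hash: u64) -> &[usize];
}

/// One possible implementation backed by a standard HashMap.
struct SimpleJoinMap {
    buckets: HashMap<u64, Vec<usize>>,
}

impl SimpleJoinMap {
    /// Insert one entry per build row, keyed by that row's precomputed hash.
    fn from_hashes(hashes: &[u64]) -> Self {
        let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
        for (row, &hash) in hashes.iter().enumerate() {
            buckets.entry(hash).or_default().push(row);
        }
        Self { buckets }
    }
}

impl JoinMap for SimpleJoinMap {
    fn is_empty(&self) -> bool {
        self.buckets.is_empty()
    }

    fn matches(&self, hash: u64) -> &[usize] {
        self.buckets
            .get(&hash)
            .map(|rows| rows.as_slice())
            .unwrap_or(&[])
    }
}

/// Callers only see the trait, so the check works for any implementation.
fn can_skip_probe(map: &dyn JoinMap) -> bool {
    map.is_empty()
}
```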

Dandandan · 2025-07-09T12:14:05Z

datafusion/physical-plan/src/joins/hash_join.rs


        let timer = self.join_metrics.join_time.timer();

+        // if the left side is empty, we can skip the (potentially expensive) join operation


If we checked whether the left side is empty before retrieving probe batches, we could also skip the hash repartition 🤔

I think we can do this in a follow-up PR, wdyt @nuno-faria?

I think so. Can you point out where the probe repartition is being triggered? In process_probe_batch itself I think we can also skip creating the hashes when the build side is empty, but I measured it and it didn't have a big impact on performance.
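A minimal sketch of the "skip creating the hashes" idea, assuming a plain `HashMap` keyed by hash value in place of the real join hash map. `hash_key`, `build_map`, and `probe_batch` are hypothetical helpers, and the function only models an inner join. It returns the number of hashes computed so the saving is visible.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a single join key.
fn hash_key(key: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Build side: map from key hash to the indices of build rows with that hash.
fn build_map(build_keys: &[i64]) -> HashMap<u64, Vec<usize>> {
    let mut map: HashMap<u64, Vec<usize>> = HashMap::new();
    for (idx, &key) in build_keys.iter().enumerate() {
        map.entry(hash_key(key)).or_default().push(idx);
    }
    map
}

/// Probe one batch (inner join only).
/// Returns the (build_index, probe_index) pairs and how many hashes were computed.
fn probe_batch(
    map: &HashMap<u64, Vec<usize>>,
    build_keys: &[i64],
    probe_keys: &[i64],
) -> (Vec<(usize, usize)>, usize) {
    // Empty build side: no probe row can match, so do not hash anything.
    if map.is_empty() {
        return (Vec::new(), 0);
    }
    let mut pairs = Vec::new();
    let mut hashes_computed = 0;
    for (probe_idx, &key) in probe_keys.iter().enumerate() {
        let hash = hash_key(key);
        hashes_computed += 1;
        if let Some(rows) = map.get(&hash) {
            for &build_idx in rows {
                // Re-check the key itself to rule out hash collisions.
                if build_keys[build_idx] == key {
                    pairs.push((build_idx, probe_idx));
                }
            }
        }
    }
    (pairs, hashes_computed)
}
```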

jonathanc-n · 2025-07-09T12:33:33Z

@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an is_empty method to JoinHashMapType. Please check if this is the preferred approach.

Yes, this looks good.

jonathanc-n · 2025-07-09T12:51:20Z

@nuno-faria We can return early from collect_left_input after collecting the batches, by checking the number of batches:

// No build-side batches: return an empty JoinLeftData and skip building the hash map.
if batches.is_empty() {
    return Ok(JoinLeftData::new(
        Box::new(JoinHashMapU32::with_capacity(0)),
        RecordBatch::new_empty(schema),
        Vec::new(),
        Mutex::new(BooleanBufferBuilder::new(0)),
        AtomicUsize::new(probe_threads_count),
        reservation,
    ));
}
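The same early-return shape can be shown in a self-contained form using only standard library types. `LeftData` and `collect_left` below are hypothetical simplifications of the build-side collection step, with key vectors standing in for record batches.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified build-side state: collected keys plus a key-to-rows map.
struct LeftData {
    rows: Vec<i64>,
    map: HashMap<i64, Vec<usize>>,
}

/// Collect build-side batches, returning early when there are none.
fn collect_left(batches: Vec<Vec<i64>>) -> LeftData {
    // Nothing was read from the build side: skip concatenation and map construction.
    if batches.is_empty() {
        return LeftData {
            rows: Vec::new(),
            map: HashMap::new(),
        };
    }
    // Concatenate all batches, then index every row by its key.
    let rows: Vec<i64> = batches.into_iter().flatten().collect();
    let mut map = HashMap::with_capacity(rows.len());
    for (idx, &key) in rows.iter().enumerate() {
        map.entry(key).or_insert_with(Vec::new).push(idx);
    }
    LeftData { rows, map }
}
```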

alamb · 2025-07-14T18:36:33Z

Looks like this PR was good to go and had no outstanding todos so I merged it in

perf: Optimize hash joins with an empty build side

13be88a

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and physical-plan (Changes to the physical-plan crate) labels Jul 8, 2025

Dandandan approved these changes Jul 8, 2025

jonathanc-n approved these changes Jul 8, 2025

Merge branch 'main' into optimize_empty_hashjoins

7498e75

xudong963 added the performance (Make DataFusion faster) label Jul 9, 2025

xudong963 reviewed Jul 9, 2025

nuno-faria added 2 commits July 9, 2025 12:05

Fix is_empty check

2742fcd

Add 'join both sides empty' logic test

00b5425

Dandandan reviewed Jul 9, 2025

nuno-faria added 2 commits July 9, 2025 19:49

Ensure equal_rows_arr can handle empty arrays

0e14e5f

Remove unused imports

d000bb1

jonathanc-n mentioned this pull request Jul 13, 2025

fix: Deduplicate collect_left_input physical expression evaluation #16727

Closed

alamb merged commit 04b006c into apache:main Jul 14, 2025
27 checks passed

nuno-faria deleted the optimize_empty_hashjoins branch July 14, 2025 18:37

5 participants