Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Support null safe equal in joins #3161

Merged

Conversation

advancedxy
Copy link
Contributor

@advancedxy advancedxy commented Nov 1, 2024

This commit consists of the following parts:

  1. Make the logical plan join's on condition null safe aware
  2. Support translating null safe equal joins into physical plans
  3. Make sure the optimization rules: eliminate cross join and push down filter don't break
  4. Modifications to physical join ops to support null safe equal: hash and broadcast joins are supported. SMJ is not supported yet.
  5. Glue code in Python side to make the whole pipeline work
  6. Some UTs in the python side

Fixes: #3069

TODOs(in follow-up PRs):

  • rewrite null safe equal for SMJ so that SMJ could support null safe equals as well
  • SQL supports null safe equal, a.k.a SpaceShip(a<=>b)
  • Python's DataFrame API supports null safe equal join

This commit consists of the following parts:
1. Make the logical plan join's on condition null safe aware
2. Support translating null safe equal joins into physical plans
3. Make sure the optimization rules: eliminate cross join and push down filter
   don't break
4. Modifications to physical join ops to support null safe equal: hash and broadcast
   joins are supported. SMJ is not supported yet.
5. Glue code in Python side to make the whole pipeline work
6. Some UTs in the python side

TODOs(in follow-up PRs):
[ ] rewrite null safe equal for SMJ so that SMJ could support null safe equals as well
[ ] SQL supports null safe equal, a.k.a SpaceShip(a<=>b)
[ ] Python's DataFrame API supports null safe equal join
@github-actions github-actions bot added the enhancement New feature or request label Nov 1, 2024
Copy link

codspeed-hq bot commented Nov 1, 2024

CodSpeed Performance Report

Merging #3161 will not alter performance

Comparing advancedxy:join_support_null_eqaul_safe (4e7489c) with main (96c538b)

Summary

✅ 17 untouched benchmarks

@jaychia jaychia requested a review from kevinzwang November 1, 2024 18:24
@kevinzwang kevinzwang requested review from andrewgazelka and universalmind303 and removed request for kevinzwang November 2, 2024 01:08
@@ -82,11 +82,17 @@ impl MicroPartition {
right: &Self,
left_on: &[ExprRef],
right_on: &[ExprRef],
null_equals_nulls: Option<Vec<bool>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether this is the right type definition or not, do we need to define it as something like Option<&[bool]?

@@ -506,13 +510,15 @@ pub(super) fn translate_single_logical_node(
// TODO(Clark): Also do a sort-merge join if a downstream op needs the table to be sorted on the join key.
// TODO(Clark): Look into defaulting to sort-merge join over hash join under more input partitioning setups.
// TODO(Kevin): Support sort-merge join for other types of joins.
// TODO(advancedxy): Rewrite null safe equals to support SMJ
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinzwang we might still need the zero_lit expression to support rewriting a <=> b to nvl + is_null. It would be much easier to support SMJ code.

P.S: I didn't include that part, we can discuss further on supporting null equal support for SMJ's PR.

@advancedxy
Copy link
Contributor Author

advancedxy commented Nov 4, 2024

Hmm, just noticed the local execution mode should be updated too: https://github.com/Eventual-Inc/Daft/actions/runs/11659892930/job/32461195259 (in the intersection API support). Let me add that and update this PR later.

@universalmind303
Copy link
Contributor

@advancedxy is this PR good to merge?

@advancedxy
Copy link
Contributor Author

@advancedxy is this PR good to merge?

I would like to make the local native execution works too in this PR. Let me add that first thing in the morning tomorrow(in beijing time).

@advancedxy
Copy link
Contributor Author

@advancedxy is this PR good to merge?

I would like to make the local native execution works too in this PR. Let me add that first thing in the morning tomorrow(in beijing time).

Updated, @universalmind303. Once the CI passes, I think it's good to merge. Some more tests might be needed, I can add that in the follow-up or related PRs.

@universalmind303 universalmind303 merged commit 594b5e2 into Eventual-Inc:main Nov 5, 2024
38 checks passed
universalmind303 pushed a commit that referenced this pull request Nov 6, 2024
One follow-up of #3161: add `on a <=> b` support for SQL's Join
operation.
universalmind303 pushed a commit that referenced this pull request Nov 12, 2024
This commit leverages null safe equal support in joins(see #3069 and
#3161) to support intersect API.

Partially fixes #3122.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

null safe equal support in joins
2 participants