-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Support null safe equal in joins #3161
[FEAT] Support null safe equal in joins #3161
Conversation
This commit consists of the following parts: 1. Make the logical plan join's on condition null safe aware 2. Support translating null safe equal joins into physical plans 3. Make sure the optimization rules: eliminate cross join and push down filter don't break 4. Modifications to physical join ops to support null safe equal: hash and broadcast joins are supported. SMJ is not supported yet. 5. Glue code in Python side to make the whole pipeline work 6. Some UTs in the python side TODOs(in follow-up PRs): [ ] rewrite null safe equal for SMJ so that SMJ could support null safe equals as well [ ] SQL supports null safe equal, a.k.a SpaceShip(a<=>b) [ ] Python's DataFrame API supports null safe equal join
CodSpeed Performance ReportMerging #3161 will not alter performanceComparing Summary
|
@@ -82,11 +82,17 @@ impl MicroPartition { | |||
right: &Self, | |||
left_on: &[ExprRef], | |||
right_on: &[ExprRef], | |||
null_equals_nulls: Option<Vec<bool>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether this is the right type definition or not, do we need to define it as something like Option<&[bool]
?
@@ -506,13 +510,15 @@ pub(super) fn translate_single_logical_node( | |||
// TODO(Clark): Also do a sort-merge join if a downstream op needs the table to be sorted on the join key. | |||
// TODO(Clark): Look into defaulting to sort-merge join over hash join under more input partitioning setups. | |||
// TODO(Kevin): Support sort-merge join for other types of joins. | |||
// TODO(advancedxy): Rewrite null safe equals to support SMJ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinzwang we might still need the zero_lit
expression to support rewriting a <=> b
to nvl + is_null
. It would be much easier to support SMJ code.
P.S: I didn't include that part, we can discuss further on supporting null equal support for SMJ's PR.
Hmm, just noticed the local execution mode should be updated too: https://github.com/Eventual-Inc/Daft/actions/runs/11659892930/job/32461195259 (in the intersection API support). Let me add that and update this PR later. |
@advancedxy is this PR good to merge? |
I would like to make the local native execution works too in this PR. Let me add that first thing in the morning tomorrow(in beijing time). |
Updated, @universalmind303. Once the CI passes, I think it's good to merge. Some more tests might be needed, I can add that in the follow-up or related PRs. |
One follow-up of #3161: add `on a <=> b` support for SQL's Join operation.
This commit consists of the following parts:
Fixes: #3069
TODOs(in follow-up PRs):