Skip to content

NestedLoopJoin Projection Pushdown#14120

Merged
berkaysynnada merged 15 commits intoapache:mainfrom
synnada-ai:nlj-proj-pushdown
Jan 16, 2025
Merged

NestedLoopJoin Projection Pushdown#14120
berkaysynnada merged 15 commits intoapache:mainfrom
synnada-ai:nlj-proj-pushdown

Conversation

@jayzhan-synnada
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Similar to HashJoin, we also need projection pushdown for nested loop join

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) proto Related to proto crate labels Jan 14, 2025
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@jayzhan-synnada jayzhan-synnada marked this pull request as draft January 14, 2025 08:12
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
@jayzhan-synnada jayzhan-synnada marked this pull request as ready for review January 14, 2025 10:06
@github-actions github-actions bot added the optimizer Optimizer rules label Jan 15, 2025
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan-synnada. This PR clearly brings an improvement, but I have a suggestion to further optimize projections. It will be even better if we handle mentioned cases as well.

try_swapping_with_cross_join(projection, cross_join)?
} else if let Some(nl_join) = input.downcast_ref::<NestedLoopJoinExec>() {
try_swapping_with_nested_loop_join(projection, nl_join)?
try_pushdown_through_nested_loop_join(projection, nl_join)?.map_or_else(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the same pattern exists in HashJoin part too, but does it take much effort if we somehow unify this "first try pushdown, if not possible, then embed" approach? Like "embed and pushdown", whichever possible.

indices
}

fn try_pushdown_through_nested_loop_join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, can we do the possible pushdown and embed operation at the same time in this function

let expected = [
"NestedLoopJoinExec: join_type=Inner, filter=a@0 < b@1, projection=[c@2]",
" CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
" CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solid example of what I am looking for is, these plans will project a&c (first CsvExec), and b only (second CsvExec). Embedding the projection into CsvExec is already done, we just need to pushdown the projection below NestedLoopJoinExec

@berkaysynnada
Copy link
Contributor

I'll merge this once the conflicts are resolved, and @jayzhan-synnada will create a ticket for the item that drives projection optimizations further

@berkaysynnada berkaysynnada merged commit 05f4e5a into apache:main Jan 16, 2025
25 checks passed
@berkaysynnada
Copy link
Contributor

Thanks @jayzhan-synnada

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants