Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port ArrayHas family to functions-array #9496

Merged
merged 10 commits into from
Mar 9, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Part of #9285

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules labels Mar 8, 2024
@jayzhan211
Copy link
Contributor Author

Contains #9477

}
Ok(Transformed::no(expr))
}
}

#[cfg(feature = "array_expressions")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guojidan I'm thinking of importing udfs with feature flag like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if user disable array_expressions featrue, and implement array function by themselves, this analyzer will not be called 😕

Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule is disabled, but other analyzer rules are not. As long as array_expressions is disabled, related function should be disabled too. Even they implement their own array functions, those functions are not equal to our udf based array functions, so it makes sense to me that the rewrite rule is disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like ideally that users who want to supply their own implementations for array_expressions should also supply their own Analyzer passes.

Following that logic, it seems like the datafusion_array_functions crate should supply its own analyzer pass that had array function specific rewrites.

Does that make sense @jayzhan211 and @guojidan?

If so, I will file a ticket to track the request / work.

Note that since I don't think users can actually add their own Analyzer passes today, we can't implement this idea yet. Thus I think we can still merge this PR.

I also think it would help to explain this rationale in comments. For example

Suggested change
#[cfg(feature = "array_expressions")]
// Note This rewrite is only done if the built in DataFusion `array_expressions` feature is enabled.
// Even if users implement their own array functions, those functions are not equal to the DataFusion
// udf based array functions, so this rewrite is not corrrect
#[cfg(feature = "array_expressions")]

Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like ideally that users who want to supply their own implementations for array_expressions should also supply their own Analyzer passes.

Following that logic, it seems like the datafusion_array_functions crate should supply its own analyzer pass that had array function specific rewrites.

Does that make sense @jayzhan211 and @guojidan?

If so, I will file a ticket to track the request / work.

Note that since I don't think users can actually add their own Analyzer passes today, we can't implement this idea yet. Thus I think we can still merge this PR.

I also think it would help to explain this rationale in comments. For example

I'm not sure if there are any concrete cases, but if they can have an analyzer that not only deals with array_expressions but also other expressions that would be more flexible.

However, the first step is to build an user-defined analyzer rule and find a way to register them to existing rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #9519

(it turns out that you can already define AnalyzerRules). I will make a PR to improve the documentation around this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR to improve documentation: #9520

@jayzhan211 jayzhan211 marked this pull request as ready for review March 8, 2024 06:30
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I think we could merge this PR in and then file tickets for moving the array_functions specific analyzer rules into the datafusion_functions_array crate as well (see comments)

If you agree I will file the tickets

Thanks again

}
Ok(Transformed::no(expr))
}
}

#[cfg(feature = "array_expressions")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like ideally that users who want to supply their own implementations for array_expressions should also supply their own Analyzer passes.

Following that logic, it seems like the datafusion_array_functions crate should supply its own analyzer pass that had array function specific rewrites.

Does that make sense @jayzhan211 and @guojidan?

If so, I will file a ticket to track the request / work.

Note that since I don't think users can actually add their own Analyzer passes today, we can't implement this idea yet. Thus I think we can still merge this PR.

I also think it would help to explain this rationale in comments. For example

Suggested change
#[cfg(feature = "array_expressions")]
// Note This rewrite is only done if the built in DataFusion `array_expressions` feature is enabled.
// Even if users implement their own array functions, those functions are not equal to the DataFusion
// udf based array functions, so this rewrite is not corrrect
#[cfg(feature = "array_expressions")]

jayzhan211 and others added 2 commits March 9, 2024 07:55
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants