-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prioritize UDF over scalar built-in function in case of function name… #6601
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution @epsio-banay ! I left some comments -- let me know if they don't make sense
datafusion/common/src/config.rs
Outdated
@@ -191,6 +191,8 @@ config_namespace! { | |||
/// MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL, ClickHouse, BigQuery, and Ansi. | |||
pub dialect: String, default = "generic".to_string() | |||
|
|||
/// Should DataFusion prioritize UDF over built-in scalar function in case of function name conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't think a config flag is needed -- if a UDF is registered with the same name as a built in, it will not be callable (as you note in the issue / PR description). Therefore I think it is safe to assume that the if a user registers a function with the same name as a built in they want the new function to have precidence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks for responding so fast, this PR got much more simple.
I'll also edit the PR description - remove the described config flag
)); | ||
|
||
// Test data | ||
let test_data = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than checking the plan here, what do you think about simply running the query and verifying the output?
For example, you could run the query
SELECT ABS(-100)
And if it returns 1
you know the UDF is called, and if it returns 100
you know the built in was called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in scalar_udf_override_built_in_scalar_function
Marking as draft as this PR has feedback and is awaiting a response (I am trying to keep the list of PRs needing review clear) -- please mark as ready for review when ready |
180a167
to
319df05
Compare
c62c77f
to
be0507d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me @epsio-banay -- thank you very much for the contribution
I think this branch has a logical merge conflict with some of the changes on master. If I applied this diff locally the tests passed: diff --git a/datafusion/core/tests/sql/udf.rs b/datafusion/core/tests/sql/udf.rs
index 0ecd5d0fd..eb76c3417 100644
--- a/datafusion/core/tests/sql/udf.rs
+++ b/datafusion/core/tests/sql/udf.rs
@@ -185,7 +185,7 @@ async fn scalar_udf_override_built_in_scalar_function() -> Result<()> {
let batch = RecordBatch::try_new(
Arc::new(schema.clone()),
- vec![Arc::new(Int32Array::from(vec![-100]))],
+ vec![Arc::new(Int32Array::from_slice([-100]))],
)?;
let ctx = SessionContext::new();
|
be0507d
to
5f325bd
Compare
rebased and fixed:) |
Thanks again @epsio-banay |
Which issue does this PR close?
Closes #6588.
Rationale for this change
This PR lets a user override the built-in implementation of scalar functions (such as abs, floor, etc) using the ScalarUdf feature.
We are currently only using the logical planner of Datafusion, this feature would allow us and others to have more control over the built-in scalar functions.
It should also be inline with Datafusion's intention to be customizable.
What changes are included in this PR?
Scalar UDFs will have precedence over Scalar built-in functions in case of name conflicts.
For example if a user registers a UDF with the name "abs" (as the built-in function called "abs"), The sql "SELECT abs(10)" will use the UDF instead of the built-in function.
Are these changes tested?
Yes
Are there any user-facing changes?
Yes.
DataFusion will prioritize UDF over built-in scalar function in case of function name conflict.