-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend argument types for udf return_type_from_exprs
#9522
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,6 @@ | |
//! [`ScalarUDF`]: Scalar User Defined Functions | ||
|
||
use crate::simplify::{ExprSimplifyResult, SimplifyInfo}; | ||
use crate::ExprSchemable; | ||
use crate::{ | ||
ColumnarValue, Expr, FuncMonotonicity, ReturnTypeFunction, | ||
ScalarFunctionImplementation, Signature, | ||
|
@@ -157,9 +156,12 @@ impl ScalarUDF { | |
&self, | ||
args: &[Expr], | ||
schema: &dyn ExprSchema, | ||
arg_types: &[DataType], | ||
) -> Result<DataType> { | ||
// we always pre-compute the argument types before called, so arg_types can be ensured to be non-empty | ||
assert!(!arg_types.is_empty()); | ||
// If the implementation provides a return_type_from_exprs, use it | ||
self.inner.return_type_from_exprs(args, schema) | ||
self.inner.return_type_from_exprs(args, schema, arg_types) | ||
} | ||
|
||
/// Do the function rewrite | ||
|
@@ -305,14 +307,12 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |
/// value for `('foo' | 'bar')` as it does for ('foobar'). | ||
fn return_type_from_exprs( | ||
&self, | ||
args: &[Expr], | ||
schema: &dyn ExprSchema, | ||
_args: &[Expr], | ||
_schema: &dyn ExprSchema, | ||
arg_types: &[DataType], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I reviewed the documentation of this API, and I think in general it is clear and no changes are needed |
||
) -> Result<DataType> { | ||
let arg_types = args | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems types are always pre-computed, we don't need to check emptiness |
||
.iter() | ||
.map(|arg| arg.get_type(schema)) | ||
.collect::<Result<Vec<_>>>()?; | ||
self.return_type(&arg_types) | ||
assert!(!arg_types.is_empty()); | ||
self.return_type(arg_types) | ||
} | ||
|
||
/// Invoke the function on `args`, returning the appropriate result | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,9 +17,10 @@ | |
|
||
//! UDF support | ||
use crate::{PhysicalExpr, ScalarFunctionExpr}; | ||
use arrow_schema::DataType; | ||
use datafusion_common::Result; | ||
use arrow_schema::Schema; | ||
use datafusion_common::{DFSchema, Result}; | ||
pub use datafusion_expr::ScalarUDF; | ||
use datafusion_expr::{type_coercion::functions::data_types, Expr}; | ||
use std::sync::Arc; | ||
|
||
/// Create a physical expression of the UDF. | ||
|
@@ -28,8 +29,22 @@ use std::sync::Arc; | |
pub fn create_physical_expr( | ||
fun: &ScalarUDF, | ||
input_phy_exprs: &[Arc<dyn PhysicalExpr>], | ||
return_type: DataType, | ||
input_schema: &Schema, | ||
args: &[Expr], | ||
input_dfschema: &DFSchema, | ||
) -> Result<Arc<dyn PhysicalExpr>> { | ||
let input_expr_types = input_phy_exprs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we need to check the signature for udf too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the idea was at this point the type checking had already been done, so there is no reason to check the types again. However, I looked at how non udf functions work, and they also seem to check the data types again, so the change in this PR is consistent: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mean they are checked in logical expr once, so we dont need to double check them in physical expr? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we still need to check signature if someone create physcial expr directly (might be possible?). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let me merge this first and move on #9504 |
||
.iter() | ||
.map(|e| e.data_type(input_schema)) | ||
.collect::<Result<Vec<_>>>()?; | ||
|
||
// verify that input data types is consistent with function's `TypeSignature` | ||
data_types(&input_expr_types, fun.signature())?; | ||
|
||
// Since we have arg_types, we dont need args and schema. | ||
let return_type = | ||
fun.return_type_from_exprs(args, input_dfschema, &input_expr_types)?; | ||
|
||
Ok(Arc::new(ScalarFunctionExpr::new( | ||
fun.name(), | ||
fun.fun(), | ||
|
@@ -42,8 +57,8 @@ pub fn create_physical_expr( | |
|
||
#[cfg(test)] | ||
mod tests { | ||
use arrow_schema::DataType; | ||
use datafusion_common::Result; | ||
use arrow_schema::{DataType, Schema}; | ||
use datafusion_common::{DFSchema, Result}; | ||
use datafusion_expr::{ | ||
ColumnarValue, FuncMonotonicity, ScalarUDF, ScalarUDFImpl, Signature, Volatility, | ||
}; | ||
|
@@ -97,7 +112,9 @@ mod tests { | |
// create and register the udf | ||
let udf = ScalarUDF::from(TestScalarUDF::new()); | ||
|
||
let p_expr = create_physical_expr(&udf, &[], DataType::Float64)?; | ||
let e = crate::expressions::lit(1.1); | ||
let p_expr = | ||
create_physical_expr(&udf, &[e], &Schema::empty(), &[], &DFSchema::empty())?; | ||
|
||
assert_eq!( | ||
p_expr | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems unused 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's right for usage, for
return_type_from_exprs
user would calculate the return_type formExpr
andSchema
. And for a user definedreturn_type_from_exprs
, it would not re-calculate thearg_data_types
fromargs
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
_arg_data_types
is not provided, we calculate them fromargs
andschema
. They are mutual exclusive.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note this is a test)