-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for ScalarUDFImpl::invoke_with_return_type
where the invoke is passed the return type created for the udf instance
#13290
base: main
Are you sure you want to change the base?
Conversation
…invoke is passed the return type created for the udf instance
datafusion/expr/src/udf.rs
Outdated
&self, | ||
args: &[ColumnarValue], | ||
number_rows: usize, | ||
return_type: &DataType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add InvokeUDFArg
struct that includes these to avoid breaking change in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should roll all the values into this struct or just number_rows and return_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
including args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on the new function name, if we roll these in together. Since this subsumes invoke_batch
, would we then want to deprecate that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer single invoke
function, but that will be a large change so not required to roll them together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better name than
pub fn invoke_batch_with_return_type(
&self,
args: InvokeArgs
) -> Result<ColumnarValue> {
is needed, maybe invoke_with_args(), I don't like it that much, but not sure on the name.
pub fn invoke_with_args(
&self,
args: InvokeArgs
) -> Result<ColumnarValue> {
In an ideal worlds, UDF would have single invoke method. This is why #13238 / #13064 / #13345 consolidation takes place. In #12819 (comment) i am attempting to clarify why If not simplify, what if we had something like |
Which issue does this PR close?
Closes #12819.
Rationale for this change
There is currently no way to determine the schema-defined nullability of the arrays from
args: &[ColumnarValue],
from theinvoke_batch
function. The array nullabilityArray::is_nullable()
(which is available) only checks the array null count.If the return type (from
ScalarUDFImpl::return_type
) is passed to the newly defined invoke method then this datatype can be inspected and any nullability information can be inferred.What changes are included in this PR?
Define a new trait function
invoke_batch_with_return_type
which defaults to calling toinvoke_batch
.Example usage can be seen here.
Are these changes tested?
Are there any user-facing changes?
There are no breaking changes, but a new optional method is added.