-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge?
File an issue to discuss about design in #10193, since it is no longer a minor change
Previously, we always provided a null array if the function supports zero args, like random(), pi(), make_array()
.
I think the additional null array should not provided for all the function that supports 0 args, instead handle case by case in each function. It turns out the null array is not useful either. random
, pi
, and uuid
takes the number of rows instead of the actual null array.
We need to design an alternative way to communicate the number of rows to the function.
Proposal 1:
Add support_randomness -> bool
to ScalarUDFImpl
, and we provide the number of rows as the first argument for invoke
.
Proposal 2:
We always provide the number of rows to invoke
.
Change from
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
to
/// batch_rows is the number of rows in each batch
fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue>;
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
let inputs = self
.args
.iter()
.map(|e| e.evaluate(batch))
.collect::<Result<Vec<_>>>()?;
// evaluate the function
match self.fun {
ScalarFunctionDefinition::UDF(ref fun) => fun.invoke(&inputs, batch.num_rows()),
ScalarFunctionDefinition::Name(_) => {
internal_err!(
"Name function must be resolved to one of the other variants prior to physical planning"
)
}
}
}
It is more aggressive and breaks the signature, but batch
is part of the evaluate()
, provide information about batch
and args
to invoke()
makes sense to me.
Proposol 3 from alamb
Introduce another function with batch rows only.
fn invoke_no_args(number_rows: usize) -> {
not_yet_impl_err!
}
We need support_randomness -> bool
, so we know to switch to invoke_no_args
.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
I decide to go for 3, since there are already over 100 invoke()
used, not worth to break the signature