-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Deprecate ScalarUDFImpl::return_type #13717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -173,7 +173,9 @@ impl ScalarUDF { | |
| /// its [`ScalarUDFImpl::return_type`] should raise an error. | ||
| /// | ||
| /// See [`ScalarUDFImpl::return_type`] for more details. | ||
| #[deprecated(since = "44.0.0", note = "Use return_type_from_exprs() instead")] | ||
| pub fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| #[allow(deprecated)] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we also deprecate this (
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
| self.inner.return_type(arg_types) | ||
| } | ||
|
|
||
|
|
@@ -450,6 +452,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |
| /// is recommended to return [`DataFusionError::Internal`]. | ||
| /// | ||
| /// [`DataFusionError::Internal`]: datafusion_common::DataFusionError::Internal | ||
| #[deprecated(since = "44.0.0", note = "Use `return_type_from_exprs` instead")] | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we are going to deprecate this API I think we should add an example to I can help if we proceed |
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>; | ||
|
|
||
| /// What [`DataType`] will be returned by this function, given the | ||
|
|
@@ -483,6 +486,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |
| _schema: &dyn ExprSchema, | ||
| arg_types: &[DataType], | ||
| ) -> Result<DataType> { | ||
| #[allow(deprecated)] | ||
| self.return_type(arg_types) | ||
| } | ||
|
|
||
|
|
@@ -756,6 +760,7 @@ impl ScalarUDFImpl for AliasedScalarUDFImpl { | |
| } | ||
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| #[allow(deprecated)] | ||
| self.inner.return_type(arg_types) | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @goldmedal for ideas how (if) we will be able to resolve this deprecation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding this. I filed #13735 to trace it.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reading #13735 and thinking about the churn we had recently with
invoke_with_batchandinvoke_with_args, etcWhat if we made a new API that could accomodate both usecases. Something like
🤔
This would also let us add other fields to the
ReturnTypeArgsstructure over time with less API churnThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change the signature to
return_type(ReturnTypeArgs)and removereturn_type_with_argsat allI was thinking about this too but not sure this huge breaking change is acceptable or not, but I think this will be a better change in long term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
return_type(ReturnTypeArgs)would be the best in the long termHowever, in the short term (next 6 months) I think it would be nicer to downstream crates / users to add a new function
return_type_with_argsand leave#deprecationnotices onreturn_typeandreturn_type_from_exprsdirecting people toreturn_type_with_argsThen once the deprecation period has passed we could rename / introduce a new function called
return_type🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the challenge of #13825 and #12604 is that the properties of
Exprincluding data type is determined based onschema. Unless theschemais the same all the way of the processing otherwise we need to recompute the properties based on givenschemaso we can't compute once and store the information withinExpr. IntroduceMapfor schema -> properties mapping seems not practically.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if we have column
c, we needschemato know it's data type isInt32. With anotherschema, it may becomesInt64. How do we ensure the schema is not changed and we can happily reuse the result we had before?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
The other way to view this is -- what does an Expr represent?
is it a syntactical expression (doesn't know types, can be applied to multiple different inputs), or a semantical expression (anchored in the evaluation context, knows types).
The SQL handling process goes from syntax to semantics, and expressions (Expr) built in dataframe layer are definitely syntactical, not semantical.
This may be a challenge for #13825, but less so for #12604. If we have new IR with separate Expr types, it won't be used in contexts where we need syntactical expressions.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I review about this part again and my conclusion is that
Exprshould be the same as it is now, andschemais separated concept other thanExpr, with them we compute the corresponding metadata like data type and nullability. Therefore, I still think we needreturn_type_with_argsfor solving this issue.The issue mentioned in #12604 should be solved in another way. I think we can have such information stored in
LogicalPlaninstead.Exprhas no type info until the schema is determined and it is when we create correspondingLogicalPlanfromExprandSchema.LogicalPlancan be considered as the Container ofExpr+Schema, whenever the schema is updated orExpris rewritten, we recompute the properties ofExpr. If nothing changed, we can reuse such computed properties.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that we don't even need
args: Vec<Expr>, what we need areVec<String>.