Add a ScalarUDFImpl::simplify() API #9289

@alamb

Is your feature request related to a problem or challenge?

As part of porting arrow_cast #9287 and make_array #9288, it has become clear that some functions have special simplification semantics:

  1. arrow_cast simplifies to a cast to a particular data type. It is important for the rest of datafusion to understand the Cast semantics, as there are several special cases for Expr::Cast (such as unwrap_cast_in_comparison)
  2. make_array has special rewrite rules to combine / fold with array_append and array_prepend in the simplifier (source link)

Also, I think some systems may want to provide the ability to define UDFs that are more like macros (that is, UDFs that can be expressed in terms of other built-in expressions), which needs some way for datafusion to "inline" the definition.

Similarly, specialized functions (e.g. replacing regexp_match with a version that has the regexp pre-compiled ...) as described in #8051 sound very similar.

Describe the solution you'd like

I propose adding a function such as the following. With it, we could implement the simplifications for make_array and arrow_cast in the UDF itself (and not in the main simplify code):

/// Was the expression simplified?
enum Simplified {
  /// The function call was simplified to an entirely new Expr
  Rewritten(Expr),
  /// The function call could not be simplified, and the arguments
  /// are returned unmodified
  Original(Vec<Expr>)
}
pub trait ScalarUDFImpl {
...

  /// Apply any function specific simplification rules
  ///
  /// Some functions like arrow_cast have special semantic simplifications
  /// (into `Expr::Cast` for example) that can improve planning.
  ///
  /// If there is a simpler representation of calling this function with the
  /// specified arguments, return `Simplified::Rewritten` with the simplification.
  /// If no such simplification is possible, return `Simplified::Original` with
  /// the unmodified arguments (the default)
  ///
  /// This function should only apply simplifications specific to this function.
  /// DataFusion will automatically simplify the arguments with a variety
  /// of rewrites during optimization
  fn simplify(&self, args: Vec<Expr>) -> Result<Simplified> {
    Ok(Simplified::Original(args))
  }
  ...

}
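
To illustrate how such a hook might be used, here is a minimal, self-contained sketch. Everything in it is a toy stand-in and an assumption for illustration: the `Expr` and `DataType` enums are simplified placeholders rather than DataFusion's real types, and `ScalarUdfSketch`, `ArrowCast`, `Upper`, and `simplify_call` are hypothetical names. It shows an arrow_cast-like function rewriting itself into a cast when the type argument is a recognized literal, and a function that keeps the default (no simplification).

```rust
// Toy stand-ins for DataFusion's types (NOT the real definitions).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Cast { expr: Box<Expr>, data_type: DataType },
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Was the expression simplified? (mirrors the enum proposed above)
#[derive(Debug, PartialEq)]
enum Simplified {
    /// The call was rewritten to an entirely new expression
    Rewritten(Expr),
    /// The call could not be simplified; the arguments are handed back
    Original(Vec<Expr>),
}

/// Hypothetical, cut-down version of the proposed trait.
trait ScalarUdfSketch {
    fn name(&self) -> &str;

    /// Default: no function-specific simplification.
    fn simplify(&self, args: Vec<Expr>) -> Result<Simplified, String> {
        Ok(Simplified::Original(args))
    }
}

/// arrow_cast-like function: `arrow_cast(expr, 'Int64')` becomes a cast.
struct ArrowCast;

impl ScalarUdfSketch for ArrowCast {
    fn name(&self) -> &str {
        "arrow_cast"
    }

    fn simplify(&self, args: Vec<Expr>) -> Result<Simplified, String> {
        // Only rewrite when the second argument is a type name this toy recognizes.
        let target = match args.as_slice() {
            [_, Expr::Literal(ty)] => match ty.as_str() {
                "Int64" => Some(DataType::Int64),
                "Utf8" => Some(DataType::Utf8),
                _ => None,
            },
            _ => None,
        };
        match target {
            Some(data_type) => {
                let mut args = args;
                let expr = args.swap_remove(0); // take ownership of the value argument
                Ok(Simplified::Rewritten(Expr::Cast {
                    expr: Box::new(expr),
                    data_type,
                }))
            }
            None => Ok(Simplified::Original(args)),
        }
    }
}

/// A function that relies on the default `simplify`.
struct Upper;

impl ScalarUdfSketch for Upper {
    fn name(&self) -> &str {
        "upper"
    }
}

/// What a simplifier pass might do with the result (hypothetical helper):
/// use the rewritten expression, or rebuild the original function call.
fn simplify_call(udf: &dyn ScalarUdfSketch, args: Vec<Expr>) -> Result<Expr, String> {
    Ok(match udf.simplify(args)? {
        Simplified::Rewritten(expr) => expr,
        Simplified::Original(args) => Expr::ScalarFunction {
            name: udf.name().to_string(),
            args,
        },
    })
}
```

Returning the arguments in `Simplified::Original` lets the caller rebuild the call without cloning, which is presumably why the proposed signature takes `args` by value.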

Describe alternatives you've considered

There may be better

Additional context

No response
