[Epic] A new Scalar Function interface #7977

Closed
@2010YOUY01

Description

Is your feature request related to a problem or challenge?

As previously discussed in #7110 and #7752, we want to define scalar functions outside the core to reduce the DataFusion core binary size and to make UDF management easier.
However, the current BuiltinScalarFunction and ScalarUDF interfaces cannot support this:

Problems with BuiltinScalarFunction

  • Built-in functions are implemented with the enum BuiltinScalarFunction, and methods such as return_type() are large functions that match on every enum variant. Adding a new function therefore requires changes in multiple places, and scalar functions cannot be defined outside the core.
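
To make the maintenance cost concrete, here is a minimal sketch of the enum-based pattern. The types `DataType` and `BuiltinFn` and the methods `name()`, `return_type()`, and `arity()` are simplified stand-ins invented for illustration, not the actual DataFusion definitions.

```rust
// Simplified stand-in for Arrow's DataType (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
    Utf8,
}

// Simplified stand-in for the BuiltinScalarFunction enum (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinFn {
    Sqrt,
    Upper,
}

impl BuiltinFn {
    // Every method matches on every variant, so adding one function
    // means adding an arm here...
    fn name(&self) -> &'static str {
        match self {
            BuiltinFn::Sqrt => "sqrt",
            BuiltinFn::Upper => "upper",
        }
    }

    // ...and here...
    fn return_type(&self) -> DataType {
        match self {
            BuiltinFn::Sqrt => DataType::Float64,
            BuiltinFn::Upper => DataType::Utf8,
        }
    }

    // ...and here. A crate outside the core cannot add variants at all.
    fn arity(&self) -> usize {
        match self {
            BuiltinFn::Sqrt => 1,
            BuiltinFn::Upper => 1,
        }
    }
}
```

In this sketch, adding a third function touches the enum and all three `match` blocks, which is the scattering the bullet above describes.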

Problems with ScalarUDF

  • ScalarUDFs are currently represented by a struct, which cannot cover all the functionality of the existing built-in functions.
  • Defining a new ScalarUDF requires constructing a struct imperatively (see examples/simple_udf.rs). This is not ergonomic, especially when a large number of functions must be managed in a separate crate.
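
The imperative style can be sketched roughly as follows. `SimpleUdf`, `ScalarImpl`, and `make_add_one` are hypothetical names, and columns are modelled as plain `Vec<f64>` rather than Arrow arrays; this only illustrates the shape of the approach and is not the real ScalarUDF API.

```rust
use std::sync::Arc;

// Simplified stand-in for Arrow's DataType (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
}

// Hypothetical function-pointer type; columns are plain vectors here.
type ScalarImpl = Arc<dyn Fn(&[Vec<f64>]) -> Result<Vec<f64>, String> + Send + Sync>;

// Hypothetical struct-based UDF: each piece of metadata is a field
// that the caller assembles by hand.
struct SimpleUdf {
    name: String,
    input_types: Vec<DataType>,
    return_type: DataType,
    fun: ScalarImpl,
}

// Building one function means wiring the closure and the metadata
// together step by step; nothing ties them to a single definition.
fn make_add_one() -> SimpleUdf {
    let fun: ScalarImpl = Arc::new(
        |args: &[Vec<f64>]| -> Result<Vec<f64>, String> {
            match args {
                [col] => Ok(col.iter().map(|v| v + 1.0).collect()),
                _ => Err(format!("add_one expects 1 argument, got {}", args.len())),
            }
        },
    );
    SimpleUdf {
        name: "add_one".to_string(),
        input_types: vec![DataType::Float64],
        return_type: DataType::Float64,
        fun,
    }
}
```

With dozens of functions, each one needs its own builder like `make_add_one`, which is where the ergonomics problem described above shows up.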

So we would like to introduce a new interface, the trait ScalarFunctionDef, which can define a function declaratively and in one place. This is easier than adding pattern-matching arms in multiple places for the enum BuiltinScalarFunction, and easier than the imperative way of defining a new function with the struct ScalarUDF.
After introducing the new interface, we can gradually migrate the existing built-in functions to it, and the old UDF interface create_udf() can remain unchanged.
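
As a rough illustration of what "declaratively and in one place" could look like, here is a minimal sketch. The method names on `ScalarFunctionDef` (`name`, `input_types`, `return_type`, `execute`), the `Sqrt` example, and the `FunctionRegistry` type are all assumptions made for this example; the issue does not specify the trait's actual shape, and columns are modelled as `Vec<f64>` instead of Arrow arrays.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in for Arrow's DataType (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
}

// Hypothetical shape of the proposed trait; the method set is an assumption.
trait ScalarFunctionDef: Send + Sync {
    fn name(&self) -> &str;
    fn input_types(&self) -> Vec<DataType>;
    fn return_type(&self) -> DataType;
    fn execute(&self, args: &[Vec<f64>]) -> Result<Vec<f64>, String>;
}

// Everything about `sqrt` lives in this one impl block, and it could
// be defined in a crate outside the core.
struct Sqrt;

impl ScalarFunctionDef for Sqrt {
    fn name(&self) -> &str {
        "sqrt"
    }
    fn input_types(&self) -> Vec<DataType> {
        vec![DataType::Float64]
    }
    fn return_type(&self) -> DataType {
        DataType::Float64
    }
    fn execute(&self, args: &[Vec<f64>]) -> Result<Vec<f64>, String> {
        match args {
            [col] => Ok(col.iter().map(|v| v.sqrt()).collect()),
            _ => Err(format!("sqrt expects 1 argument, got {}", args.len())),
        }
    }
}

// Hypothetical registry: the core only sees trait objects, so built-in
// functions and UDFs can share one internal representation.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, Arc<dyn ScalarFunctionDef>>,
}

impl FunctionRegistry {
    fn register(&mut self, f: Arc<dyn ScalarFunctionDef>) {
        self.functions.insert(f.name().to_string(), f);
    }

    fn get(&self, name: &str) -> Option<Arc<dyn ScalarFunctionDef>> {
        self.functions.get(name).cloned()
    }
}
```

Under these assumptions, adding a function means writing one new `impl` and calling `register`, with no edits to a central enum or to shared `match` blocks.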

Describe the solution you'd like

Objective

  1. Define a new interface, the trait ScalarFunctionDef, to solve the problems described above.
  2. Built-in functions currently use the enum BuiltinScalarFunction as the underlying execution mechanism in the core code, and scalar UDFs use the struct ScalarUDF. Both will move to ScalarFunctionDef as the unified internal representation.

Implementation Plan

  1. Introduce the new interface ScalarFunctionDef and replace the old built-in functions' execution code
    1.1 Replace SQL execution code with ScalarFunctionDef
    1.2 Change other relevant execution components, like the Logical Expression Constructor for Scalar Functions, to be compatible with ScalarFunctionDef.
  2. Migrate old UDF Implementations: Struct ScalarUDF -> ScalarFunctionDef
  3. Function package management interface
    ...and migrating existing functions

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
