Skip to content

Support Expr creation for ScalarUDF: Resolve function calls by name during planning #8157

Closed
@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

Motivation

There is ongoing work migrating BuitlinScalarFunctions -> ScalarUDF #8045, and we noticed one Expr related issue during prototyping:

We can use Expr API to create builtin scalar functions directly:

let expr1 = abs(lit(-1));
let expr2 = call_fn("abs", vec![lit(-1)]);

We want to still use this API after functions are migrated to ScalarUDF based implementation, it's not possible now because those Expr creations are stateless, and ScalarUDFs are registered inside SessionState. It's only possible if we do

let ctx = create_context()?;
ctx.register_udf(abs_impl);
let expr2 = ctx.call_fn("abs", vec![lit(-1)]);

Describe the solution you'd like

To continue supporting the original stateless Expr creation API, we can create Expr with only the string function name, and resolve the function during logical optimization.
Then call_fn("abs", vec![lit(-1)]) can be supported (now it only support BuitlinScalarFunction and don't support UDFs)
Another macro-based API abs(lit(-1)) can be supported if we hard code all possible function names within the core (should we do that?)

Potential implementation is:

  1. After let expr2 = call_fn("abs", vec![lit(-1)]);, create a ScalarUDF expression with dummy implementation.
  2. Add an AnalyzerRule(a mandatory logical optimizer rule) to resolve this UDF name using external functions registered in SessionState

Issue:

Now function implementation is inside SessionState but outside SessionConfig, and the analyzer can only access SessionConfig.
We have to move the function registry into SessionConfig first (or is there any better way?)

pub struct SessionState {
    // ...
    /// Scalar functions that are registered with the context
    scalar_functions: HashMap<String, Arc<ScalarUDF>>,
    /// Session configuration
    config: SessionConfig,
// ...
// `self.options()` is `ConfigOption`
            let analyzed_plan =
                self.analyzer
                    .execute_and_check(plan, self.options(), |_, _| {})?;

cc @alamb I wonder if you have any thoughts on this approach 🤔

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions