Split built in functions into "packages" #7110

Open
@alamb

Is your feature request related to a problem or challenge?

Right now, DataFusion has 104 built-in functions: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html

As we add new features and functions (most recently, date_diff #7097 (comment) or all the array functions), this number will keep growing.

This growth means that the size of the DataFusion library will keep increasing, even for users who never use those functions.

Describe the solution you'd like

Given that DataFusion has all the machinery to register user defined functions, and they are mostly handled the same way as built-in functions, I propose we split up DataFusion's built-in scalar functions into packages.

Perhaps like

  • string_functions (upper, hex, etc)
  • crypto_functions (hash, etc)
  • array_functions (make_array, array_contains, etc)
  • date_time_functions (date_trunc, etc)
  • ...

Not only would this give users better control over their binary size, it would also ensure that the extensibility APIs of DataFusion are sufficient for all functions (and we could enhance the extension points where they are not).

This would replace the existing somewhat haphazard feature flags like crypto_functions:

https://github.com/apache/arrow-datafusion/blob/11b7b5c215012231e5768fc5be3445c0254d0169/datafusion/physical-expr/src/lib.rs#L21-L22

I would imagine these functions would live in their own crates, like datafusion_functions_crypto, with an entry point like:

let ctx = SessionContext::new();
ctx.sql("select hash('foobar')").await; // would error: `hash` is not registered yet

// register all functions in the `datafusion_functions_crypto` package
datafusion_functions_crypto::register(&ctx);
ctx.sql("select hash('foobar')").await; // would now succeed
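
To make the intended behavior concrete without depending on DataFusion itself, here is a minimal, self-contained sketch of the "package exposes a `register` entry point" pattern. Everything in it is hypothetical and for illustration only: `Registry`, `ScalarFn`, and the `string_functions` module are toy stand-ins, not DataFusion's actual types or API, and the toy functions operate on plain strings rather than Arrow arrays.

```rust
use std::collections::HashMap;

/// Toy scalar function signature (assumption: a real engine would
/// operate on Arrow arrays, not on `&str`).
pub type ScalarFn = fn(&str) -> String;

/// Hypothetical stand-in for the function registry held by a session.
#[derive(Default)]
pub struct Registry {
    functions: HashMap<&'static str, ScalarFn>,
}

impl Registry {
    /// Make `f` callable under `name`.
    pub fn register(&mut self, name: &'static str, f: ScalarFn) {
        self.functions.insert(name, f);
    }

    /// Look up `name` and call it, or fail if it was never registered.
    pub fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.functions.get(name) {
            Some(f) => Ok(f(arg)),
            None => Err(format!("unknown function: {name}")),
        }
    }
}

/// Hypothetical "package": in the proposal this would be its own crate.
pub mod string_functions {
    use super::Registry;

    fn upper(s: &str) -> String {
        s.to_uppercase()
    }

    fn lower(s: &str) -> String {
        s.to_lowercase()
    }

    /// Single entry point that registers every function in the package.
    pub fn register(registry: &mut Registry) {
        registry.register("upper", upper);
        registry.register("lower", lower);
    }
}

fn main() {
    let mut registry = Registry::default();

    // Before the package is registered, the function is unknown.
    assert!(registry.call("upper", "foobar").is_err());

    // Opt in to the package, then the same call succeeds.
    string_functions::register(&mut registry);
    println!("{:?}", registry.call("upper", "foobar"));
}
```

The point of the sketch is that the core registry knows nothing about individual functions; a user who never calls `string_functions::register` never needs to link that package at all.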

Describe alternatives you've considered

We could continue to use feature flags.

If this works out, we could do the same thing for aggregate and window functions.

Additional context

No response
