Description
Is your feature request related to a problem or challenge?
Right now, DataFusion has 104 built-in functions: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html

As we add new features and functions (most recently, date_diff in #7097 (comment), or all the array functions), this number will keep growing.
This growth means that the size of the DataFusion library will keep growing even if users do not use those features.
Describe the solution you'd like
Given that DataFusion has all the machinery to register user-defined functions, and that they are mostly handled the same way as built-in functions, I propose we split DataFusion's built-in scalar functions into separate function packages.
Perhaps like:
- string_functions (upper, hex, etc)
- crypto_functions (hash, etc)
- array_functions (make_array, array_contains, etc)
- date_time functions (date_trunc, etc)
- ...
Not only would this give users better control over their binary size, it would also ensure that the extensibility APIs of DataFusion are sufficient for all functions (and we could enhance the extension points if that turned out not to be the case).
This would replace the existing, somewhat haphazard feature flags like crypto_functions.
I would imagine these functions would be in their own crates, like datafusion_functions_crypto, with an entry point like:
let ctx = SessionContext::new();
ctx.sql("select hash('foobar')").await; // would error
// register all functions in the `datafusion_functions_crypto` package
datafusion_functions_crypto::register(&ctx);
ctx.sql("select hash('foobar')").await; // would now succeed
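To make the shape of such an entry point a bit more concrete, here is a minimal, self-contained sketch that uses only the standard library. Everything in it is an assumption for illustration: `ToyContext`, `ScalarFn`, and the `toy_functions_crypto` module are hypothetical stand-ins for `SessionContext` and a function-package crate, not DataFusion APIs, and the toy `hash` simply uses std's `DefaultHasher`.

```rust
use std::collections::HashMap;

/// Hypothetical scalar function signature: one string in, one string out.
pub type ScalarFn = fn(&str) -> String;

/// Toy stand-in for a session context (NOT DataFusion's `SessionContext`).
/// It only knows about functions that were explicitly registered.
#[derive(Default)]
pub struct ToyContext {
    functions: HashMap<String, ScalarFn>,
}

impl ToyContext {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a single scalar function under `name`.
    pub fn register_scalar(&mut self, name: &str, f: ScalarFn) {
        self.functions.insert(name.to_string(), f);
    }

    /// Look up a function by name and call it; unknown names are an error.
    pub fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.functions.get(name) {
            Some(f) => Ok(f(arg)),
            None => Err(format!("unknown function: {}", name)),
        }
    }
}

/// Hypothetical function package, playing the role that a
/// `datafusion_functions_crypto` crate would play in the proposal.
pub mod toy_functions_crypto {
    use super::ToyContext;
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    /// Illustrative hash only; not a cryptographic hash.
    fn hash_impl(input: &str) -> String {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        format!("{:016x}", hasher.finish())
    }

    /// Single entry point that registers every function in this package.
    pub fn register(ctx: &mut ToyContext) {
        ctx.register_scalar("hash", hash_impl);
    }
}
```

With this sketch, `ctx.call("hash", "foobar")` returns an error on a fresh `ToyContext` and succeeds after `toy_functions_crypto::register(&mut ctx)`, mirroring the before/after behavior in the snippet above. A real package would presumably go through DataFusion's existing user-defined function registration rather than a hand-rolled map.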
Describe alternatives you've considered
We could continue to use feature flags.
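For contrast, the feature-flag alternative roughly amounts to compile-time gating inside a single crate. The sketch below is only an illustration under stated assumptions: the flag name `crypto_functions` is taken from the text above, while `builtin_function_names` and `crypto_function_names` are hypothetical helpers, not DataFusion code.

```rust
/// Compiled in only when the (assumed) `crypto_functions` feature is enabled.
#[cfg(feature = "crypto_functions")]
fn crypto_function_names() -> Vec<&'static str> {
    vec!["hash"]
}

/// Fallback used when the feature is disabled: no crypto functions exist.
#[cfg(not(feature = "crypto_functions"))]
fn crypto_function_names() -> Vec<&'static str> {
    Vec::new()
}

/// Hypothetical helper listing the built-in functions present in this build.
pub fn builtin_function_names() -> Vec<&'static str> {
    let mut names = vec!["upper", "hex"];
    names.extend(crypto_function_names());
    names
}
```

Here the set of available functions is fixed when the crate is compiled, and every optional group needs its own flag and `cfg` plumbing, whereas separate packages would let users opt in by adding a dependency and calling its `register` function.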
If this works out, we could do the same thing for aggregate and window functions.
Additional context
No response