Description
Is your feature request related to a problem or challenge?
As we break the function library out of the DataFusion core as part of #8045, we need to decide how to organize it.
As pointed out by @viirya on #8705 (comment), there are many potential ways to organize the DataFusion function library. Thus I would like to get some consensus on how we want the organization to look before creating tickets and starting to crank it out.
Describe the solution you'd like
Here is a proposal for how the functions are organized.
Math functions
- feature_flag (new): math_expressions
- code location: datafusion/functions/src/math
- Abs, Acos, Asin, Atan, Atan2, Acosh, Asinh, Atanh, Cbrt, Ceil, Cos, Cosh, Degrees, Exp, Factorial, Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power, Radians, Signum, Sin, Sinh, Sqrt, Tan, Tanh, Trunc, Cot, Round, iszero
- Isnan: is the value NaN
- Nanvl: return the first non-NaN value
Array functions
Given the size and specialization of these functions, I propose putting them into their own subcrate.
- feature_flag (new): array_expressions
- code location: datafusion/functions-array/src/math
- ArrayAppend, ArraySort, ArrayConcat, ArrayHas, ArrayHasAll, ArrayHasAny, ArrayPopFront, ArrayPopBack, ArrayDims, ArrayDistinct, ArrayElement, ArrayEmpty, ArrayLength, ArrayNdims, ArrayPosition, ArrayPositions, ArrayPrepend, ArrayRemove, ArrayRemoveN, ArrayRemoveAll, ArrayRepeat, ArrayReplace, ArrayReplaceN, ArrayReplaceAll, ArraySlice, ArrayToString, ArrayIntersect, ArrayUnion, ArrayExcept, Cardinality, ArrayResize, Flatten, Range, StringToArray
- MakeArray: construct an array from columns (union/except depends on this)
Core functions
These functions are always available, either because they are used for internal purposes (like implementing the [1,2,3] syntax in SQL) or because they are so commonly used that a feature flag is not worth having.
- feature_flag: NONE
- code location: datafusion/functions/src/core or similar
- Coalesce: return the first non-null value
- Struct: create a struct
- NullIf: return null if the two values are equal
- Random: return a random number
- ArrowTypeOf: return the arrow type of a value
Crypto functions
- feature_flag (existing): crypto_expressions
- code location: datafusion/functions/src/crypto
- Digest, MD5, SHA224, SHA256, SHA384, SHA512
String functions
- feature_flag (new): string_expressions
- code location: datafusion/functions/src/string
- ascii, bit_length, btrim, chr, concat, concat_ws, ends_with, initcap, instr, lower, ltrim, octet_length, repeat, replace, rtrim, split_part, starts_with, to_hex, trim, upper, levenshtein, uuid, overlay
Unicode string functions
These expressions need an additional dependency, which is why they have a different flag.
- feature_flag (existing): unicode_expressions
- code location: datafusion/functions/src/string/unicode
- CharLength, Left, Lpad, Reverse, Right, Rpad, Strpos, Substr, Translate, SubstrIndex, FindInSet
Regex functions
- feature_flag (existing): regex_expressions
- code location: datafusion/functions/src/regexp
- RegexpMatch, RegexpReplace
Date time functions
- feature_flag (new): datetime_expressions
- code location: datafusion/functions/src/datetime
- date_part, date_trunc, date_bin, to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds, from_unixtime, now, current_date, current_time
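The feature-flag layout proposed above could be wired up with conditional compilation. The following is a minimal sketch of what datafusion/functions/src/lib.rs might look like, not actual DataFusion code: the `module_status` helper and the placeholder contents of the `math` module are hypothetical, and it assumes each feature flag is declared in the crate's Cargo.toml. Array functions are omitted because the proposal places them in a separate subcrate.

```rust
// Hypothetical sketch of datafusion/functions/src/lib.rs under the proposed layout.

// A module gated behind its feature flag is only compiled when the flag is enabled.
#[cfg(feature = "math_expressions")]
pub mod math {
    /// Placeholder subset; the real module would hold Abs, Acos, Asin and the rest.
    pub const FUNCTION_NAMES: &[&str] = &["abs", "acos", "asin"];
}

/// Reports, for each proposed module directory, whether it is compiled
/// into the current build. `cfg!` evaluates to a compile-time boolean.
pub fn module_status() -> Vec<(&'static str, bool)> {
    vec![
        // Core functions have no feature flag, so they are always available.
        ("core", true),
        ("math", cfg!(feature = "math_expressions")),
        ("crypto", cfg!(feature = "crypto_expressions")),
        ("string", cfg!(feature = "string_expressions")),
        ("string/unicode", cfg!(feature = "unicode_expressions")),
        ("regexp", cfg!(feature = "regex_expressions")),
        ("datetime", cfg!(feature = "datetime_expressions")),
    ]
}
```

Building with, for example, `cargo build --features math_expressions` would compile the `math` module and flip its entry to `true`, while `core` is reported as enabled in every build.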
Describe alternatives you've considered
We could have more fine-grained crates, or a different organization altogether.
For example, perhaps we should pull the string functions into their own datafusion/functions-string crate.
Additional context
No response