
[DISCUSS] organization of functions #9100

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

As we break the function library out of the DataFusion core as part of #8045, we need to decide how to organize it.

As pointed out by @viirya on #8705 (comment), there are many potential ways to organize the DataFusion function library. Thus I would like to reach some consensus on how the organization should look before creating tickets and starting to crank it out.

Describe the solution you'd like

Here is a proposal for how the functions are organized.
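To make the layout concrete, here is a minimal, hypothetical sketch of a lib.rs for datafusion/functions. It assumes each feature_flag listed below is a Cargo feature that gates one module. The FUNCTIONS name lists and the available_functions helper are illustrative inventions, not DataFusion's API. The core module is called core_functions here only to avoid shadowing Rust's built-in core crate in a single-file example.

```rust
// Hypothetical lib.rs following the proposed layout. Each module holds a
// placeholder list of SQL function names, not real implementations.

/// Core functions: always compiled, no feature flag.
pub mod core_functions {
    pub const FUNCTIONS: &[&str] = &["coalesce", "struct", "nullif", "random", "arrow_typeof"];
}

/// Math functions, behind the proposed math_expressions flag.
#[cfg(feature = "math_expressions")]
pub mod math {
    pub const FUNCTIONS: &[&str] = &["abs", "acos", "isnan", "nanvl"];
}

/// Crypto functions, behind the existing crypto_expressions flag.
#[cfg(feature = "crypto_expressions")]
pub mod crypto {
    pub const FUNCTIONS: &[&str] = &["digest", "md5", "sha256"];
}

/// The string module must be compiled if either flag is on, because the
/// unicode functions live in its unicode submodule but have their own flag.
#[cfg(any(feature = "string_expressions", feature = "unicode_expressions"))]
pub mod string {
    #[cfg(feature = "string_expressions")]
    pub const FUNCTIONS: &[&str] = &["ascii", "btrim", "lower", "upper"];

    #[cfg(feature = "unicode_expressions")]
    pub mod unicode {
        pub const FUNCTIONS: &[&str] = &["left", "lpad", "reverse", "strpos"];
    }
}

/// Regex functions, behind the existing regex_expressions flag.
#[cfg(feature = "regex_expressions")]
pub mod regexp {
    pub const FUNCTIONS: &[&str] = &["regexp_match", "regexp_replace"];
}

/// Date/time functions, behind the proposed datetime_expressions flag.
#[cfg(feature = "datetime_expressions")]
pub mod datetime {
    pub const FUNCTIONS: &[&str] = &["date_part", "date_trunc", "date_bin", "now"];
}

/// Returns the names from every function group compiled into this build.
pub fn available_functions() -> Vec<&'static str> {
    #[allow(unused_mut)] // `mut` is unused when no optional feature is enabled
    let mut names: Vec<&'static str> = crate::core_functions::FUNCTIONS.to_vec();
    #[cfg(feature = "math_expressions")]
    names.extend_from_slice(crate::math::FUNCTIONS);
    #[cfg(feature = "crypto_expressions")]
    names.extend_from_slice(crate::crypto::FUNCTIONS);
    #[cfg(feature = "string_expressions")]
    names.extend_from_slice(crate::string::FUNCTIONS);
    #[cfg(feature = "unicode_expressions")]
    names.extend_from_slice(crate::string::unicode::FUNCTIONS);
    #[cfg(feature = "regex_expressions")]
    names.extend_from_slice(crate::regexp::FUNCTIONS);
    #[cfg(feature = "datetime_expressions")]
    names.extend_from_slice(crate::datetime::FUNCTIONS);
    names
}
```

With no features enabled, only the core names are returned. Assuming the features are declared in Cargo.toml, building with --features math_expressions would add the math names. One consequence of the proposed code locations shows up in the string module: because unicode sits under string but has its own flag, the parent module has to be compiled when either flag is on.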

Math functions

  • feature_flag (new): math_expressions
  • code location: datafusion/functions/src/math
  • Abs, Acos, Asin, Atan, Atan2, Acosh, Asinh, Atanh, Cbrt, Ceil, Cos, Cosh, Degrees, Exp, Factorial, Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power, Radians, Signum, Sin, Sinh, Sqrt, Tan, Tanh, Trunc, Cot, Round, iszero
  • Isnan: is the value NaN
  • Nanvl: return the first non-NaN value
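The one-line descriptions of Isnan and Nanvl can be pinned down with a small scalar sketch. It models SQL NULL as None, and it assumes that a NULL first argument propagates to the result. It is not DataFusion's implementation, which operates on Arrow arrays.

```rust
/// Hypothetical scalar Isnan: is the value NaN? NULL (None) stays NULL.
fn isnan(x: Option<f64>) -> Option<bool> {
    x.map(f64::is_nan)
}

/// Hypothetical scalar Nanvl: returns x unless it is NaN, in which case y
/// is returned. NULL handling for x is an assumption of this sketch.
fn nanvl(x: Option<f64>, y: Option<f64>) -> Option<f64> {
    match x {
        Some(v) if v.is_nan() => y,
        other => other,
    }
}
```

For example, nanvl(Some(f64::NAN), Some(1.0)) yields Some(1.0), while nanvl(Some(2.0), Some(1.0)) keeps Some(2.0).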

Array functions

Given the size and specialization of these functions, I propose putting them into their own subcrate.

  • feature_flag (new): array_expressions
  • code location: datafusion/functions-array/src
  • ArrayAppend, ArraySort, ArrayConcat, ArrayHas, ArrayHasAll, ArrayHasAny, ArrayPopFront, ArrayPopBack, ArrayDims, ArrayDistinct, ArrayElement, ArrayEmpty, ArrayLength, ArrayNdims, ArrayPosition, ArrayPositions, ArrayPrepend, ArrayRemove, ArrayRemoveN, ArrayRemoveAll, ArrayRepeat, ArrayReplace, ArrayReplaceN, ArrayReplaceAll, ArraySlice, ArrayToString, ArrayIntersect, ArrayUnion, ArrayExcept, Cardinality, ArrayResize, Flatten, Range, StringToArray
  • MakeArray: construct an array from columns (union/except depends on this)
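"Construct an array from columns" means that MakeArray works row by row. Row i of the result collects the i-th value of each argument column. The sketch below is hypothetical: it uses plain Vec<i64> columns of equal length, ignores nulls, and returns nested Vecs, whereas the real function would build an Arrow list array.

```rust
/// Hypothetical row-wise MakeArray: given equally long columns, row i of
/// the result is [col_0[i], col_1[i], ...]. Nulls are not modeled.
fn make_array(columns: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    assert!(
        columns.iter().all(|c| c.len() == num_rows),
        "all columns must have the same length"
    );
    (0..num_rows)
        .map(|row| columns.iter().map(|col| col[row]).collect())
        .collect()
}
```

For instance, the three columns [1, 2], [3, 4] and [5, 6] produce the two rows [1, 3, 5] and [2, 4, 6].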

Core functions

These functions are always available, either because they are used for internal purposes (like implementing the [1,2,3] syntax in SQL) or because they are so commonly used that it is not worth putting them behind a feature flag.

  • feature_flag: NONE
  • code location: datafusion/functions/src/core or similar
  • Coalesce: return the first non-null value
  • Struct: Create a struct
  • NullIf: return null if the two values are equal
  • Random: return a random number
  • ArrowTypeOf: return the arrow type of a value
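The NULL-handling rules behind Coalesce and NullIf are easy to get subtly wrong. Here is a hypothetical scalar sketch that models SQL NULL as None. It is not DataFusion's array-based implementation.

```rust
/// COALESCE: the first argument that is not NULL, or NULL if all are NULL.
fn coalesce<T: Clone>(args: &[Option<T>]) -> Option<T> {
    args.iter().find_map(|arg| arg.clone())
}

/// NULLIF(a, b): NULL when a = b, otherwise a. A NULL a or b never
/// compares equal, so a is returned unchanged in those cases.
fn nullif<T: PartialEq>(a: Option<T>, b: Option<T>) -> Option<T> {
    if a.is_some() && a == b {
        None
    } else {
        a
    }
}
```

In SQL terms, COALESCE(NULL, 2, 3) is 2, NULLIF(1, 1) is NULL, and NULLIF(1, NULL) is 1.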

Crypto functions

  • feature_flag (existing): crypto_expressions
  • code location: datafusion/functions/src/crypto
  • Digest, MD5, SHA224, SHA256, SHA384, SHA512

String functions

  • feature_flag (new): string_expressions
  • code location: datafusion/functions/src/string
  • ascii, bit_length, btrim, chr, concat, concat_ws, ends_with, initcap, instr, lower, ltrim, octet_length, repeat, replace, rtrim, split_part, starts_with, to_hex, trim, upper, levenshtein, uuid, overlay
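Most of these string functions are thin wrappers over standard string operations. As one example, here is a hypothetical sketch of split_part following the PostgreSQL convention of counting fields from one. It only handles positive field numbers, and it treats an empty delimiter as making the whole input a single field. Both choices are assumptions of this sketch rather than a statement of DataFusion's exact behavior.

```rust
/// Hypothetical split_part(string, delimiter, n), counting fields from one.
/// Only positive n is handled; other values return None in this sketch.
fn split_part(s: &str, delimiter: &str, n: i64) -> Option<String> {
    if n <= 0 {
        return None;
    }
    let index = (n - 1) as usize;
    if delimiter.is_empty() {
        // Assumption: an empty delimiter makes the whole input one field.
        return Some(if index == 0 { s.to_string() } else { String::new() });
    }
    // A field number past the last field yields an empty string.
    Some(s.split(delimiter).nth(index).unwrap_or("").to_string())
}
```

For example, split_part("a,b,c", ",", 2) gives "b", and asking for field 5 gives an empty string.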

Unicode string functions

These expressions need an additional dependency, which is why they have a separate flag.

  • feature_flag (existing): unicode_expressions
  • code location: datafusion/functions/src/string/unicode
  • CharLength, Left, Lpad, Reverse, Right, Rpad, Strpos, Substr, Translate, SubstrIndex, FindInSet
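What sets functions such as CharLength, Left and Reverse apart is that they operate on characters rather than bytes. The following std-only sketch is hypothetical and assumes that "character" means a Unicode scalar value (a Rust char). It does not show the additional dependency mentioned above, and it does not attempt grapheme-cluster handling.

```rust
/// Number of characters (Unicode scalar values), not bytes.
fn char_length(s: &str) -> usize {
    s.chars().count()
}

/// First n characters (non-negative n only in this sketch).
fn left(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Characters in reverse order.
fn reverse(s: &str) -> String {
    s.chars().rev().collect()
}
```

For "h\u{e9}llo" ("héllo"), char_length returns 5 even though the UTF-8 encoding is 6 bytes long, because "é" takes two bytes.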

Regex functions

  • feature_flag (existing): regex_expressions
  • code location: datafusion/functions/src/regexp
  • RegexpMatch, RegexpReplace

Date/time functions

  • feature_flag (new): datetime_expressions
  • code location: datafusion/functions/src/datetime
  • date_part, date_trunc, date_bin, to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds, from_unixtime, now, current_date, current_time
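Among these, date_bin has the least obvious semantics. It assigns a timestamp to a fixed-width bucket (the stride) aligned to an origin. The following hypothetical sketch shows only that bucketing rule for fixed-width strides, on plain i64 nanosecond timestamps. It makes no attempt at interval parsing, month-based strides, or overflow handling.

```rust
/// Hypothetical date_bin on i64 nanosecond timestamps: returns the start of
/// the stride-wide bucket, aligned to origin, that contains source.
fn date_bin(stride_nanos: i64, source_nanos: i64, origin_nanos: i64) -> i64 {
    assert!(stride_nanos > 0, "stride must be positive");
    // For a positive divisor, div_euclid rounds toward negative infinity, so
    // timestamps before origin still land in the bucket that contains them.
    let bins = (source_nanos - origin_nanos).div_euclid(stride_nanos);
    origin_nanos + bins * stride_nanos
}
```

With a 15-minute stride and the Unix epoch as origin, 10:07 falls into the bucket starting at 10:00. A timestamp 1 ns before the origin falls into the bucket that starts one full stride earlier.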

Describe alternatives you've considered

We could have more fine-grained crates, a different organization, etc.

For example, perhaps we should pull the string functions into a separate datafusion/functions-string crate.

Additional context

No response

Labels: enhancement (New feature or request)