Is your feature request related to a problem or challenge?
Many DataFusion users are using DataFusion to execute workloads originally developed for Apache Spark. Examples include:
- DataFusion Comet (@andygrove, @comphead, etc.)
- LakeHQ / Sail (@shehabgamin )
- Various internal pipelines / engines (e.g. those used by @Omega359 and, I think, @Blizzara)
They often do this for superior performance.
- Part of running Spark workloads is emulating Spark semantics
- Emulating Spark semantics requires (among other things) functions compatible with Spark (whose semantics differ from those of the functions included in DataFusion)
Several projects are in the process of implementing Spark-compatible function libraries using DataFusion's extension APIs. However, we concluded in #5600 that we could join forces and maintain a Spark-compatible function library in the core DataFusion repo. @shehabgamin has implemented the first step in #15168 🙏
Describe the solution you'd like
This ticket tracks "completing" the Spark function library started in #15168.
Describe alternatives you've considered
Related Issues
- [DISCUSSION] Add separate crate to cover spark builtin functions #5600
- feat: Add `datafusion-spark` crate #15168
- [datafusion-spark] Example of using Spark compatible function library #15915
- [datafusion-spark] Test integrating datafusion-spark code into comet datafusion-comet#1704
- [datafusion-spark] Implement `ceil` function #15916
- Spark-compatible CAST operation #11201
- Add xxhash algorithms in SQL and expression api #14367
- [EPIC] Implement expressions as ScalarUDFImpl datafusion-comet#1819