Summary
TLDR: The key focus of this work is to speed up fundamentally row-oriented operations such as hash table lookups and tuple comparisons (e.g. #2427).
Background
DataFusion, like many Arrow systems, is a classic "vectorized computation engine", which works quite well for many common operations. The following paper gives a good treatment of the tradeoffs between vectorized execution and JIT compilation of query plans: https://db.in.tum.de/~kersten/vectorization_vs_compilation.pdf?lang=de
As mentioned in the paper, there are some fundamentally "row-oriented" operations in a database that are not typically amenable to vectorization. The "classics" are hash table updates in joins and hash aggregates, as well as comparing tuples in sort.
Another example can be found in these slides from this presentation
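To make the "comparing tuples in sort" case concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's or Arrow's APIs: `Column`, `compare_rows_interpreted`, and `compare_rows_specialized` are hypothetical names invented for illustration. The interpreted comparator re-discovers each column's type for every pair of rows, while the specialized one shows roughly the shape of code a JIT could emit once the schema is known.

```rust
use std::cmp::Ordering;

/// A toy columnar batch: each column stores all values of one type.
/// (Hypothetical stand-in for Arrow arrays; not DataFusion's API.)
enum Column {
    Int(Vec<i64>),
    Str(Vec<String>),
}

/// Interpreted comparison: for every pair of rows, each column's type
/// is matched again before the values can be compared.
fn compare_rows_interpreted(cols: &[Column], a: usize, b: usize) -> Ordering {
    for col in cols {
        let ord = match col {
            Column::Int(v) => v[a].cmp(&v[b]),
            Column::Str(v) => v[a].cmp(&v[b]),
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Roughly what a comparator specialized for the schema (Int, Str)
/// could look like: the types are fixed, so there is no per-row dispatch.
fn compare_rows_specialized(ints: &[i64], strs: &[String], a: usize, b: usize) -> Ordering {
    ints[a].cmp(&ints[b]).then_with(|| strs[a].cmp(&strs[b]))
}

/// Sort row indices of an `n`-row batch using the interpreted comparator.
fn sort_indices_interpreted(cols: &[Column], n: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    idx.sort_by(|&a, &b| compare_rows_interpreted(cols, a, b));
    idx
}

/// Sort row indices using the schema-specialized comparator.
fn sort_indices_specialized(ints: &[i64], strs: &[String]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..ints.len()).collect();
    idx.sort_by(|&a, &b| compare_rows_specialized(ints, strs, a, b));
    idx
}
```

Both functions produce the same ordering; the difference is only in how much per-row work the comparator does. Whether a JIT'd comparator actually wins in practice depends on the schema and data, so this is an illustration of the idea rather than a performance claim.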
@yjshen added initial support for JIT'ing in #1849, and it currently lives in https://github.com/apache/arrow-datafusion/tree/master/datafusion/jit. He also added partial support for aggregates in #2375.
This ticket aims to be a central location for tracking the status of JIT-compiling expressions, for anyone who wants to contribute to this effort.
Describe the solution you'd like