Introduction
There are some areas where JIT-compiled kernels can provide performance improvements over existing libcudf functions.
Please note that this issue is focused on CUDA C++ features in libcudf that use Jitify and NVRTC, rather than cuDF-python features using Numba to generate PTX from user-defined Python functions.
JIT transforms, JIT projection expressions
JIT transforms, or UDF (user defined function) transforms, can be used to fuse together multiple binary ops or function calls within a single kernel. This eliminates the materialization of intermediates and for complex expressions can lead to significant speedup. We've written a custom "polynomials" benchmark in #17695 that shows >10x speedup for JIT-compiled kernels versus binary ops and AST (abstract syntax tree) implementations.
- support decimal types (#17968)
- support multiple column inputs (single column output) (#17881)
- compare imbalanced_tree benchmarks for JIT vs binary ops vs AST (#18032)
- collect data on NDS and NDS-H runtime impact of JIT-compiled expressions (#18127)
- support operators with string input and fixed-width output, removing `jit::column_device_view` (#18378)
- support operators with string input and string output (#18490)
- implement null-aware transforms
JIT filters
- support UDF filters (#19070)
- implement null-aware filters
JIT aggregation
JIT aggregations, or UDAFs (user-defined aggregation functions), can be used to perform complex transformations on the groups of a groupby aggregation. libcudf supports both CUDA and PTX aggregation kinds.
Some examples of UDAFs could include "compute score" with additional flexibility for feature engineering. Here are some "compute score" examples from the archived TorchArrow project.
To support some of these functions, the user might create a struct column that contains a list of ids, a list of targets, and a score per target. Ref: https://pytorch.org/torcharrow/beta/functional.html
| Function | Description |
| --- | --- |
| get_score_sum | Return the sum of all the scores in matching_id_scores that have a corresponding id in matching_ids that is also in input_ids. |
| get_score_min | Return the min of all the scores in matching_id_scores that have a corresponding id in matching_ids that is also in input_ids. |
| get_score_max | Return the max of all the scores in matching_id_scores that have a corresponding id in matching_ids that is also in input_ids. |
JIT join
Currently libcudf uses mixed_join to fuse a hash join with a post-filter. Mixed joins accept an AST predicate that is applied thread-per-row when the probe table's equality keys are found in the build table. Mixed joins have poor warp occupancy due to heavy register pressure, a result of combining hash join and AST expression functionality in a single kernel.
- implement conditional joins
- implement mixed joins
- implement custom expressions as keys
One alternative would be to use code gen to check the post-equality predicate and JIT-compile the resulting kernel. Please see #15366 for some additional context.
Improving JIT infrastructure
As part of expanding JIT functionality in libcudf, we will need better tools for tracking JIT-compilation time (NVIDIA/jitify#137). We will also need better tools for JIT cache management such as clearing and pre-populating. Collaboration with Spark-RAPIDS and other partners will be critical for success.
JIT benchmarking
We could write libcudf UDFs for some of the operations in UDFBench; also see the 2025 UDFBench paper. Some example UDFs can be found at https://github.com/athenarc/UDFBench/tree/main/engines/duckdb/udfs.