
@lukekim commented on Feb 4, 2026

~/dev/vortex$  cargo bench -p vortex-array --bench expr_case_when 2>&1
   Compiling vortex-array v0.1.0 (/home/lukim/dev/vortex/vortex-array)
    Finished `bench` profile [optimized + debuginfo] target(s) in 3.56s
     Running benches/expr/case_when_bench.rs (target/release/deps/expr_case_when-aaa02f2318e0414b)
Timer precision: 32 ns
expr_case_when                    fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ case_when_all_false                          │               │               │               │         │
│  ├─ 10000                       12.15 µs      │ 1.852 ms      │ 12.45 µs      │ 31.16 µs      │ 100     │ 100
│  ├─ 100000                      49.96 µs      │ 163 µs        │ 50.61 µs      │ 53.24 µs      │ 100     │ 100
│  ╰─ 1000000                     131.6 µs      │ 1.462 ms      │ 132.9 µs      │ 197.1 µs      │ 100     │ 100
├─ case_when_all_true                           │               │               │               │         │
│  ├─ 10000                       5.839 µs      │ 10.68 µs      │ 5.967 µs      │ 6.023 µs      │ 100     │ 100
│  ├─ 100000                      18.19 µs      │ 42.65 µs      │ 18.43 µs      │ 18.74 µs      │ 100     │ 100
│  ╰─ 1000000                     151.4 µs      │ 1.17 ms       │ 152.7 µs      │ 164.1 µs      │ 100     │ 100
├─ case_when_nary_3_conditions                  │               │               │               │         │
│  ├─ 10000                       21.77 µs      │ 103.6 µs      │ 22.12 µs      │ 23.02 µs      │ 100     │ 100
│  ├─ 100000                      73.64 µs      │ 355.9 µs      │ 75.23 µs      │ 78.35 µs      │ 100     │ 100
│  ╰─ 1000000                     3.303 ms      │ 4.089 ms      │ 3.992 ms      │ 3.978 ms      │ 100     │ 100
├─ case_when_nary_10_conditions                 │               │               │               │         │
│  ├─ 10000                       70.57 µs      │ 84.81 µs      │ 71.43 µs      │ 71.85 µs      │ 100     │ 100
│  ├─ 100000                      236.4 µs      │ 256 µs        │ 241.9 µs      │ 242.3 µs      │ 100     │ 100
│  ╰─ 1000000                     5.28 ms       │ 6.321 ms      │ 6.184 ms      │ 6.143 ms      │ 100     │ 100
├─ case_when_nary_100_conditions                │               │               │               │         │
│  ├─ 10000                       689.5 µs      │ 709.7 µs      │ 695.4 µs      │ 695.5 µs      │ 100     │ 100
│  ├─ 100000                      2.344 ms      │ 2.683 ms      │ 2.355 ms      │ 2.366 ms      │ 100     │ 100
│  ╰─ 1000000                     32.67 ms      │ 34.15 ms      │ 33.45 ms      │ 33.48 ms      │ 100     │ 100
╰─ case_when_simple                             │               │               │               │         │
   ├─ 10000                       8.063 µs      │ 21.23 µs      │ 8.239 µs      │ 8.382 µs      │ 100     │ 100
   ├─ 100000                      28.56 µs      │ 41.48 µs      │ 28.94 µs      │ 29.14 µs      │ 100     │ 100
   ╰─ 1000000                     3.179 ms      │ 3.551 ms      │ 3.242 ms      │ 3.25 ms       │ 100     │ 100

This pull request changes how n-ary (multi-condition) CASE WHEN expressions are handled in both the benchmarks and the DataFusion expression conversion: instead of nesting binary CASE WHEN expressions, it builds a single flat n-ary expression, which scales better as the number of conditions grows. It also extends benchmark coverage to large numbers of conditions and adds tests for these scenarios.
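To illustrate the flat approach, here is a minimal sketch of first-match-wins n-ary CASE WHEN evaluation over plain Rust vectors. It is not the vortex-array implementation; the function `case_when_flat` and its signature are hypothetical. Each row takes the THEN value of the first WHEN that is true for it, and the loop stops once every row is resolved. By contrast, a nested binary chain of N conditions is an expression tree N levels deep.

```rust
/// Hypothetical sketch of flat n-ary CASE WHEN evaluation (not the vortex-array API).
/// `conditions[i][row]` is the i-th WHEN predicate for `row`, and
/// `thens[i][row]` is the value produced when that WHEN is the first to match.
pub fn case_when_flat(conditions: &[Vec<bool>], thens: &[Vec<i64>], else_vals: &[i64]) -> Vec<i64> {
    assert_eq!(conditions.len(), thens.len(), "one THEN per WHEN");
    let len = else_vals.len();
    // Start from the ELSE values; a row is overwritten by the first WHEN that matches it.
    let mut out = else_vals.to_vec();
    // `resolved[row]` records whether an earlier WHEN already matched this row.
    let mut resolved = vec![false; len];
    for (cond, then) in conditions.iter().zip(thens) {
        for row in 0..len {
            if !resolved[row] && cond[row] {
                out[row] = then[row];
                resolved[row] = true;
            }
        }
        // Once every row has matched, later conditions cannot change the result.
        if resolved.iter().all(|&r| r) {
            break;
        }
    }
    out
}
```

For example, with conditions `[[true, false, false, false], [true, true, false, false]]`, THEN values of all 1s and all 2s, and an ELSE of all 0s, the result is `[1, 2, 0, 0]`: row 0 is taken by the first WHEN even though the second also matches.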

N-ary CASE WHEN support and benchmarks:

  • Updated all relevant benchmarks in case_when_bench.rs to use the n-ary case_when API instead of the old nested binary form, and increased the tested array sizes (now 10,000, 100,000, and 1,000,000 elements) for more realistic performance evaluation. Added new benchmarks with 10 and 100 conditions.
  • Replaced uses of nested_case_when (and its import) with case_when, the new preferred implementation.

DataFusion integration improvements:

  • Changed the DataFusion-to-Vortex expression conversion logic to generate flat n-ary case_when expressions (and case_when_no_else when there's no ELSE clause), instead of building nested binary trees. This simplifies the logic and improves performance for many-condition cases.
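The structural difference between the two conversion strategies can be sketched with a toy expression type. Everything here (`Expr`, `to_nested`, `to_flat`, `depth`) is hypothetical and mirrors neither the real Vortex nor the DataFusion types; it only shows that folding N WHEN/THEN pairs into binary nodes yields a tree N levels deep, while the flat form keeps all pairs in one node and picks the no-ELSE variant when the ELSE clause is absent.

```rust
/// Hypothetical toy expression tree, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Col(String),
    Lit(i64),
    /// Nested binary form: one WHEN/THEN pair plus an optional fallback expression.
    NestedCaseWhen { when: Box<Expr>, then: Box<Expr>, otherwise: Option<Box<Expr>> },
    /// Flat n-ary form with an ELSE clause.
    CaseWhen { pairs: Vec<(Expr, Expr)>, else_expr: Box<Expr> },
    /// Flat n-ary form without an ELSE clause (unmatched rows become NULL).
    CaseWhenNoElse { pairs: Vec<(Expr, Expr)> },
}

/// Old style: fold the pairs from last to first into a right-nested chain.
pub fn to_nested(pairs: Vec<(Expr, Expr)>, else_expr: Option<Expr>) -> Option<Expr> {
    let mut acc = else_expr.map(Box::new);
    for (when, then) in pairs.into_iter().rev() {
        acc = Some(Box::new(Expr::NestedCaseWhen {
            when: Box::new(when),
            then: Box::new(then),
            otherwise: acc,
        }));
    }
    acc.map(|b| *b)
}

/// New style: keep every pair in a single flat node.
pub fn to_flat(pairs: Vec<(Expr, Expr)>, else_expr: Option<Expr>) -> Expr {
    match else_expr {
        Some(e) => Expr::CaseWhen { pairs, else_expr: Box::new(e) },
        None => Expr::CaseWhenNoElse { pairs },
    }
}

/// Number of CASE WHEN nodes along the fallback chain.
pub fn depth(e: &Expr) -> usize {
    match e {
        Expr::NestedCaseWhen { otherwise, .. } => 1 + otherwise.as_ref().map_or(0, |o| depth(o)),
        Expr::CaseWhen { .. } | Expr::CaseWhenNoElse { .. } => 1,
        Expr::Col(_) | Expr::Lit(_) => 0,
    }
}
```

With 100 pairs, `to_nested` produces a chain of depth 100, whereas `to_flat` produces a single node of depth 1.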

Expanded test coverage:

  • Added new tests for n-ary CASE WHEN expressions with multiple conditions and for CASE WHEN expressions without an ELSE clause, checking correct handling and nullability semantics (as in SQL, rows that match no condition evaluate to NULL when there is no ELSE).
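The nullability semantics for the no-ELSE case can be sketched as follows, modelling SQL NULL as `None`. This is a hypothetical illustration, not the code under test. The function names are invented, and the dtype rule in `result_is_nullable` is an assumption based on standard SQL: the result is nullable when there is no ELSE, or when any THEN or ELSE branch is nullable.

```rust
/// Hypothetical sketch: flat n-ary CASE WHEN with an optional ELSE.
/// `None` models SQL NULL; a NULL condition is treated like false, as in SQL.
pub fn case_when_nullable(
    conditions: &[Vec<Option<bool>>],
    thens: &[Vec<Option<i64>>],
    else_vals: Option<&[Option<i64>]>,
    len: usize,
) -> Vec<Option<i64>> {
    // Without an ELSE, every row starts as NULL.
    let mut out = vec![None; len];
    let mut resolved = vec![false; len];
    for (cond, then) in conditions.iter().zip(thens) {
        for row in 0..len {
            if !resolved[row] && cond[row] == Some(true) {
                out[row] = then[row];
                resolved[row] = true;
            }
        }
    }
    // Rows that matched no WHEN take the ELSE value, if there is one.
    if let Some(els) = else_vals {
        for row in 0..len {
            if !resolved[row] {
                out[row] = els[row];
            }
        }
    }
    out
}

/// Assumed dtype rule: nullable if there is no ELSE, or if any branch is nullable.
pub fn result_is_nullable(then_nullable: &[bool], else_nullable: Option<bool>) -> bool {
    then_nullable.iter().any(|&n| n) || else_nullable.unwrap_or(true)
}
```

Note that an ELSE slice must be passed as `Some(v.as_slice())`, because `Some(&v)` with a `Vec` does not coerce to `Option<&[_]>`.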

These changes make the implementation more efficient and robust, especially when handling complex CASE WHEN expressions with many conditions.

@lukekim self-assigned this on Feb 4, 2026
@lukekim added the enhancement label (New feature or request) on Feb 4, 2026
@lukekim merged commit 7fa584c into develop on Feb 5, 2026 (15 of 42 checks passed)
@lukekim deleted the lukim/pr-comments-2 branch on February 5, 2026 20:52