@gatesn gatesn commented Nov 27, 2025

Introduces the traits required to define abstract scalar functions and wrap them up as either expressions or arrays.

This is one of the function types that we will split the generic ComputeFn trait into, and it allows us to lazily defer scalar compute in the array tree.

We don't add them in this PR, but there may be some properties that are useful for ScalarFns in some form:


    /// The identity element `e` where `f(e, x) = f(x, e) = x`.
    ///
    /// When an argument is the identity element, the function can be
    /// eliminated entirely, returning the other argument unchanged.
    ///
    /// # Examples
    /// - `AND`: `true` (AND(true, x) → x)
    /// - `OR`: `false` (OR(false, x) → x)
    /// - `+`: `0` (0 + x → x)
    /// - `*`: `1` (1 * x → x)
    /// - `COALESCE`: `NULL` (COALESCE(NULL, x) → x)
    fn identity_element(&self, options: &Self::Options) -> Option<Scalar> {
        _ = options;
        None
    }

    /// The absorbing element `a` where `f(a, x) = f(x, a) = a`.
    ///
    /// When any argument is the absorbing element, the function short-circuits
    /// immediately, returning that element without evaluating other arguments.
    /// Also known as the "annihilator" or "zero element".
    ///
    /// # Examples
    /// - `AND`: `false` (AND(false, x) → false)
    /// - `OR`: `true` (OR(true, x) → true)
    /// - `*`: `0` (0 * x → 0)
    fn absorbing_element(&self, options: &Self::Options) -> Option<Scalar> {
        _ = options;
        None
    }

    /// Whether argument order is irrelevant: `f(a, b) = f(b, a)`.
    ///
    /// Enables expression normalization (e.g., sorting arguments by column id)
    /// for better common subexpression elimination and pattern matching.
    ///
    /// # Examples
    /// - Commutative: `+`, `*`, `AND`, `OR`, `=`, `!=`, `MIN`, `MAX`
    /// - Non-commutative: `-`, `/`, `<`, `>`, `CONCAT`
    fn is_commutative(&self, options: &Self::Options) -> bool {
        _ = options;
        false
    }

    /// Whether `f(x, x) = x`.
    ///
    /// Enables simplification when the same expression appears multiple times
    /// as arguments to the function.
    ///
    /// # Examples
    /// - Idempotent: `AND`, `OR`, `MIN`, `MAX`
    /// - Non-idempotent: `+` (x + x = 2x), `*` (x * x = x²)
    fn is_idempotent(&self, options: &Self::Options) -> bool {
        _ = options;
        false
    }

    /// Whether `f(f(x)) = x` for unary functions.
    ///
    /// Enables cancellation of nested self-applications.
    ///
    /// # Examples
    /// - Involutions: `NOT`, `NEG` (for signed types), `REVERSE`
    /// - Non-involutions: `ABS`, `UPPER`, `LOWER`
    fn is_involution(&self, options: &Self::Options) -> bool {
        _ = options;
        false
    }

    /// How the function behaves when one or more arguments are NULL.
    ///
    /// Most functions propagate NULL (any NULL argument produces NULL output).
    /// Some functions have special NULL handling that can short-circuit
    /// evaluation or treat NULL as a meaningful value.
    ///
    /// Required for correct NULL semantics; may also enable optimizations
    /// when argument nullability is known from schema or statistics.
    fn null_handling(&self, options: &Self::Options) -> NullHandling {
        _ = options;
        NullHandling::default()
    }
    
    
/// How a function handles NULL arguments.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub enum NullHandling {
    /// NULL in any argument produces NULL output.
    ///
    /// This is standard SQL behavior for most scalar functions.
    /// Enables simplification when any argument is known to be NULL.
    Propagate,

    /// NULL is short-circuited when paired with the absorbing element.
    ///
    /// This is a special case where the absorbing element "wins" over NULL.
    ///
    /// # Examples
    /// - `AND_KLEENE(false, NULL)` → `false` (false absorbs NULL)
    /// - `OR_KLEENE(true, NULL)` → `true` (true absorbs NULL)
    AbsorbsNull,

    /// The function has special NULL semantics that don't follow
    /// simple propagation rules.
    ///
    /// This prevents any simplifications based on NULL arguments.
    ///
    /// # Examples
    /// - `IS NULL`, `IS NOT NULL`: NULL → true/false
    /// - `COALESCE`: returns first non-NULL argument
    /// - `NULLIF`: conditionally produces NULL
    #[default]
    Custom,
}
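
As a rough illustration of how a rewriter might consume the properties listed above, here is a minimal sketch. The `Expr` and `Op` types and the `simplify` function are hypothetical and are not part of this PR; the sketch uses plain `i64` literals and does not model NULLs, so the absorbing-element rewrite shown here would need a `NullHandling` check before it could be applied to nullable arguments.

```rust
// Toy expression tree for illustration only; these types are hypothetical
// and are not the ones introduced by the PR. NULLs are not modelled.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Lit(i64),
    Col(&'static str),
    Call(Op, Vec<Expr>),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Add,
    Mul,
    Max,
    Neg,
}

impl Op {
    fn identity_element(self) -> Option<i64> {
        match self {
            Op::Add => Some(0),
            Op::Mul => Some(1),
            Op::Max | Op::Neg => None,
        }
    }

    fn absorbing_element(self) -> Option<i64> {
        match self {
            Op::Mul => Some(0),
            Op::Add | Op::Max | Op::Neg => None,
        }
    }

    fn is_idempotent(self) -> bool {
        matches!(self, Op::Max)
    }

    fn is_involution(self) -> bool {
        matches!(self, Op::Neg)
    }
}

/// Bottom-up rewrite that applies each property where it holds.
fn simplify(expr: Expr) -> Expr {
    let (op, args) = match expr {
        Expr::Call(op, args) => (op, args),
        leaf => return leaf,
    };
    let args: Vec<Expr> = args.into_iter().map(simplify).collect();

    // Involution: f(f(x)) => x.
    if op.is_involution() {
        if let [Expr::Call(inner_op, inner_args)] = args.as_slice() {
            if *inner_op == op && inner_args.len() == 1 {
                return inner_args[0].clone();
            }
        }
    }

    if let [a, b] = args.as_slice() {
        // Absorbing element: f(a, x) = f(x, a) => a.
        if let Some(absorbing) = op.absorbing_element().map(Expr::Lit) {
            if *a == absorbing || *b == absorbing {
                return absorbing;
            }
        }
        // Identity element: f(e, x) = f(x, e) => x.
        if let Some(identity) = op.identity_element().map(Expr::Lit) {
            if *a == identity {
                return b.clone();
            }
            if *b == identity {
                return a.clone();
            }
        }
        // Idempotence: f(x, x) => x.
        if op.is_idempotent() && a == b {
            return a.clone();
        }
    }

    Expr::Call(op, args)
}
```

Under these assumptions, `simplify` turns `0 + x` into `x`, `x * 0` into `0`, `max(x, x)` into `x`, and `neg(neg(x))` into `x`, while leaving `x + 2` unchanged.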

Signed-off-by: Nicholas Gates <nick@nickgates.com>
codspeed-hq bot commented Nov 27, 2025

CodSpeed Performance Report

Merging #5561 will degrade performances by 10.34%

Comparing ngates/scalar-functions (cba44e3) with develop (3877839)

Summary

❌ 1 regression
✅ 1503 untouched
⏩ 271 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| `bitpack_pipeline_unpack[(1000, 1.0)]` | 10.6 µs | 11.8 µs | -10.34% |

Footnotes

  1. 271 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@gatesn gatesn changed the title from "Ngates/scalar functions" to "Scalar functions" Nov 28, 2025
@gatesn gatesn marked this pull request as ready for review November 28, 2025 15:54
@gatesn gatesn added the changelog/feature (A new feature) label Nov 28, 2025
codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 0.66225% with 600 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.98%. Comparing base (3f49b68) to head (cba44e3).
⚠️ Report is 5 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-array/src/expr/functions/vtable.rs | 0.00% | 96 Missing ⚠️ |
| vortex-array/src/expr/functions/scalar.rs | 0.00% | 85 Missing ⚠️ |
| vortex-array/src/expr/exprs/scalar_fn.rs | 0.00% | 69 Missing ⚠️ |
| vortex-array/src/arrays/scalar_fn/vtable/mod.rs | 0.00% | 65 Missing ⚠️ |
| ...ex-array/src/arrays/scalar_fn/vtable/operations.rs | 0.00% | 34 Missing ⚠️ |
| vortex-vector/src/lib.rs | 0.00% | 34 Missing ⚠️ |
| vortex-array/src/arrays/scalar_fn/vtable/array.rs | 0.00% | 33 Missing ⚠️ |
| vortex-array/src/expr/functions/execution.rs | 0.00% | 31 Missing ⚠️ |
| vortex-array/src/vectors.rs | 0.00% | 29 Missing ⚠️ |
| vortex-vector/src/datum.rs | 0.00% | 26 Missing ⚠️ |
| ... and 12 more | | |


Comment on lines 131 to 164
pub fn arity(&self) -> Arity {
    self.vtable.as_dyn().arity(self.options)
}

pub fn identity_element(&self) -> Option<Scalar> {
    self.vtable.as_dyn().identity_element(self.options)
}

pub fn absorbing_element(&self) -> Option<Scalar> {
    self.vtable.as_dyn().absorbing_element(self.options)
}

pub fn is_commutative(&self) -> bool {
    self.vtable.as_dyn().is_commutative(self.options)
}

pub fn is_idempotent(&self) -> bool {
    self.vtable.as_dyn().is_idempotent(self.options)
}

pub fn is_involution(&self) -> bool {
    self.vtable.as_dyn().is_involution(self.options)
}

pub fn monotonicity(&self, arg_idx: usize) -> Monotonicity {
    self.vtable.as_dyn().monotonicity(self.options, arg_idx)
}

pub fn null_handling(&self) -> NullHandling {
    self.vtable.as_dyn().null_handling(self.options)
}

pub fn arg_name(&self, arg_idx: usize) -> Option<String> {
    self.vtable.as_dyn().arg_name(self.options, arg_idx)
Contributor

Do we need all of these? Can we add them once we need them?

Comment on lines +207 to +232
/// NULL in any argument produces NULL output.
///
/// This is standard SQL behavior for most scalar functions.
/// Enables simplification when any argument is known to be NULL.
Propagate,

/// NULL is short-circuited when paired with the absorbing element.
///
/// This is a special case where the absorbing element "wins" over NULL.
///
/// # Examples
/// - `AND_KLEENE(false, NULL)` → `false` (false absorbs NULL)
/// - `OR_KLEENE(true, NULL)` → `true` (true absorbs NULL)
AbsorbsNull,

/// The function has special NULL semantics that don't follow
/// simple propagation rules.
///
/// This prevents any simplifications based on NULL arguments.
///
/// # Examples
/// - `IS NULL`, `IS NOT NULL`: NULL → true/false
/// - `COALESCE`: returns first non-NULL argument
/// - `NULLIF`: conditionally produces NULL
#[default]
Custom,
Contributor

this is not correct

Contributor

AND_KLEENE might absorb and might not

Contributor Author

It absorbs null when paired with the absorbing element.

Contributor

Can we add this in a different PR if we want this?
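
Both positions in this thread can be read off the Kleene truth table: NULL is absorbed only when the other argument is the absorbing element, and otherwise the result is NULL. The sketch below models this with `Option<bool>`, where `None` stands in for NULL. The function names `and_kleene` and `or_kleene` are hypothetical free functions written for this note, not the PR's implementation.

```rust
/// Three-valued (Kleene) AND; `None` stands in for NULL.
fn and_kleene(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // `false` is the absorbing element: it wins even against NULL.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // At least one NULL and no `false`: the result is unknown.
        _ => None,
    }
}

/// Three-valued (Kleene) OR; `None` stands in for NULL.
fn or_kleene(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // `true` is the absorbing element: it wins even against NULL.
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        // At least one NULL and no `true`: the result is unknown.
        _ => None,
    }
}
```

So `and_kleene(Some(false), None)` is `Some(false)`, while `and_kleene(Some(true), None)` stays `None`, which is why whether a NULL is absorbed cannot be decided without looking at the other argument.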

@joseph-isaacs
Contributor

Can we remove anything from the PR that is not required to implement ScalarFunc?

Comment on lines +37 to +40
NullHandling::AbsorbsNull | NullHandling::Custom => {
    // We cannot guarantee that the array is all valid without evaluating the function
    false
}
Contributor

This is approximate, which I am not sure was the original behaviour.

Contributor Author

Yup, but it's only used for approximate short-circuits, and we have to move the Array trait towards this definition anyway since we want to defer compute.
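
A minimal sketch of what "approximate" means in this exchange, assuming a conservative check in which `false` means "not known to be all valid" rather than "contains NULLs". Only the `AbsorbsNull | Custom` arm is taken from the snippet under review. The `Propagate` arm, the `all_valid_approx` name, and the `children_all_valid` parameter are assumptions made for illustration, and the `Propagate` arm further assumes that such a function never produces NULL from non-NULL inputs.

```rust
/// Mirror of the enum from the PR description, redeclared so the sketch is self-contained.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum NullHandling {
    Propagate,
    AbsorbsNull,
    #[default]
    Custom,
}

/// Conservative validity check for a deferred scalar function.
///
/// Returns `true` only when the output is known to contain no NULLs without
/// evaluating the function. `false` means "unknown", not "has NULLs".
fn all_valid_approx(null_handling: NullHandling, children_all_valid: &[bool]) -> bool {
    match null_handling {
        // Assumption: a propagating function yields NULL only from NULL inputs.
        NullHandling::Propagate => children_all_valid.iter().all(|&valid| valid),
        // Taken from the reviewed snippet: no guarantee without evaluating.
        NullHandling::AbsorbsNull | NullHandling::Custom => false,
    }
}
```

A caller that treats the result as a short-circuit hint stays correct with this definition, because a `false` answer only costs a missed optimization.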

/// - `COALESCE`: returns first non-NULL argument
/// - `NULLIF`: conditionally produces NULL
#[default]
Custom,
Contributor

can we call this unknown?

Contributor Author

Hmmm, I prefer custom? It's not that it's unknown. It just isn't describable in a useful way.

/// for better common subexpression elimination and pattern matching.
///
/// # Examples
/// - Commutative: `+`, `*`, `AND`, `OR`, `=`, `!=`, `MIN`, `MAX`
Contributor

This is likely a typed property?
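
For context, the normalization that the doc comment under review describes (sorting the arguments of a commutative function so that equivalent calls compare equal) could look roughly like the sketch below. The `Arg` and `Call` types and the `normalize` function are hypothetical. The `commutative` flag is supplied by the caller, so the sketch takes no position on whether commutativity depends on the argument types.

```rust
/// Hypothetical argument type; the derived ordering puts columns before
/// literals and then orders by column id or literal value.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Arg {
    Col(u32),
    Lit(i64),
}

/// Hypothetical call node carrying its own commutativity flag.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Call {
    name: &'static str,
    commutative: bool,
    args: Vec<Arg>,
}

/// Put the arguments of a commutative call into a canonical order so that
/// `f(a, b)` and `f(b, a)` become structurally equal.
fn normalize(mut call: Call) -> Call {
    if call.commutative {
        call.args.sort();
    }
    call
}
```

After normalization, `col1 + col2` and `col2 + col1` are the same value and can be deduplicated by a plain equality check, whereas `col1 - col2` and `col2 - col1` are left distinct.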

@gatesn gatesn enabled auto-merge (squash) November 28, 2025 17:57
@gatesn gatesn merged commit 682289f into develop Nov 28, 2025
46 of 47 checks passed
@gatesn gatesn deleted the ngates/scalar-functions branch November 28, 2025 18:21