Skip to content

Improve substrait support to include more types and expressions#6484

Merged
AdamGS merged 5 commits intodevelopfrom
adamg/more-substrait
Feb 13, 2026
Merged

Improve substrait support to include more types and expressions#6484
AdamGS merged 5 commits intodevelopfrom
adamg/more-substrait

Conversation

@AdamGS
Copy link
Contributor

@AdamGS AdamGS commented Feb 13, 2026

Does this PR closes an open issue or discussion?

  • Closes #.

What changes are included in this PR?

  1. Support for more types - including new time related ones and decimal.
  2. Support for basic arithmetic operations (+,-,/,*).
  3. Support for the is_null and is_not_null substrate expression, which also allows for querying vortex-backed datasets with filters through duckdb!
  4. Move for expr.evalute(..) API to array.apply(...), to align the python API better with the Rust one.

What is the rationale for this change?

Both the arrow dataset API and substrait open up more ways to use Vortex through existing tools without maintaining dedicated integrations.

How is this change tested?

Adds a few additional tests covering new code paths, and it also unlocks an existing test that was expected to fail.

Are there any user-facing changes?

Extends the public API surface in Python, doesn't break any existing code.

@AdamGS AdamGS added lang/python Relates to the Vortex Python API changelog/feature A new feature labels Feb 13, 2026
@AdamGS AdamGS requested review from danking and gatesn February 13, 2026 14:24
…ions

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/more-substrait branch from 1315d70 to c09661d Compare February 13, 2026 14:24
@AdamGS AdamGS marked this pull request as ready for review February 13, 2026 14:24
Copy link
Contributor

@gatesn gatesn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise looks great thank you!


@final
class Expr:
def evaluate(self, array: IntoArray) -> Array: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the "new" inverted call of Array::apply(expr)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved it up here because it made more sense visually to me, but I'll add that change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self_: PyRef<'py, Self>,
right: &Bound<'py, PyAny>,
) -> PyResult<Bound<'py, PyExpr>> {
py_binary_operator(self_, Operator::Div, coerce_expr(right)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think operator.truediv should always return float: https://docs.python.org/3/library/operator.html#operator.truediv.

I think the only sensible way to implement this is:

  1. truediv is Operator::Div on floating-point types; otherwise, type error.
  2. floordiv is Operator::Div on non-floating-point types; otherwise, type error.

Copy link
Contributor

@gatesn gatesn Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say for a DSL it's ok to abuse the operators a little bit? Particularly as expressions are untyped.


def test_duckdb(ds: vx.dataset.VortexDataset):
assert ds # pyright cannot determine that ds is used by duckdb.execute
# This would be a nice test but we do not support IsNotNull which duckdb uses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

// decimal
if let Some(dtype) = dtype.and_then(|d| d.as_decimal_opt()) {
return Ok(Scalar::decimal(
vortex::scalar::DecimalValue::I128(value.extract::<i128>()?),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean I have to specify the I128 representation as a Python int to compare to a decimal column? Seems a bit weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to think this through, maybe there's no way to avoid doing the full match here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ended up doing it

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS requested review from danking and gatesn February 13, 2026 15:34
@codspeed-hq
Copy link

codspeed-hq bot commented Feb 13, 2026

Merging this PR will degrade performance by 11.52%

❌ 3 regressed benchmarks
✅ 1132 untouched benchmarks
⏩ 1268 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation take_map[(0.05, 0.5)] 1.7 ms 1.9 ms -10.75%
Simulation take_map[(0.1, 0.5)] 2.1 ms 2.3 ms -11.52%
Simulation take_map[(0.05, 1.0)] 3.1 ms 3.4 ms -10.09%

Comparing adamg/more-substrait (4b70b0e) with develop (4c997c9)

Open in CodSpeed

Footnotes

  1. 1268 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@AdamGS AdamGS merged commit 2617080 into develop Feb 13, 2026
47 of 48 checks passed
@AdamGS AdamGS deleted the adamg/more-substrait branch February 13, 2026 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature lang/python Relates to the Vortex Python API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants