Improve substrait support to include more types and expressions#6484
Improve substrait support to include more types and expressions#6484
Conversation
…ions Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
1315d70 to
c09661d
Compare
gatesn
left a comment
There was a problem hiding this comment.
One comment, otherwise looks great thank you!
|
|
||
| @final | ||
| class Expr: | ||
| def evaluate(self, array: IntoArray) -> Array: ... |
There was a problem hiding this comment.
I think we should use the "new" inverted call of Array::apply(expr)
There was a problem hiding this comment.
I just moved it up here because it made more sense visually to me, but I'll add that change
| self_: PyRef<'py, Self>, | ||
| right: &Bound<'py, PyAny>, | ||
| ) -> PyResult<Bound<'py, PyExpr>> { | ||
| py_binary_operator(self_, Operator::Div, coerce_expr(right)?) |
There was a problem hiding this comment.
I think operator.truediv should always return float: https://docs.python.org/3/library/operator.html#operator.truediv.
I think the only sensible way to implement this is:
- truediv is Operator::Div on floating-point types; otherwise, type error.
- floordiv is Operator::Div on non-floating-point types; otherwise, type error.
There was a problem hiding this comment.
I would say for a DSL it's ok to abuse the operators a little bit? Particularly as expressions are untyped.
|
|
||
| def test_duckdb(ds: vx.dataset.VortexDataset): | ||
| assert ds # pyright cannot determine that ds is used by duckdb.execute | ||
| # This would be a nice test but we do not support IsNotNull which duckdb uses |
vortex-python/src/scalar/factory.rs
Outdated
| // decimal | ||
| if let Some(dtype) = dtype.and_then(|d| d.as_decimal_opt()) { | ||
| return Ok(Scalar::decimal( | ||
| vortex::scalar::DecimalValue::I128(value.extract::<i128>()?), |
There was a problem hiding this comment.
Does this mean I have to specify the I128 representation as a Python int to compare to a decimal column? Seems a bit weird
There was a problem hiding this comment.
need to think this through, maybe there's no way to avoid doing the full match here
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Merging this PR will degrade performance by 11.52%
Performance Changes
Comparing Footnotes
|
Does this PR closes an open issue or discussion?
What changes are included in this PR?
is_nullandis_not_nullsubstrate expression, which also allows for querying vortex-backed datasets with filters through duckdb!expr.evalute(..)API toarray.apply(...), to align the python API better with the Rust one.What is the rationale for this change?
Both the arrow dataset API and substrait open up more ways to use Vortex through existing tools without maintaining dedicated integrations.
How is this change tested?
Adds a few additional tests covering new code paths, and it also unlocks an existing test that was expected to fail.
Are there any user-facing changes?
Extends the public API surface in Python, doesn't break any existing code.