Add additional tests for InListExpr#19050
Conversation
|
I expect tests to fail. We can fix them once we confirm. |
| downcast_dictionary_array! { | ||
| v => { | ||
| let values_contains = self.contains(v.values().as_ref(), negated)?; | ||
| let result = take(&values_contains, v.keys(), None)?; | ||
| return Ok(downcast_array(result.as_ref())) | ||
| } | ||
| _ => {} | ||
| } |
There was a problem hiding this comment.
The lack of this was a bug
| let result = match (v.null_count() > 0, negated) { | ||
| (true, false) => { | ||
| // has nulls, not negated" | ||
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), |
There was a problem hiding this comment.
We were not handling nulls properly
There was a problem hiding this comment.
i double checked with postgres and the new code looks right
postgres=# select NULL IN (1);
?column?
----------
(1 row)
postgres=# select 1 IN (1, NULL);
?column?
----------
t
(1 row)
postgres=# select 1 IN (2, NULL);
?column?
----------
(1 row)
postgres=# select 1 NOT IN (2, NULL);
?column?
----------
(1 row)There was a problem hiding this comment.
yep I checked postgres and duckdb to get the right behavior
76fd945 to
f2eb513
Compare
|
@alamb I've added test harnesses to remove duplication as you requested 😄 |
772bca6 to
b3fca6a
Compare
alamb
left a comment
There was a problem hiding this comment.
Thank you @adriangb -- I went through the code fixes and tests and this PR looks like a definitely improvement to me
I left some small suggestions
I also ran code coverage on this PR to double check and it did find one more case that is not covered
cargo llvm-cov test --html -p datafusion-physical-expr
| /// This validates data types, evaluates the list as constants, and uses specialized | ||
| /// StaticFilter implementations for better performance (e.g., Int32StaticFilter for Int32). | ||
| /// | ||
| /// Returns an error if data types don't match or if the list contains non-constant expressions. |
There was a problem hiding this comment.
I don't think this comment is correct -- the code appears to handle the case of non-constant expressions as well
I think this might also be clearer if it were simply named try_new() rather than try_from_static_filter
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), | ||
| ) | ||
| // Either needle or haystack has nulls, not negated |
There was a problem hiding this comment.
| // Either needle or haystack has nulls, not negated | |
| // needle has nulls, not negated |
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(!self.values.contains(&value?))), | ||
| ) | ||
| // Either needle or haystack has nulls, negated |
There was a problem hiding this comment.
| // Either needle or haystack has nulls, negated | |
| // needle has nulls, negated |
| negated, | ||
| Some(instantiate_static_filter(in_array)?), | ||
| )), | ||
| Err(_) => { |
There was a problem hiding this comment.
nit -- I think it would be clearer if this try_evaluate_constant_list evaluated to Result<Option<..>> rather than Result -- that way we avoid the string allocation on error, and could pass real errors .
This code responds to all errors, even if the issue is something different than the types are not supported
If you went this route, I suspect you wouldn't need the second error check for type matches (it would already be handled)
There was a problem hiding this comment.
renamed and refactored to Result<Option<>> as you suggested
| name: &'static str, | ||
| value_in: ScalarValue, | ||
| value_not_in: ScalarValue, | ||
| value_in_list: ScalarValue, |
There was a problem hiding this comment.
my reading of these tests is that they always test a one element list -- as in the list does not have multiple values
Do you think we need coverage for multi-value lists?
There was a problem hiding this comment.
The previous tests all had mutli-value lists, such as
// expression: "a not in ("a", "b")"Maybe we could make this something like values_in: Vec<ScalarValue>
However, I see there are multi-value tests below too, so maybe this is ok
There was a problem hiding this comment.
I updated to accept other_list_values: Vec<ScalarValue> and have the tests throw in some extra values to give coverage
| let result = match (v.null_count() > 0, negated) { | ||
| (true, false) => { | ||
| // has nulls, not negated" | ||
| BooleanArray::from_iter( | ||
| v.iter().map(|value| Some(self.values.contains(&value?))), |
There was a problem hiding this comment.
i double checked with postgres and the new code looks right
postgres=# select NULL IN (1);
?column?
----------
(1 row)
postgres=# select 1 IN (1, NULL);
?column?
----------
t
(1 row)
postgres=# select 1 IN (2, NULL);
?column?
----------
(1 row)
postgres=# select 1 NOT IN (2, NULL);
?column?
----------
(1 row)| // Create dictionary-encoded batch with values [1, 2, 5] | ||
| // Dictionary: keys [0, 1, 2] -> values [1, 2, 5] | ||
| let keys = Int8Array::from(vec![0, 1, 2]); | ||
| let values = Int32Array::from(vec![1, 2, 5]); |
There was a problem hiding this comment.
I personally recommend using values that are clearly not the keys - for example
| let values = Int32Array::from(vec![1, 2, 5]); | |
| let values = Int32Array::from(vec![100, 200, 500]); |
( you would also have to change the literals above)
| } | ||
| (false, true) => { | ||
| // no null, negated | ||
| // No nulls anywhere, negated |
There was a problem hiding this comment.
according to code coverage, this branch is not covered by tests (see PR comments)
| RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::clone(&array)])?; | ||
|
|
||
| // Helper to format SQL-like representation for error messages | ||
| let _format_sql = |negated: bool, with_null: bool| -> String { |
There was a problem hiding this comment.
this helper seems unused (why is it named starting with _ 🤔 )
There was a problem hiding this comment.
it was a pipe dream to have nice error messages. I updated it and implemented it. now if a test fails you get something like:
assertion `left == right` failed: Failed for: a IN (5, 3, 7)
a: PrimitiveArray<Int32>
[
1,
3,
4,
]
left: BooleanArray
[
true,
true,
true,
]
right: BooleanArray
[
false,
true,
false,
]
b3fca6a to
89234b1
Compare
This adds tests that prove some bugs that are fixed in #18832 so we can level the playing field on benchmarks.