-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: allow placeholders to be substituted when coercible #8977
fix: allow placeholders to be substituted when coercible #8977
Conversation
I think if |
Actually I think this code from the type coercion optimization rule is the logic we would want to use here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kallisti-dev -- this is an important problem to solve
The code in this PR looks ok to me, but needs a test to make sure the problem is solved and to ensure we don't introduce regressions in the future
It would be amazing if you could take a shot at writing some tests, but if you don't have time I can do so as well.
Let me know if this does't make sense
Test Outline
I think the test should probably look something like what we want to do in IOx (which I think is pretty standard). Maybe we can add a test to the core_integration
tests in datafusion/core/tests/sql/parameters.rs
that does something like:
Step 1: User provides a SQL with placeholder
for example, if id
is an i64
SELECT * FROM t1 WHRERE id = $1
DataFusion makes a LogicalPlan
(possbly wrapped in a DataFrame)
let plan = ctx.sql(sql_string);
User provides the parameters
(note in this case the user may provide an i32
for id
which should be automatically coerced to i64)
let plan = plan
.with_param_values(vec![
("id", ScalarValue::from(3i32)),
])
Tests to write;
- Simply positive test (described above)
- Negative tests (when the parameter value can not be coerced -- e.g. pass a string that is not a valid number)
- Use both
PREPARE
and normalSELECT
81f8078
to
56cdb81
Compare
@alamb Added new tests to cover those cases. happy path for named params and prepared statements already existed so I put my new tests next to those. However I noticed one test that should pass is not. It looks the string "not a number" is being successfully coerced to Int32 |
datafusion/core/tests/sql/select.rs
Outdated
.with_param_values(vec![("text", ScalarValue::from("not a number"))])? | ||
.collect() | ||
.await; | ||
assert_eq!(results.unwrap_err().strip_backtrace(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is failing with
---- sql::select::test_parameter_invalid_types stdout ----
thread 'sql::select::test_parameter_invalid_types' panicked at datafusion/core/tests/sql/select.rs:701:24:
called `Result::unwrap_err()` on an `Ok` value: []
I would expect it to not be able to coerce the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null
in this case):
❯ create table test (signed int) as values (-1), (0), (1);
0 rows in set. Query took 0.005 seconds.
❯ select * from test;
+--------+
| signed |
+--------+
| -1 |
| 0 |
| 1 |
+--------+
3 rows in set. Query took 0.001 seconds.
❯ select * from test where signed = 'not a number';
0 rows in set. Query took 0.003 seconds.
Maybe you can try something like this:
❯ create table test3 as values ([1,2,3]);
0 rows in set. Query took 0.002 seconds.
❯ select * from test3;
+-----------+
| column1 |
+-----------+
| [1, 2, 3] |
+-----------+
1 row in set. Query took 0.001 seconds.
❯ select * from test3 where column1 = 4;
Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the test as described
56cdb81
to
4b51890
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very nice @kallisti-dev 👍
I have a test suggestion on how to trigger an error but I think this PR could also be merged as is (if CI passes) and we could improve the tests as a follow on
datafusion/core/tests/sql/select.rs
Outdated
.with_param_values(vec![("text", ScalarValue::from("not a number"))])? | ||
.collect() | ||
.await; | ||
assert_eq!(results.unwrap_err().strip_backtrace(), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null
in this case):
❯ create table test (signed int) as values (-1), (0), (1);
0 rows in set. Query took 0.005 seconds.
❯ select * from test;
+--------+
| signed |
+--------+
| -1 |
| 0 |
| 1 |
+--------+
3 rows in set. Query took 0.001 seconds.
❯ select * from test where signed = 'not a number';
0 rows in set. Query took 0.003 seconds.
Maybe you can try something like this:
❯ create table test3 as values ([1,2,3]);
0 rows in set. Query took 0.002 seconds.
❯ select * from test3;
+-----------+
| column1 |
+-----------+
| [1, 2, 3] |
+-----------+
1 row in set. Query took 0.001 seconds.
❯ select * from test3 where column1 = 4;
Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive
4b51890
to
287594d
Compare
@alamb I've updated the failing test with the suggestion you made to try coercing integer to a list. it should pass now. |
Looks like there is still a clippy failure @kallisti-dev : https://github.com/apache/arrow-datafusion/actions/runs/7644208478/job/20828392768?pr=8977 |
…-substition-when-coercible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kallisti-dev
I took the liberty of pushing a fix for clippy to your branch and I plan to merge this when CI passes
Thanks again @kallisti-dev |
Which issue does this PR close?
Closes #8976.
Rationale for this change
It should be possible for the types of parameter values and placeholders to be different, as long as the
ScalarValue
is coercible to theExpr::Placeholder
s data type.What changes are included in this PR?
Removal of the eager
==
check on data types, which allows type coercion to happen later in planning.Are these changes tested?
Are there any user-facing changes?
The
data_type
argument toParamValues::get_placeholders_with_values
is currently unused. If we decide to remove it that would be a user-facing change to the API.Alternatively, instead of deleting the eager check and leaving the
data_type
parameter unused, we could improve the eager check with code borrowed from the type coercion module but I'm not sure of the best way to bridge the crate gap there (datafusion-common
doesn't depend ondatafusion-expr
)