fix: allow placeholders to be substituted when coercible #8977

erratic-pattern · 2024-01-24T00:02:04Z

Which issue does this PR close?

Closes #8976.

Rationale for this change

It should be possible for the types of parameter values and placeholders to be different, as long as the ScalarValue is coercible to the Expr::Placeholders data type.

What changes are included in this PR?

Removal of the eager == check on data types, which allows type coercion to happen later in planning.

Are these changes tested?

Are there any user-facing changes?

The data_type argument to ParamValues::get_placeholders_with_values is currently unused. If we decide to remove it that would be a user-facing change to the API.

Alternatively, instead of deleting the eager check and leaving the data_type parameter unused, we could improve the eager check with code borrowed from the type coercion module but I'm not sure of the best way to bridge the crate gap there (datafusion-common doesn't depend on datafusion-expr)

erratic-pattern · 2024-01-24T00:14:33Z

I think if ParamValues were moved to datafusion-expr that would solve the dependency loop here. Then it could call can_coerce_from directly.

erratic-pattern · 2024-01-24T00:57:48Z

Actually I think this code from the type coercion optimization rule is the logic we would want to use here

alamb

Thank you @kallisti-dev -- this is an important problem to solve

The code in this PR looks ok to me, but needs a test to make sure the problem is solved and to ensure we don't introduce regressions in the future

It would be amazing if you could take a shot at writing some tests, but if you don't have time I can do so as well.

Let me know if this does't make sense

Test Outline

I think the test should probably look something like what we want to do in IOx (which I think is pretty standard). Maybe we can add a test to the core_integration tests in datafusion/core/tests/sql/parameters.rs that does something like:

Step 1: User provides a SQL with placeholder

for example, if id is an i64

SELECT * FROM t1 WHRERE id = $1

DataFusion makes a `LogicalPlan` (possbly wrapped in a DataFrame)

let plan = ctx.sql(sql_string);

User provides the parameters

(note in this case the user may provide an i32 for id which should be automatically coerced to i64)

let plan = plan
  .with_param_values(vec![
    ("id", ScalarValue::from(3i32)),
  ])

Tests to write;

Simply positive test (described above)
Negative tests (when the parameter value can not be coerced -- e.g. pass a string that is not a valid number)
Use both PREPARE and normal SELECT

erratic-pattern · 2024-01-24T15:43:54Z

@alamb Added new tests to cover those cases. happy path for named params and prepared statements already existed so I put my new tests next to those.

However I noticed one test that should pass is not. It looks the string "not a number" is being successfully coerced to Int32

erratic-pattern · 2024-01-24T15:44:47Z

datafusion/core/tests/sql/select.rs

+        .with_param_values(vec![("text", ScalarValue::from("not a number"))])?
+        .collect()
+        .await;
+    assert_eq!(results.unwrap_err().strip_backtrace(), "");


this test is failing with

---- sql::select::test_parameter_invalid_types stdout ---- thread 'sql::select::test_parameter_invalid_types' panicked at datafusion/core/tests/sql/select.rs:701:24: called `Result::unwrap_err()` on an `Ok` value: []

I would expect it to not be able to coerce the string

🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null in this case):

❯ create table test (signed int) as values (-1), (0), (1); 0 rows in set. Query took 0.005 seconds. ❯ select * from test; +--------+ | signed | +--------+ | -1 | | 0 | | 1 | +--------+ 3 rows in set. Query took 0.001 seconds. ❯ select * from test where signed = 'not a number'; 0 rows in set. Query took 0.003 seconds.

Maybe you can try something like this:

❯ create table test3 as values ([1,2,3]); 0 rows in set. Query took 0.002 seconds. ❯ select * from test3; +-----------+ | column1 | +-----------+ | [1, 2, 3] | +-----------+ 1 row in set. Query took 0.001 seconds. ❯ select * from test3 where column1 = 4; Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive

I've updated the test as described

alamb

Looking very nice @kallisti-dev 👍

I have a test suggestion on how to trigger an error but I think this PR could also be merged as is (if CI passes) and we could improve the tests as a follow on

alamb · 2024-01-24T16:46:29Z

datafusion/core/tests/sql/select.rs

+        .with_param_values(vec![("text", ScalarValue::from("not a number"))])?
+        .collect()
+        .await;
+    assert_eq!(results.unwrap_err().strip_backtrace(), "");


🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null in this case):

❯ create table test (signed int) as values (-1), (0), (1); 0 rows in set. Query took 0.005 seconds. ❯ select * from test; +--------+ | signed | +--------+ | -1 | | 0 | | 1 | +--------+ 3 rows in set. Query took 0.001 seconds. ❯ select * from test where signed = 'not a number'; 0 rows in set. Query took 0.003 seconds.

Maybe you can try something like this:

❯ create table test3 as values ([1,2,3]); 0 rows in set. Query took 0.002 seconds. ❯ select * from test3; +-----------+ | column1 | +-----------+ | [1, 2, 3] | +-----------+ 1 row in set. Query took 0.001 seconds. ❯ select * from test3 where column1 = 4; Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive

ci is failing

erratic-pattern · 2024-01-24T17:58:55Z

@alamb I've updated the failing test with the suggestion you made to try coercing integer to a list. it should pass now.

alamb · 2024-01-24T18:18:57Z

@alamb I've updated the failing test with the suggestion you made to try coercing integer to a list. it should pass now.

Looks like there is still a clippy failure @kallisti-dev : https://github.com/apache/arrow-datafusion/actions/runs/7644208478/job/20828392768?pr=8977

…-substition-when-coercible

alamb

Thank you @kallisti-dev

I took the liberty of pushing a fix for clippy to your branch and I plan to merge this when CI passes

alamb · 2024-01-24T19:19:35Z

Thanks again @kallisti-dev

alamb mentioned this pull request Jan 24, 2024

DataFusion weekly project plan (Andrew Lamb) - Jan 22, 2024 #8933

Closed

9 tasks

alamb reviewed Jan 24, 2024

View reviewed changes

erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 81f8078 to 56cdb81 Compare January 24, 2024 15:42

github-actions bot added the core Core DataFusion crate label Jan 24, 2024

erratic-pattern commented Jan 24, 2024

View reviewed changes

erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 56cdb81 to 4b51890 Compare January 24, 2024 16:13

github-actions bot added the logical-expr Logical plan and expressions label Jan 24, 2024

alamb previously approved these changes Jan 24, 2024

View reviewed changes

fix: allow placeholders to be substited when coercible

287594d

erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 4b51890 to 287594d Compare January 24, 2024 17:52

erratic-pattern requested a review from alamb January 24, 2024 17:54

alamb added 2 commits January 24, 2024 13:34

Merge remote-tracking branch 'apache/main' into fix/allow-placeholder…

4e1be89

…-substition-when-coercible

fix clippy

c3ba4db

alamb approved these changes Jan 24, 2024

View reviewed changes

alamb merged commit 94a6192 into apache:main Jan 24, 2024
22 checks passed

alamb mentioned this pull request Jan 29, 2024

Optimizing LogicalPlan with placeholders fails #8819

Open

erratic-pattern changed the title ~~fix: allow placeholders to be substited when coercible~~ fix: allow placeholders to be substituted when coercible Jan 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: allow placeholders to be substituted when coercible #8977

fix: allow placeholders to be substituted when coercible #8977

erratic-pattern commented Jan 24, 2024 •

edited

Loading

erratic-pattern commented Jan 24, 2024

erratic-pattern commented Jan 24, 2024

alamb left a comment

erratic-pattern commented Jan 24, 2024

erratic-pattern Jan 24, 2024

alamb Jan 24, 2024

erratic-pattern Jan 24, 2024

alamb left a comment

alamb Jan 24, 2024

erratic-pattern commented Jan 24, 2024

alamb commented Jan 24, 2024

alamb left a comment

alamb commented Jan 24, 2024

fix: allow placeholders to be substituted when coercible #8977

fix: allow placeholders to be substituted when coercible #8977

Conversation

erratic-pattern commented Jan 24, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

erratic-pattern commented Jan 24, 2024

erratic-pattern commented Jan 24, 2024

alamb left a comment

Choose a reason for hiding this comment

Test Outline

Step 1: User provides a SQL with placeholder

DataFusion makes a LogicalPlan (possbly wrapped in a DataFrame)

User provides the parameters

Tests to write;

erratic-pattern commented Jan 24, 2024

erratic-pattern Jan 24, 2024

Choose a reason for hiding this comment

alamb Jan 24, 2024

Choose a reason for hiding this comment

erratic-pattern Jan 24, 2024

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

alamb Jan 24, 2024

Choose a reason for hiding this comment

erratic-pattern commented Jan 24, 2024

alamb commented Jan 24, 2024

alamb left a comment

Choose a reason for hiding this comment

alamb commented Jan 24, 2024

erratic-pattern commented Jan 24, 2024 •

edited

Loading

DataFusion makes a `LogicalPlan` (possbly wrapped in a DataFrame)