Closed
Description
Describe the bug
Unable to run the following style query due to error in subquery:
select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);
To Reproduce
$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ datafusion-cli
DataFusion CLI v19.0.0
❯ CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
0 rows in set. Query took 0.003 seconds.
❯ select * from t1;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1 | 111 |
| 2 | 222 |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
❯ select column1 from t1;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
+---------+
2 rows in set. Query took 0.001 seconds.
❯ select distinct column1 from t1;
+---------+
| column1 |
+---------+
| 2 |
| 1 |
+---------+
2 rows in set. Query took 0.004 seconds.
❯ select column1 from t1 where column1 in (select column1 as other from t1 where false);
0 rows in set. Query took 0.002 seconds.
❯ select column1 from t1 where column1 in ((select column1 as other from t1 where false));
This feature is not implemented: Physical plan does not support logical expression (<subquery>)
❯ select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);
decorrelate_where_in
caused by
Internal error: Optimizer rule 'decorrelate_where_in' failed due to unexpected error: a projection is required at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-optimizer-19.0.0/src/decorrelate_where_in.rs:147
caused by
Error during planning: Could not coerce into Projection! at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-expr-19.0.0/src/logical_plan/plan.rs:1394. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯
Or run the following sql file with datafusion-cli -f test.sql
-- Test queries
CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
select * from t1;
select column1 from t1;
select distinct column1 from t1;
-- The first select with subquery statement works with 0 rows the second errors and third error on datafusion 19.0.0
select column1 from t1 where column1 in (select column1 as other from t1 where false);
-- This is fixed in main/20.0.0 by https://github.com/apache/arrow-datafusion/issues/5529
select column1 from t1 where column1 in ((select column1 as other from t1 where false));
select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);
Expected behavior
Can use distinct in a subquery
Additional context