Skip to content

Error when using DISTINCT in a filter clause using a subquery #5633

Closed
@RustomMS

Description

@RustomMS

Describe the bug

Unable to run the following style query due to error in subquery:

select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);

To Reproduce

$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ datafusion-cli
DataFusion CLI v19.0.0
❯ CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
0 rows in set. Query took 0.003 seconds.
❯ select * from t1;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 111     |
| 2       | 222     |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
❯ select column1 from t1;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 rows in set. Query took 0.001 seconds.
❯ select distinct column1 from t1;
+---------+
| column1 |
+---------+
| 2       |
| 1       |
+---------+
2 rows in set. Query took 0.004 seconds.
❯ select column1 from t1 where column1 in (select column1 as other from t1 where false);
0 rows in set. Query took 0.002 seconds.
❯ select column1 from t1 where column1 in ((select column1 as other from t1 where false));
This feature is not implemented: Physical plan does not support logical expression (<subquery>)
❯ select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);
decorrelate_where_in
caused by
Internal error: Optimizer rule 'decorrelate_where_in' failed due to unexpected error: a projection is required at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-optimizer-19.0.0/src/decorrelate_where_in.rs:147
caused by
Error during planning: Could not coerce into Projection! at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-expr-19.0.0/src/logical_plan/plan.rs:1394. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯

Or run the following sql file with datafusion-cli -f test.sql

-- Test queries
CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
select * from t1;
select column1 from t1;
select distinct column1 from t1;
-- The first select with subquery statement works with 0 rows the second errors and third error on datafusion 19.0.0
select column1 from t1 where column1 in (select column1 as other from t1 where false);
-- This is fixed in main/20.0.0 by https://github.com/apache/arrow-datafusion/issues/5529
select column1 from t1 where column1 in ((select column1 as other from t1 where false));
select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);

Expected behavior

Can use distinct in a subquery

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions