Skip to content

Add support for wider range of scalar subqueries #3725

Closed
@andygrove

Description

@andygrove

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Setup

$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ echo "1,2" > test.csv
$ cd datafusion-cli
$ cargo run
❯ create external table test (a int, b int) stored as csv location 'test.csv';

Test

This query works:

❯ select * from test where a = (select max(a) from test t2);
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+

These non-aggregate subqueries that return a single row do not work:

❯ select * from test where a = (select distinct a from test);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")

❯ select * from test where a = (select a from test limit 1);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")

In both cases, the subqueries do have a projection, but they are wrapped with a Distinct or Limit node.

This query fails with a different error. In this case there is a Filter wrapping the Projection.

❯ select * from test where a = (select a from test where a is not null);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must aggregate a value at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:193\ncaused by\nError during planning: Could not coerce into Aggregate! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1368")

Describe the solution you'd like
Support these subqueries

Describe alternatives you've considered
None

Additional context
None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions