Decorrelation rewrite doesn't respect identifier normalizer setting? #12422

Open
paf31 opened this issue Sep 10, 2024 · 1 comment

paf31 commented Sep 10, 2024

Describe the bug

The result of a query that relies on the decorrelation optimization seems to depend on the DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION setting. With it on, the query works as expected. With it off, I get a field-not-in-scope error:

Schema error: No field named __scalar_sq_1.invoiceid. Valid fields are "Invoice"."invoiceId".

To Reproduce

DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION=false datafusion-cli
SELECT
    *,
    (
        SELECT
            count(1)
        FROM
        VALUES
            (1),
            (1),
            (2) AS InvoiceLine(invoiceId)
        WHERE
            InvoiceLine.invoiceId = Invoice.invoiceId
    )
FROM
VALUES
    (1),
    (2),
    (3) AS Invoice(invoiceId);

Expected behavior

+------------+-----------------+
| invoice_id | count(Int64(1)) |
+------------+-----------------+
| 1          | 2               |
| 2          | 1               |
| 3          | 0               |
+------------+-----------------+

Additional context

The error disappears if the DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION flag is turned on.

The error also disappears if I manually snake-case all identifiers in the query.

The plan illustrates the issue:

Projection: Invoice.invoiceId, CASE WHEN __scalar_sq_1.__always_true IS NULL THEN Int64(0) ELSE __scalar_sq_1.count(Int64(1)) END AS count(Int64(1))
  Left Join:  Filter: __scalar_sq_1.invoiceid = Invoice.invoiceId
    SubqueryAlias: Invoice
      Projection: column1 AS invoiceId
        Values: (Int64(1)), (Int64(2)), (Int64(3))
    SubqueryAlias: __scalar_sq_1
      Projection: count(Int64(1)), Boolean(true) AS __always_true

==================================^ InvoiceLine.invoiceId is missing here

        Aggregate: groupBy=[[InvoiceLine.invoiceId]], aggr=[[count(Int64(1))]]
          SubqueryAlias: InvoiceLine
            Projection: column1 AS invoiceId
              Values: (Int64(1)), (Int64(1)), (Int64(2))
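
For readers unfamiliar with identifier normalization, here is a toy Python model of one way a mismatch like `__scalar_sq_1.invoiceid` vs. `invoiceId` could arise. This is not DataFusion code, and the thread does not confirm the actual cause. The function names and the `rewrite_always_lowercases` switch are hypothetical. The only things taken from the report are that the schema keeps `invoiceId` while the join filter refers to `invoiceid`, and that the lookup is case-sensitive.

```python
def normalize_ident(name: str, enable_normalization: bool) -> str:
    """Lowercase an unquoted identifier only when normalization is enabled."""
    return name.lower() if enable_normalization else name


def resolve(column: str, schema_fields: list[str]) -> str:
    """Case-sensitive field lookup, loosely mimicking the reported schema error."""
    if column not in schema_fields:
        raise KeyError(f"No field named {column}. Valid fields are {schema_fields}.")
    return column


def plan_join_key(ident: str, enable_normalization: bool,
                  rewrite_always_lowercases: bool) -> str:
    """Build a one-column schema and resolve the rewritten join key against it.

    ASSUMPTION: `rewrite_always_lowercases` stands in for a rewrite step that
    ignores the normalization setting. It is a hypothetical illustration only.
    """
    # The schema side honours the setting, so it keeps `invoiceId` when it is off.
    schema = [normalize_ident(ident, enable_normalization)]
    # The rewrite side either honours the setting or lowercases unconditionally.
    if rewrite_always_lowercases:
        rewritten = ident.lower()
    else:
        rewritten = normalize_ident(ident, enable_normalization)
    return resolve(rewritten, schema)
```

In this model, calling `plan_join_key("invoiceId", enable_normalization=False, rewrite_always_lowercases=True)` raises a `KeyError`, while the same call with normalization enabled succeeds because both sides are lowercased. That matches the two observations above, but it is only a sketch of the symptom.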
paf31 added the bug label Sep 10, 2024

JasonLi-cn (Contributor) commented

This issue is fixed on the main branch. I guess the fix has something to do with this PR: #12426
