Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precedence bug with date comparison to date plus interval #3408

Closed
Tracked by #822 ...
andygrove opened this issue Sep 8, 2022 · 7 comments · Fixed by #4569
Closed
Tracked by #822 ...

Precedence bug with date comparison to date plus interval #3408

andygrove opened this issue Sep 8, 2022 · 7 comments · Fixed by #4569
Assignees
Labels
bug Something isn't working

Comments

@andygrove
Copy link
Member

Describe the bug

d3.d_date > d1.d_date + INTERVAL '5 days' fails with Int64 AND Boolean' can't be evaluated because there isn't a common type to coerce the types to

d3.d_date > (d1.d_date + INTERVAL '5 days') works

To Reproduce
See above

Expected behavior
None

Additional context
None

@Dandandan
Copy link
Contributor

Seems this is an issue in sqlparser-rs, namely in the values in for + and > used in get_next_precedence.

@sarahyurick
Copy link
Contributor

If no one has picked this up yet, I would be happy to work on this.

@Dandandan
Copy link
Contributor

Dandandan commented Oct 11, 2022

If no one has picked this up yet, I would be happy to work on this.

I don't believe so, so go ahead. As mentioned in the comment, I believe it is a bug in sqlparser (in precedence values).

@sarahyurick
Copy link
Contributor

I'm not really able to reproduce the error on the DataFusion side. For example:

use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
  let ctx = SessionContext::new();

  let schema = datafusion::arrow::datatypes::Schema::new(vec![
    datafusion::arrow::datatypes::Field::new("i_item_desc", datafusion::arrow::datatypes::DataType::Utf8, true),
    datafusion::arrow::datatypes::Field::new("d3_date", datafusion::arrow::datatypes::DataType::Date64, true),
    datafusion::arrow::datatypes::Field::new("d1_date", datafusion::arrow::datatypes::DataType::Date64, true),
  ]);

  ctx.register_csv(
    "test", "data.csv", CsvReadOptions::default().schema(&schema)
  ).await?;

  let df = ctx.sql("select i_item_desc \
        from test \
        where d3_date > d1_date + INTERVAL '5 days'").await?;

  df.show().await?;
  Ok(())
}

where the contents of data.csv are:

i_item_desc,d3_date,d1_date
a,2022-12-12T7:7:7,2022-12-12T7:7:7
b,2022-12-12T7:7:7,2022-12-11T7:7:7
c,2022-12-12T7:7:7,2022-12-10T7:7:7
d,2022-12-12T7:7:7,2022-12-9T7:7:7
e,2022-12-12T7:7:7,2022-12-8T7:7:7
f,2022-12-12T7:7:7,2022-12-7T7:7:7
g,2022-12-12T7:7:7,2022-12-6T7:7:7
h,2022-12-12T7:7:7,2022-12-5T7:7:7

Produces the expected output without any errors. I'm able to get the correct results on the Dask-SQL side, too:

import pandas as pd
from dask_sql import Context
from datetime import datetime

c = Context()

test = pd.DataFrame(
    {
        "i_item_desc": ["a", "b", "c", "d", "e", "f", "g", "h"],
        "d3_date": [
            datetime(2002, 6, 5),
            datetime(2002, 6, 6),
            datetime(2002, 6, 7),
            datetime(2002, 6, 8),
            datetime(2002, 6, 9),
            datetime(2002, 6, 10),
            datetime(2002, 6, 11),
            datetime(2002, 6, 12),
        ],
        "d1_date": [
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
            datetime(2002, 6, 5),
        ],
    }
)
c.create_table("test", test)

c.sql("select i_item_desc \
        from test \
        where d3_date > d1_date + INTERVAL '5 days'").compute()

Is there maybe another example I should try?

@sarahyurick
Copy link
Contributor

sarahyurick commented Oct 20, 2022

I get an error with: select i_item_desc from test where d3_date > d1_date + INTERVAL '5 days' and d2_date > d1_date + INTERVAL '3 days' that's due to incorrectly parsing the and operator, so I could open a separate issue for that.

Error: NotImplemented("Unsupported interval argument. Expected string literal, got: BinaryOp { left: Value(SingleQuotedString(\"5 days\")), op: And, right: BinaryOp { left: Identifier(Ident { value: \"d2_date\", quote_style: None }), op: Gt, right: BinaryOp { left: Identifier(Ident { value: \"d1_date\", quote_style: None }), op: Plus, right: Interval { value: Value(SingleQuotedString(\"3 days\")), leading_field: None, leading_precision: None, last_field: None, fractional_seconds_precision: None } } } }")

@andygrove
Copy link
Member Author

Thanks @sarahyurick. Perhaps the original issue already got resolved in sqlparser. It makes sense to me to open a new issue for the issue you found.

@alamb
Copy link
Contributor

alamb commented Dec 12, 2022

This is fixed -- I have a PR open to add some regression test coverage: #4569

@alamb alamb closed this as completed Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants