Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix datatype of case expression #5734

Merged
merged 1 commit into from
Apr 1, 2023
Merged

Fix datatype of case expression #5734

merged 1 commit into from
Apr 1, 2023

Conversation

mslapek
Copy link
Contributor

@mslapek mslapek commented Mar 25, 2023

Which issue does this PR close?

Fixes #5733

Rationale for this change

Fixes the issue.

What changes are included in this PR?

Make Expr::schema return coerced DataType for Expr::Case.

Are these changes tested?

Added pg_compat_null.slt reproducing the issue.

Are there any user-facing changes?

No, except the bug fix.

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Mar 25, 2023
@@ -68,7 +69,26 @@ impl ExprSchemable for Expr {
Expr::OuterReferenceColumn(ty, _) => Ok(ty.clone()),
Expr::ScalarVariable(ty, _) => Ok(ty.clone()),
Expr::Literal(l) => Ok(l.get_datatype()),
Expr::Case(case) => case.when_then_expr[0].1.get_type(schema),
Expr::Case(case) => {
// when #5681 will be fixed, this code can be reverted to:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving TypeCoersion to analyzer to fix #5733 instead of making a temporary workaround?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's a duck-taping.

I'm afraid the mentioned issue won't be resolved quickly by cut'n'paste... (see #5681 (comment) )

So the question is - should we expose our users to the bug 🧐 until #5681 is fixed? (putting Datafusion's reputation at risk?)

What is more, this PR features sqlogictest ⭐️, so we won't have a regression caused by the fix #5681.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be best to have this code in Analyzer -- but if we can't get it in there quickly we can merge this PR as is.

Thank you for the test case @mslapek

Thank you for leaving the link to #5733 in the code

Do you know why the CREATE TABLE AS SELECT does not have type coercion (as an optimizer pass) applied? It seems like that may be an oversight (and maybe the root cause of this problem)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type coercion is implemented as an optimizer - and it changes schemas of particular Expressions.
And it takes care of Expr::Case.

But Expr::Projection caches the schema in itself - so it is not updated! Therefore we get a mismatch between declared schema vs batch schema.

Essentially this PR makes Expr::Case to not change a schema during the coercion optimizer.

Copy link
Contributor

@alamb alamb Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #5821

PR to add reference #5822

FROM aggregate_test_100_by_sql


query III
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly concerned if this test is actually needed, the PR is to fix schema mismatch on case statement

Copy link
Contributor Author

@mslapek mslapek Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test involves CASE statement, and it was crashing before the fix (the CREATE TABLE one).

The file itself might be useful for other null-related bugfixes.

Nevertheless there could exist better ways to test this... Suggestions welcome!

@alamb
Copy link
Contributor

alamb commented Mar 30, 2023

Sorry for the delay -- what I have been (secretly) hoping is that we can move the type coercion into an analyzer pass and avoid the need for this workaround.

@alamb
Copy link
Contributor

alamb commented Apr 1, 2023

Here is what I think we should do to move this PR forward: I will file a ticket to remove this code from here (and into the analysis phase), and then we'll merge it in and we can do the cleanup with the analysis phase as a follow on

@alamb
Copy link
Contributor

alamb commented Apr 1, 2023

FYI I tried the test case with the fix from @Jefffrey in #5820 and sadly it still fails

I am merging this PR even though the code is non ideal -- my rationale is that it fixes a real bug, and has test coverage (with postgres verification). I have filed #5821 to fix it for real and will add that as a comment as a follow on PR

@alamb alamb merged commit bff92c8 into apache:main Apr 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Case statement with one branch being null gives "Mismatch between schema and batches"
4 participants