Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql result discrepency with sqlite, postgres and duckdb #13780

Closed
Tracked by #13811
Omega359 opened this issue Dec 14, 2024 · 6 comments
Closed
Tracked by #13811

sql result discrepency with sqlite, postgres and duckdb #13780

Omega359 opened this issue Dec 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@Omega359
Copy link
Contributor

Omega359 commented Dec 14, 2024

Describe the bug

External error: query result mismatch:
[SQL] SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS REAL ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * ( + - NULLIF ( 21, + 70 * + 71 ) ) * - 15
at test_files/sqlite/random/expr/slt_good_11.slt:9790

SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS REAL ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * ( + - NULLIF ( 21, + 70 * + 71 ) ) * - 15;

df result:

> SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS REAL ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * ( + - NULLIF ( 21, + 70 * + 71 ) ) * - 15
;
+--------------+---------------------------------------------------------------------------------------+
| col0         | Int64(-10) * (- Int64(33)) * (- nullif(Int64(21),Int64(70) * Int64(71))) * Int64(-15) |
+--------------+---------------------------------------------------------------------------------------+
| -252908960.0 | 103950                                                                                |
+--------------+---------------------------------------------------------------------------------------+

postgres, duckdb and sqlite result:

D SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS REAL ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * ( + - NULLIF ( 21, + 70 * + 71 ) ) * - 15;
┌────────────┬──────────────────────────────────────────────────────────────────┐
│    col0    │ (((-10 * -(+(33))) * +(-("nullif"(21, (+(70) * +(71)))))) * -15) │
│   int32    │                              int32                               │
├────────────┼──────────────────────────────────────────────────────────────────┤
│ -252908969103950 │
└────────────┴──────────────────────────────────────────────────────────────────┘

Duckdb and sqlite have col0 as an integer, pg and df have it as a float.

To Reproduce

sql above.

Expected behavior

No response

Additional context

No response

@Omega359
Copy link
Contributor Author

Omega359 commented Jan 5, 2025

The core issue is something related to the cast to the REAL type. Changing the sql to cast to DOUBLE results in the correct results:

> SELECT - NULLIF ( + 15, - 27 + + CAST ( NULL AS DOUBLE ) + + + MIN ( 35 ) * - COUNT ( * ) * - 14 + 64 ) * 87 * + 75 * 34 * + 76 + - - 31 AS col0, - 10 * - + 33 * ( + - NULLIF ( 21, + 70 * + 71 ) ) * - 15;
+--------------+---------------------------------------------------------------------------------------+
| col0         | Int64(-10) * (- Int64(33)) * (- nullif(Int64(21),Int64(70) * Int64(71))) * Int64(-15) |
+--------------+---------------------------------------------------------------------------------------+
| -252908969.0 | 103950                                                                                |
+--------------+---------------------------------------------------------------------------------------+

@Omega359
Copy link
Contributor Author

Omega359 commented Jan 6, 2025

I've spent a few hours trying to determine where DataFusion was going wrong with the math and finally came up with a condensed test case. The math is fine ... as far as Rust is concerned. You can see a boiled down test case I wrote up on the rust playground.

The core issue is that the result cannot be represented as a f32 which is what the REAL type maps to. So technically the result that DataFusion is returning is correct.

Sqlite maps real to 8 bytes.
Postgresql maps real to 4 bytes ... but for the query in query it uses an int type (proper handling of nullif types as opposed to DataFusion's incorrect type selection). Even if you force part of the query to be a real type it still uses int:

SELECT - 15.0000::float4 * 16860600 + 31 AS col0; => -252908969
SELECT - 15.0001::float4 * 16860600 + 31 AS col0; => -252910657.34972382

Duckdb maps real to 4 bytes. If you force the type to be float4 it returns the same result as DataFusion:

D SELECT - 15.0000::float4 * 16860600 + 31 AS col0;
┌──────────────┐
│     col0     │
│    float     │
├──────────────┤
│ -252908960.0 │
└──────────────┘

So I think for these queries the resolution here should be that DataFusion should fix the typing of nullif .. and then update the results in the slt test if the type maps to f32.

@Omega359
Copy link
Contributor Author

Omega359 commented Jan 6, 2025

Addendum: Since the sqlite tests come from sqlite (duh) where REAL is mapped to 8 bytes (Double/f64) I would like to propose that I update the sqlite .slt files and change:

'AS REAL' -> 'AS DOUBLE'

This would better match the actual types being tested and would fix many of the failing results. Along with correcting the nullif type behavior would fix almost all the remaining tests that have result mismatches with sqlite/postgresql.

Thoughts? @alamb, @aweltsch, @2010YOUY01, @jayzhan-synnada

@alamb
Copy link
Contributor

alamb commented Jan 7, 2025

First of all, very nice 🕵️ work !

'AS REAL' -> 'AS DOUBLE'

This would better match the actual types being tested and would fix many of the failing results. Along with correcting the nullif type behavior would fix almost all the remaining tests that have result mismatches with sqlite/postgresql.

As I understand, this means using f64 for floating point operations in the test. This sounds like a very good and pragmatic thing to do in my mind

The other potential thing to do might simply be to map AS REAL to use DataType::Float64 but that is a larger and much more potentially disruptive change

Thanks agian @Omega359

@Omega359
Copy link
Contributor Author

I went with using FLOAT8 vs DOUBLE as double isn't a valid type name in Postgresql and it interfered with the result comparisons.

@Omega359
Copy link
Contributor Author

Resolved as fixed with change 'as REAL' to 'AS FLOAT8' in sqlite test files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants