Closed
Labels: bug (Something isn't working)
Description
Describe the bug
A predicate of the form ABS(col - x) < y in the WHERE clause of a query sometimes filters results incorrectly: rows that satisfy the predicate are dropped from the output. The test file has a float column (c0) with high-precision values. We want to use ABS(col - x) < y to do approximate equality comparisons on high-precision float columns.
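For context, here is a minimal sketch of why a tolerance comparison is used instead of exact equality on floats. It is plain Python, not DataFusion code, and the helper name `approx_equal` is hypothetical; it only mirrors the shape of the SQL predicate ABS(col - x) < y.

```python
def approx_equal(value: float, target: float, tolerance: float) -> bool:
    """Mirror of the SQL predicate ABS(value - target) < tolerance (hypothetical helper)."""
    return abs(value - target) < tolerance


# Exact equality is fragile once a value has gone through arithmetic rounding.
print(0.1 + 0.2 == 0.3)                     # False
# The tolerance form accepts the same value.
print(approx_equal(0.1 + 0.2, 0.3, 1e-9))   # True
```

The tolerance (y in the predicate) has to be chosen per use case; 1e-9 here is just an illustrative value.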
To Reproduce
datafusion-cli example using the attached test Parquet file:
➜ arrow-datafusion git:(master) ✗ cargo run --bin datafusion-cli
Finished dev [unoptimized + debuginfo] target(s) in 0.22s
Running `target/debug/datafusion-cli`
> CREATE EXTERNAL TABLE foo STORED AS PARQUET LOCATION 'test.parquet';
0 rows in set. Query took 0.001 seconds.
> select * from foo;
+--------------------+
| c0 |
+--------------------+
| 107.0090813093981 |
| 125.51519138755981 |
| 141.83342587451415 |
| 113.65534481251639 |
| 251.10794896957802 |
| 112.08361695028363 |
+--------------------+
6 rows in set. Query took 0.006 seconds.
> select c0, ABS(c0 - 251.10794896957802) from foo;
+--------------------+-------------------------------------------+
| c0 | abs(c0 Minus Float64(251.10794896957802)) |
+--------------------+-------------------------------------------+
| 107.0090813093981 | 144.0988676601799 |
| 125.51519138755981 | 125.59275758201821 |
| 141.83342587451415 | 109.27452309506387 |
| 113.65534481251639 | 137.45260415706161 |
| 251.10794896957802 | 0 |
| 112.08361695028363 | 139.02433201929438 |
+--------------------+-------------------------------------------+
6 rows in set. Query took 0.006 seconds.
> select c0, ABS(c0 - 251.10794896957802) from foo where ABS(c0 - 251.10794896957802) < 1;
0 rows in set. Query took 0.005 seconds.
> select c0, ABS(c0 - 251.10794896957802) from foo where ABS(c0 - 251.10794896957802) < 111;
0 rows in set. Query took 0.003 seconds.
> select c0, ABS(c0 - 251.10794896957802) from foo where ABS(c0 - 251.10794896957802) < 150;
+--------------------+-------------------------------------------+
| c0 | abs(c0 Minus Float64(251.10794896957802)) |
+--------------------+-------------------------------------------+
| 107.0090813093981 | 144.0988676601799 |
| 125.51519138755981 | 125.59275758201821 |
| 141.83342587451415 | 109.27452309506387 |
| 113.65534481251639 | 137.45260415706161 |
| 251.10794896957802 | 0 |
| 112.08361695028363 | 139.02433201929438 |
+--------------------+-------------------------------------------+
6 rows in set. Query took 0.007 seconds.
Expected behavior
The query
> select c0, ABS(c0 - 251.10794896957802) from foo where ABS(c0 - 251.10794896957802) < 1;
was expected to return the following row:
+--------------------+-------------------------------------------+
| c0 | abs(c0 Minus Float64(251.10794896957802)) |
+--------------------+-------------------------------------------+
| 251.10794896957802 | 0 |
And the query
> select c0, ABS(c0 - 251.10794896957802) from foo where ABS(c0 - 251.10794896957802) < 111;
was expected to return the following 2 rows:
+--------------------+-------------------------------------------+
| c0 | abs(c0 Minus Float64(251.10794896957802)) |
+--------------------+-------------------------------------------+
| 251.10794896957802 | 0 |
| 141.83342587451415 | 109.27452309506387 |
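As a sanity check on the expected row counts, the same filter can be recomputed outside DataFusion. This is a sketch in plain Python that assumes the six c0 values printed above are the exact values stored in the file (the printed decimals may not round-trip exactly); the names `C0`, `TARGET`, and `rows_within` are illustrative only.

```python
# The six c0 values as printed by `select * from foo` (assumed to match the file).
C0 = [
    107.0090813093981,
    125.51519138755981,
    141.83342587451415,
    113.65534481251639,
    251.10794896957802,
    112.08361695028363,
]
TARGET = 251.10794896957802


def rows_within(values, target, threshold):
    """Return (value, abs difference) pairs where ABS(value - target) < threshold."""
    return [(v, abs(v - target)) for v in values if abs(v - target) < threshold]


for threshold in (1, 111, 150):
    matched = rows_within(C0, TARGET, threshold)
    print(f"threshold {threshold}: {len(matched)} row(s)")
    for value, diff in matched:
        print(f"  c0={value!r} diff={diff!r}")
```

With these inputs the thresholds 1, 111, and 150 match 1, 2, and 6 rows respectively, which agrees with the expected behavior described above. Rows come out in input order, so the two rows for threshold 111 appear in the opposite order to the expected table.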
Additional context
Test parquet: test.parquet.zip