Closed
Describe the bug
DataFusion generates an error for some predicates when predicate pushdown is enabled.
NOTE: This is the same symptom as reported in #4006, but with a different predicate.
NOTE: Pushdown filtering is not enabled by default (as we are still working on it), so this issue is unlikely to affect users.
To Reproduce
- Download data from repro.zip
- Run datafusion CLI
The query run is
select count(*) from foo where request_method != 'GET' OR response_status = 400 OR service = 'backend';
I tested this using master at 35f786b, which includes the fix for #4006 in 5cf090a:
$ git status
Your branch is up to date with 'apache/master'.
nothing to commit, working tree clean
$ git rev-parse HEAD
5cf090a13391501c0ce7707ac7a1e50e18517b79
Expected behavior
The same answer should be produced with and without row filtering enabled. However, with row filtering enabled, an error is produced.
Without row filtering:
datafusion-cli -f script.sql
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.006 seconds.
With it enabled:
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
...
1 row in set. Query took 0.021 seconds.
ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression NotEq with types UInt16 and Utf8\")")))
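The comparison above can be scripted. This is a rough sketch only, assuming `datafusion-cli` is on the `PATH` and `script.sql` holds the query from this report. The helper names `strip_timing` and `compare_runs` are made up for illustration and are not part of DataFusion.

```shell
#!/bin/sh
# Sketch only: compare query output with and without Parquet filter pushdown.
# strip_timing and compare_runs are hypothetical helper names.

# Drop the "Query took ..." line, which differs between runs.
strip_timing() {
    grep -v 'Query took'
}

# Run the SQL script both ways and report whether the outputs match.
compare_runs() {
    script="$1"
    plain=$(datafusion-cli -f "$script" 2>&1 | strip_timing)
    pushed=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true \
        datafusion-cli -f "$script" 2>&1 | strip_timing)
    if [ "$plain" = "$pushed" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}
```

Calling `compare_runs script.sql` prints `match` when both runs agree and `mismatch` otherwise, so it could serve as a quick regression check once the bug is fixed.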
Additional context
Found by the test in #3976.