Description
Describe the bug
DataFusion generates an error for some predicates when predicate pushdown is enabled.
Note that pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.
To Reproduce
- Download data from repro.zip
- Run datafusion CLI
The query run is
select count(*) from foo where request_duration_ns > 791684060 OR client_addr NOT in ('213.120.214.213');
Expected behavior
The same answer should be produced with and without row filtering enabled. However, with row filtering enabled an error is produced. Without it:
datafusion-cli -f script.sql
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.006 seconds.
With it enabled:
DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
...
1 row in set. Query took 0.002 seconds.
ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression Gt with types Utf8 and Int32\")")))
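The two runs above can be compared automatically. The following is a minimal sketch, not part of the original report: it runs the same script with and without `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true` and checks that the outputs agree once the timing line is removed. The function names `strip_timing` and `compare_pushdown` and the `DATAFUSION_CLI` override variable are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Sketch: run one SQL script with and without Parquet filter pushdown
# and report whether the two outputs agree.
# DATAFUSION_CLI is a hypothetical override (not a DataFusion setting) so the
# binary can be swapped out; it defaults to the command used in this report.
DATAFUSION_CLI="${DATAFUSION_CLI:-datafusion-cli}"

# Drop the "Query took ..." line, which differs from run to run.
strip_timing() {
    sed '/Query took/d'
}

# Usage: compare_pushdown script.sql
# Prints MATCH and returns 0 if both runs agree; prints MISMATCH and returns 1 otherwise.
compare_pushdown() {
    script="$1"
    # Baseline run: make sure the pushdown variable is not inherited from the caller.
    without=$( (unset DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS
                "$DATAFUSION_CLI" -f "$script") 2>&1 | strip_timing)
    # Same script with pushdown filtering enabled, as in the report.
    with=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true \
           "$DATAFUSION_CLI" -f "$script" 2>&1 | strip_timing)
    if [ "$without" = "$with" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Sourcing this file and calling `compare_pushdown script.sql` should print `MISMATCH` while the bug is present, since the second run emits the `ArrowError` shown above, and `MATCH` once both runs return the same count.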
Additional context
Found by the test in #3976.