
Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4006

Closed
@alamb


Describe the bug
DataFusion generates an error for some predicates when parquet predicate pushdown is enabled.

Note that pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.

To Reproduce

  1. Download data from repro.zip
  2. Run datafusion CLI

The query being run is:

select count(*) from foo where request_duration_ns > 791684060 OR client_addr NOT in ('213.120.214.213');

Expected behavior
The same answer should be produced with and without row filtering enabled. However, with row filtering enabled, an error is produced.

Without row filtering enabled:

datafusion-cli -f script.sql 
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.006 seconds.

With it enabled:

DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql 
...
1 row in set. Query took 0.002 seconds.
ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression Gt with types Utf8 and Int32\")")))
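
The comparison above can be automated. Below is a minimal sketch of such a check, not part of the original report. It assumes `datafusion-cli` is on the PATH, that `script.sql` is the script from the repro, and that result tables are drawn with `+` and `|` lines as in the output above. The helper names (`run_cli`, `table_lines`, `same_answer`) are hypothetical.

```python
import os
import subprocess

# Environment variable taken from the report above.
PUSHDOWN_VAR = "DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS"


def run_cli(script, pushdown):
    """Run `datafusion-cli -f <script>` and return stdout and stderr combined."""
    env = dict(os.environ)
    env.pop(PUSHDOWN_VAR, None)  # start from a known state
    if pushdown:
        env[PUSHDOWN_VAR] = "true"
    proc = subprocess.run(
        ["datafusion-cli", "-f", script],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout


def table_lines(output):
    """Keep only result-table lines; timing and error lines are dropped."""
    return [line for line in output.splitlines() if line.startswith(("+", "|"))]


def same_answer(script, runner=run_cli):
    """Return True if both runs print identical result tables."""
    plain = runner(script, False)
    pushed = runner(script, True)
    return table_lines(plain) == table_lines(pushed)
```

Calling `same_answer("script.sql")` should return `False` while the bug is present, assuming the failing run prints the error instead of the result table, and `True` once both runs produce the same table. The `runner` parameter exists only so the comparison logic can be exercised without the CLI installed.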

Additional context
Found by the test in #3976.

Labels: bug