
Another internal error when parquet predicate pushdown is enabled: "Error evaluating filter predicate" #4046

Closed
@alamb


Describe the bug
DataFusion generates an error for some predicates when predicate pushdown is enabled.

NOTE: This is the same symptom as reported in #4006, but with a different predicate.

NOTE: Pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.

To Reproduce

  1. Download data from repro.zip
  2. Run datafusion CLI

The query is:

select count(*) from foo where request_method != 'GET' OR response_status = 400 OR service = 'backend';

I tested this using master at 35f786b, which includes the fix for #4006 in 5cf090a.

$ git status
Your branch is up to date with 'apache/master'.

nothing to commit, working tree clean
$ git rev-parse HEAD
5cf090a13391501c0ce7707ac7a1e50e18517b79

Expected behavior
The same answer should be produced with and without row filtering enabled. However, with row filtering enabled, an error is produced. Without it enabled:

datafusion-cli -f script.sql
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.006 seconds.

With it enabled:

DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql
...
1 row in set. Query took 0.021 seconds.
ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression NotEq with types UInt16 and Utf8\")")))
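
The comparison above can be scripted. The following is a minimal sketch, not part of the original report: the function name `compare_pushdown` is hypothetical, and dropping the `Query took` timing lines before comparing is an assumption made so that run-to-run timing differences do not count as a mismatch. It uses only the `datafusion-cli -f` invocation and the `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS` variable shown in this issue.

```shell
#!/bin/sh
# Sketch: run the same SQL script with and without parquet filter pushdown
# and report whether the two outputs agree.
# Usage: compare_pushdown <script.sql> [cli-binary]
compare_pushdown() {
    script="$1"
    # Defaults to datafusion-cli; the second argument can override it.
    cli="${2:-datafusion-cli}"

    # Baseline run: pushdown is off by default. Timing lines vary, so drop them.
    baseline=$("$cli" -f "$script" 2>&1 | grep -v 'Query took')

    # Same script with pushdown filtering switched on via the environment.
    pushdown=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true \
        "$cli" -f "$script" 2>&1 | grep -v 'Query took')

    if [ "$baseline" = "$pushdown" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Calling `compare_pushdown script.sql` from the directory containing the repro data would be expected to report a mismatch while this bug is present, since the pushdown run emits the error instead of the same result.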

Additional context
Found by the test in #3976.
