Skip to content

Don't error on unknown column when pruning if predicate can still be proven false #7869

Open
@domodwyer

Description

@domodwyer

Is your feature request related to a problem or challenge?

At query time, our use case requires that we evaluate predicates against in-memory data that may have a schema that is a subset of the table schema. The predicate can reference columns that are not currently in memory or known at query time.

For example, given the following in-memory data:

col_a value
A 42

We may have to evaluate a predicate such as col_a != A AND col_b=bananas. Where col_b is not present in the in-memory schema / unknown at pruning time, but is a valid column for the table in the system as a whole.

Because at query time we have a limited subset of the schema, the schema and statistics provided when constructing the PruningPredicate covers only col_a, value.

However the col_a != A portion of the predicate can be proven FALSE irrespective of col_b. Unfortunately constructing the PruningPredicate eagerly validates the presence of statistics for all columns in the predicate, and errors stating that there are no fields named col_b before attempting to evaluate any portion of the predicate.

Describe the solution you'd like

Attempt to evaluate the predicate based on the available statistics, and return FALSE if possible. If the predicate cannot be proven FALSE, return a "missing column" error as it does today.

For the example above, ideally pruning should return FALSE as it can be proven that col_a != A is FALSE even though col_b is unknown at pruning time.

Describe alternatives you've considered

Inserting NULL statistics into the pruning schema to satisfy the presence check - this works around the issue, but unfortunately requires extra processing to prevent the missing field error.

Additional context

This change in behaviour might need sticking behind a flag/option to opt into, rather than being the default.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions