Remove filters for NULL constants (in addition to false) #8689

Closed
alamb opened this issue Dec 30, 2023 · 0 comments · Fixed by #8700
Labels
enhancement New feature or request good first issue Good for newcomers

alamb commented Dec 30, 2023

Is your feature request related to a problem or challenge?

DataFusion tries to avoid doing work whenever possible to improve query performance.

Part of this is determining when a filter can never be true, so that the work behind it can be skipped.

For example:

DataFusion CLI v34.0.0
❯ create table t(x int) as values (1), (2), (3);
0 rows in set. Query took 0.003 seconds.

When DataFusion sees a filter that can never be true, it skips scanning the data entirely:

❯ explain select x from t where false;
+---------------+---------------+
| plan_type     | plan          |
+---------------+---------------+
| logical_plan  | EmptyRelation |
| physical_plan | EmptyExec     |
|               |               |
+---------------+---------------+
2 rows in set. Query took 0.001 seconds.

However, it currently does not skip the scan if the filter is a constant NULL (which also can never evaluate to true). Note that the MemoryExec is still present:

❯ explain select x from t where null::bool;
+---------------+---------------------------------------------------+
| plan_type     | plan                                              |
+---------------+---------------------------------------------------+
| logical_plan  | Filter: Boolean(NULL)                             |
|               |   TableScan: t projection=[x]                     |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192       |
|               |   FilterExec: NULL                                |
|               |     MemoryExec: partitions=1, partition_sizes=[1] |
|               |                                                   |
+---------------+---------------------------------------------------+
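
To illustrate why a NULL filter can never pass a row, here is a minimal sketch of SQL's three-valued logic in plain Rust. It does not use any DataFusion API: `SqlBool`, `keeps_row`, and `filter_constant` are hypothetical names made up for this example, with `None` standing in for SQL NULL.

```rust
/// Result of a SQL predicate: Some(true), Some(false), or None for NULL.
/// (Hypothetical type for illustration only, not a DataFusion type.)
type SqlBool = Option<bool>;

/// A WHERE clause keeps a row only when the predicate is exactly TRUE.
/// Both FALSE and NULL cause the row to be dropped.
fn keeps_row(predicate: SqlBool) -> bool {
    predicate == Some(true)
}

/// Apply the same constant predicate to every row, as `WHERE <constant>` would.
fn filter_constant(rows: &[i32], predicate: SqlBool) -> Vec<i32> {
    rows.iter()
        .copied()
        .filter(|_| keeps_row(predicate))
        .collect()
}
```

With the table from above, `filter_constant(&[1, 2, 3], None)` and `filter_constant(&[1, 2, 3], Some(false))` both return an empty vector, so the scan feeding either filter is wasted work.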

Describe the solution you'd like

I would like DataFusion to avoid scanning when the filter evaluates to NULL, in addition to false. The second example above should not have a MemoryExec in its plan.
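
As a rough sketch of the requested behavior, the rewrite could look something like the following. This is a toy model, not DataFusion's actual code: the `Expr` and `Plan` enums and the `eliminate_constant_filter` function are hypothetical stand-ins for the real logical plan types and optimizer rule. Dropping always-true filters is an extra assumption of the sketch and is not part of this request.

```rust
/// Toy predicate expression (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Boolean literal; `None` models SQL NULL.
    BoolLiteral(Option<bool>),
    /// Any non-constant predicate, kept opaque here.
    Other(String),
}

/// Toy logical plan (hypothetical, not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter { predicate: Expr, input: Box<Plan> },
    EmptyRelation,
}

/// Replace filters whose predicate is a constant false or NULL with an
/// empty relation, so nothing beneath them needs to be scanned.
fn eliminate_constant_filter(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match predicate {
            // Neither false nor NULL can ever let a row through.
            Expr::BoolLiteral(Some(false)) | Expr::BoolLiteral(None) => Plan::EmptyRelation,
            // A constant true filter is a no-op (assumption of this sketch).
            Expr::BoolLiteral(Some(true)) => eliminate_constant_filter(*input),
            // Non-constant predicates are kept; only the input is rewritten.
            other => Plan::Filter {
                predicate: other,
                input: Box::new(eliminate_constant_filter(*input)),
            },
        },
        other => other,
    }
}
```

In this model, a `Filter` over `TableScan { table: "t" }` with predicate `BoolLiteral(None)` is rewritten to `EmptyRelation`, mirroring the plan already produced for `where false`.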

Describe alternatives you've considered

No response

Additional context

I think this is a good first issue: it should be relatively simple to implement and would be a good introduction to DataFusion.
