-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
bugSomething isn't workingSomething isn't workinghelp wantedExtra attention is neededExtra attention is needed
Description
Describe the bug
# create table
COPY (SELECT * FROM values (1, 'a'), (2, 'b') t(int_col, str_col)) to 'test_files/scratch/table/1.parquet';
COPY (SELECT * FROM values ('c', 3), ('d', -1) t(str_col, int_col)) to 'test_files/scratch/table/2.parquet';
create external table t stored as parquet location 'test_files/scratch/table';
set datafusion.explain.format = indent;
# No filter pushdown in datasource:
set datafusion.execution.parquet.pushdown_filters = false; (default false, I guess that's why exising tests aren't triggered)
# No SortExec
explain select * from t where t.int_col = 2 order by int_col;
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: t.int_col ASC NULLS LAST |
| | Filter: t.int_col = Int64(2) |
| | TableScan: t projection=[int_col, str_col], partial_filters=[t.int_col = Int64(2)] |
| physical_plan | CoalescePartitionsExec |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: int_col@0 = 2 |
| | RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=2 |
| | DataSourceExec: file_groups={2 groups: [[Users/xudong/opensource/datafusion/test_files/scratch/table/1.parquet], [Users/xudong/opensource/datafusion/test_files/scratch/table/2.parquet]]}, projection=[int_col, str_col], file_type=parquet, predicate=int_col@0 = 2, pruning_predicate=int_col_null_count@2 != row_count@3 AND int_col_min@0 <= 2 AND 2 <= int_col_max@1, required_guarantees=[int_col in (2)] |
| | |
| | |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# Enable fitler push in datasource
set datafusion.execution.parquet.pushdown_filters = true;
# SortExec isn't removed, because the equivalence info from filter misses
explain select * from t where t.int_col = 2 order by int_col;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: t.int_col ASC NULLS LAST |
| | Filter: t.int_col = Int64(2) |
| | TableScan: t projection=[int_col, str_col], partial_filters=[t.int_col = Int64(2)] |
| physical_plan | SortPreservingMergeExec: [int_col@0 ASC NULLS LAST] |
| | SortExec: expr=[int_col@0 ASC NULLS LAST], preserve_partitioning=[true] |
| | DataSourceExec: file_groups={2 groups: [[Users/xudong/opensource/datafusion/test_files/scratch/table/1.parquet], [Users/xudong/opensource/datafusion/test_files/scratch/table/2.parquet]]}, projection=[int_col, str_col], file_type=parquet, predicate=int_col@0 = 2, pruning_predicate=int_col_null_count@2 != row_count@3 AND int_col_min@0 <= 2 AND 2 <= int_col_max@1, required_guarantees=[int_col in (2)] |
| | |
| | |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
To Reproduce
See above
Expected behavior
The SortExec in the case should be removed
Additional context
DF47 works.
The bug is found during upgrading DF48.
I may not have time to fix it until next next week, if anyone is interested in it, feel free to pick.
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workinghelp wantedExtra attention is neededExtra attention is needed