Commit 222205d

datafusion.optimizer.repartition_file_scans enabled by default (#5295)
1 parent cfbb14d commit 222205d

3 files changed, 3 additions and 3 deletions

datafusion/common/src/config.rs

Lines changed: 1 addition & 1 deletion
@@ -284,7 +284,7 @@ config_namespace! {
 /// Currently supported only for Parquet format in which case
 /// multiple row groups from the same file may be read concurrently. If false then each
 /// row group is read serially, though different files may be read in parallel.
-pub repartition_file_scans: bool, default = false
+pub repartition_file_scans: bool, default = true
 
 /// Should DataFusion repartition data using the partitions keys to execute window
 /// functions in parallel using the provided `target_partitions` level

datafusion/core/tests/sqllogictests/test_files/information_schema.slt

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ datafusion.optimizer.max_passes 3
 datafusion.optimizer.prefer_hash_join true
 datafusion.optimizer.repartition_aggregations true
 datafusion.optimizer.repartition_file_min_size 10485760
-datafusion.optimizer.repartition_file_scans false
+datafusion.optimizer.repartition_file_scans true
 datafusion.optimizer.repartition_joins true
 datafusion.optimizer.repartition_sorts true
 datafusion.optimizer.repartition_windows true

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.optimizer.repartition_aggregations | true | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_file_min_size | 10485760 | Minimum total files size in bytes to perform file scan repartitioning. |
 | datafusion.optimizer.repartition_joins | true | Should DataFusion repartition data using the join keys to execute joins in parallel using the provided `target_partitions` level |
-| datafusion.optimizer.repartition_file_scans | false | When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel. |
+| datafusion.optimizer.repartition_file_scans | true | When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel. |
 | datafusion.optimizer.repartition_windows | true | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_sorts | true | Should DataFusion execute sorts in a per-partition fashion and merge afterwards instead of coalescing first and sorting globally. With this flag is enabled, plans in the form below "SortExec: [a@0 ASC]", " CoalescePartitionsExec", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", would turn into the plan below which performs better in multithreaded environments "SortPreservingMergeExec: [a@0 ASC]", " SortExec: [a@0 ASC]", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", |
 | datafusion.optimizer.skip_failed_rules | true | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail |
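
To illustrate what the new default might mean for a scan, here is a minimal standard-library sketch of how `repartition_file_scans`, `repartition_file_min_size`, and `target_partitions` could interact. It is not DataFusion's implementation: `ScanSettings` and `plan_byte_ranges` are hypothetical names, and the even byte-range split is an assumption made only for illustration.

```rust
/// Hypothetical model of the settings involved in this change.
/// Not a DataFusion type; the field names mirror the config keys.
#[derive(Debug, Clone, Copy)]
struct ScanSettings {
    /// `datafusion.optimizer.repartition_file_scans` (default is now `true`).
    repartition_file_scans: bool,
    /// `datafusion.optimizer.repartition_file_min_size`, in bytes.
    repartition_file_min_size: u64,
    /// Desired number of partitions to scan concurrently.
    target_partitions: usize,
}

/// Split `total_size` bytes into at most `target_partitions` contiguous,
/// half-open `(start, end)` ranges.
///
/// Returns one range covering the whole input when repartitioning is
/// disabled, the input is smaller than the minimum size, or there is
/// nothing to split. The even split is an illustrative assumption.
fn plan_byte_ranges(total_size: u64, s: ScanSettings) -> Vec<(u64, u64)> {
    if !s.repartition_file_scans
        || total_size == 0
        || total_size < s.repartition_file_min_size
        || s.target_partitions <= 1
    {
        return vec![(0, total_size)];
    }

    let n = s.target_partitions as u64;
    let chunk = (total_size + n - 1) / n; // ceiling division

    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_size {
        let end = (start + chunk).min(total_size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

Under these assumptions, a 64 MiB input with `target_partitions = 4` is planned as four 16 MiB ranges when the flag is `true`, and as a single range when it is `false` or when the input is below `repartition_file_min_size`.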
