Skip to content

Conversation

@yjshen
Copy link
Member

@yjshen yjshen commented Jan 14, 2022

Which issue does this PR close?

Closes #1565 .

Rationale for this change

  1. batch_size is only execution runtime related, should consolidate all into RuntimeConfig.
  2. Rename PhysicalPlanConfig to FileScanConfig to make it clear.

What changes are included in this PR?

Are there any user-facing changes?

),
PartitionMethod::RoundRobin(batch_size) => {
Partitioning::RoundRobinBatch(batch_size as usize)
PartitionMethod::RoundRobin(partition_count) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch_size here is wrong, it's partition count

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh god, good find, 4096 partitions is not going to be fun.


for batch_size in test_batch_sizes.iter() {
let rr_repartition = Partitioning::RoundRobinBatch(*batch_size);
for partition_count in test_partition_counts.iter() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

let runtime = Arc::new(RuntimeEnv::default());
let projection = Some(vec![0, 1, 2, 3]);
let exec = get_exec("aggregate_test_100.csv", &projection, 1024, Some(1)).await?;
let exec = get_exec("aggregate_test_100.csv", &projection, Some(1)).await?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this PR, most of the magic number 1024 is removed from the project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of putting batch_size on the RuntimeConfig is 🏅 👍

parquet_pruning: bool,
/// Runtime configurations such as memory threshold and local disk for spill
pub runtime_config: RuntimeConfig,
pub runtime: RuntimeConfig,
Copy link
Member Author

@yjshen yjshen Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to runtime since I think the word config is redundant. Consider the usage exec_config.runtime_config.batch_size for example. Everything in exec_config is config.

@yjshen yjshen marked this pull request as ready for review January 14, 2022 10:14
@alamb
Copy link
Contributor

alamb commented Jan 14, 2022

Thanks @yjshen -- I'll check this out over the weekend.

BTW are you planning on working on the next steps for #1526 (external sorting)? I would be interested in working on this area too, but I don't want to conflict with your plans.

I found #1526 (comment) -- sorry

@alamb alamb changed the title Consolidate configurations in ExecutionConfig, RuntimeConfig and PhysicalPlanConfig Consolidate batch_size configuration in ExecutionConfig, RuntimeConfig and PhysicalPlanConfig Jan 16, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a really nice improvement @yjshen -- thank you!

In fact it would probably be good to add some other common things to the RuntimeEnv that is currently plumbed through in other ways such as object_store

cc @rdettai who had some thoughts related to config on #1141

),
PartitionMethod::RoundRobin(batch_size) => {
Partitioning::RoundRobinBatch(batch_size as usize)
PartitionMethod::RoundRobin(partition_count) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find

let runtime = Arc::new(RuntimeEnv::default());
let projection = Some(vec![0, 1, 2, 3]);
let exec = get_exec("aggregate_test_100.csv", &projection, 1024, Some(1)).await?;
let exec = get_exec("aggregate_test_100.csv", &projection, Some(1)).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of putting batch_size on the RuntimeConfig is 🏅 👍

@yjshen
Copy link
Member Author

yjshen commented Jan 18, 2022

Thanks @alamb , the main reason I'm not moving object_store into RuntimeEnv is that it's used both during query planning and query execution. What do you think?

I was trying to consolidate more of the configs from the three Config as the initial name for this PR shows. But after a round of review of all the configurations, I find only one batch_size weird. Do you have any other config in mind?

@alamb
Copy link
Contributor

alamb commented Jan 18, 2022

Do you have any other config in mind?

No, object_store was the one I had in mind; If anything else occurs to me I will let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consolidate various configurations options, remove unrelated batch_size

3 participants