
Dev repartitioning#693

Merged
lihao712 merged 20 commits into master from dev-repartitioning
Dec 17, 2024

gy11233 (Contributor) commented Dec 12, 2024

Which issue does this PR close?

Add native RoundRobin repartition support.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

.setPartitionCount(numPartitions)
.addAllHashExpr(nativeHashExprs.asJava)
case RoundRobinPartitioning(_) =>
PhysicalHashRepartition

I suggest adding a new, separate PhysicalRoundRobinRepartition rather than reusing the hash repartition here.

fn sort_batch_by_partition_id(
batch: RecordBatch,
partitioning: &Partitioning,
sum_num_rows: usize,

Maybe current_num_rows is a better name.

let (parts, sorted_batch) = self
.sort_time
- .with_timer(|| sort_batch_by_partition_id(batch, partitioning))?;
+ .with_timer(|| sort_batch_by_partition_id(batch, partitioning, self.num_rows))?;

We can sort before updating self.num_rows, so that the implementation doesn't need to subtract batch.num_rows().
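
To illustrate the ordering suggested here, a minimal sketch follows. It is not the code from this PR: RoundRobinState, assign, and round_robin_partition_ids are hypothetical names, and plain row counts stand in for RecordBatch. It assumes round-robin assignment is simply (rows seen so far + row index) modulo the partition count.

```rust
/// Hypothetical helper: computes a round-robin partition id for each row of a batch.
/// `rows_before` is the number of rows seen before this batch, so the
/// assignment continues where the previous batch left off.
fn round_robin_partition_ids(
    rows_before: usize,
    batch_rows: usize,
    num_partitions: usize,
) -> Vec<usize> {
    (0..batch_rows)
        .map(|i| (rows_before + i) % num_partitions)
        .collect()
}

/// Hypothetical stand-in for the repartitioner state that owns the row counter.
struct RoundRobinState {
    num_rows: usize,
    num_partitions: usize,
}

impl RoundRobinState {
    fn new(num_partitions: usize) -> Self {
        assert!(num_partitions > 0, "partition count must be positive");
        Self { num_rows: 0, num_partitions }
    }

    /// Assigns partition ids first, then updates the counter.
    /// Because the counter still holds the pre-batch total when ids are
    /// computed, no subtraction of the batch size is needed.
    fn assign(&mut self, batch_rows: usize) -> Vec<usize> {
        let ids = round_robin_partition_ids(self.num_rows, batch_rows, self.num_partitions);
        self.num_rows += batch_rows;
        ids
    }
}
```

With three partitions, a 4-row batch followed by a 3-row batch would get ids that continue across the batch boundary rather than restarting at partition 0 each time.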

@lihao712 lihao712 merged commit 9e459f8 into master Dec 17, 2024
@richox richox mentioned this pull request Apr 27, 2025
3 participants