Add reproducer for consecutive RepartitionExec by NGA-TRAN · Pull Request #18343 · apache/datafusion

NGA-TRAN · 2025-10-28T19:29:15Z

Reproducer for #18341

NGA-TRAN · 2025-10-28T19:45:36Z

@gene-bordegaray: can you help review this?

gene-bordegaray

This makes sense. I just want to ask a clarifying question to make sure I understand:

- The Round Robin Repartition into the Aggregate is useful because it disperses work across partitions, each of which accumulates a partial result. With those aggregated results, the Hash Repartition can hand off work with the same key (such as env = 'prod') to the same worker, which is more efficient.
- The parquet query does not work this way because the Repartitions are not separated by the Aggregate. The Aggregate does all of its work on a single partition, and the Repartitioning happens too late.

alamb · 2025-10-29T12:37:20Z

> This makes sense. I just want to ask a clarifying question to make sure I understand:
>
> The Round Robin Repartition into the Aggregate is useful because it disperses work across partitions, each of which accumulates a partial result. With those aggregated results, the Hash Repartition can hand off work with the same key (such as env = 'prod') to the same worker, which is more efficient.

Yes -- you can read more about the two phase aggregation in general here https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html#tymethod.state
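To make the two-phase idea concrete, here is a minimal sketch in plain Python. It is not DataFusion code: the function names (`round_robin`, `partial_aggregate`, `hash_repartition`, `final_aggregate`) are hypothetical, and it assumes a simple `COUNT(*) ... GROUP BY env` over an in-memory list of keys. It only illustrates why the partial aggregate sits between the two repartitions.

```python
from collections import Counter, defaultdict


def round_robin(rows, n):
    """Spread rows evenly over n partitions, ignoring their keys."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partial_aggregate(partition):
    """Phase 1: each partition counts the keys it happens to hold."""
    return Counter(partition)


def hash_repartition(partials, n):
    """Route every partial state for a given key to the same partition."""
    parts = [defaultdict(list) for _ in range(n)]
    for partial in partials:
        for key, count in partial.items():
            parts[hash(key) % n][key].append(count)
    return parts


def final_aggregate(partition):
    """Phase 2: merge the partial counts for each key."""
    return {key: sum(counts) for key, counts in partition.items()}


def two_phase_count(rows, n=4):
    partials = [partial_aggregate(p) for p in round_robin(rows, n)]
    result = {}
    for part in hash_repartition(partials, n):
        result.update(final_aggregate(part))
    return result


rows = ["prod", "dev", "prod", "test", "prod", "dev"]
counts = two_phase_count(rows)
```

In this sketch the hash repartition moves at most one partial state per key per partition rather than every input row, which is the benefit of aggregating between the two repartitions.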

NGA-TRAN · 2025-10-29T13:01:09Z

> The parquet query does not work this way because the Repartitions are not separated by the Aggregate. The Aggregate does all of its work on a single partition, and the Repartitioning happens too late.

Correct.

The ticket for this reproducer, #18341, proposes three different ways to improve it.

gene-bordegaray · 2025-10-29T13:44:33Z

> > The parquet query does not work this way because the Repartitions are not separated by the Aggregate. The Aggregate does all of its work on a single partition, and the Repartitioning happens too late.
>
> Correct.
>
> The ticket for this reproducer, #18341, proposes three different ways to improve it.

OK, awesome. I will take some time to understand these approaches and share my thoughts.

datafusion/sqllogictest/test_files/aggregate_repartition.slt

alamb · 2025-10-30T19:01:48Z

Thanks @NGA-TRAN @gene-bordegaray and @Dandandan

Reproducer for apache#18341

Add reproducer for consecutive RepartitionExec

d5a629f

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 28, 2025

NGA-TRAN mentioned this pull request Oct 28, 2025

Avoid consecutive RepartitionExec #18341

Closed

gene-bordegaray approved these changes Oct 29, 2025

alamb approved these changes Oct 29, 2025

Dandandan reviewed Oct 29, 2025

datafusion/sqllogictest/test_files/aggregate_repartition.slt

Dandandan approved these changes Oct 29, 2025

alamb added this pull request to the merge queue Oct 30, 2025

Merged via the queue into apache:main with commit f57da83 Oct 30, 2025
28 checks passed

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025

Add reproducer for consecutive RepartitionExec (apache#18343)

05c0caf

Reproducer for apache#18341

codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025

Add reproducer for consecutive RepartitionExec (apache#18343)

a0f1d1d

Reproducer for apache#18341

alamb merged 1 commit into apache:main from NGA-TRAN:ntran/agg
