-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Implement partition_statistics API for InterleaveExec
#17051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9f4ddfe to
e74a202
Compare
e74a202 to
58f51c7
Compare
|
Hi @xudong963, this PR is ready for review |
|
@liamzwbao Sorry, I missed the PR, will review later. |
| assert_eq!(partition_row_counts[0], 2); | ||
| assert_eq!(partition_row_counts[1], 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw the real stat outputs are different from expected_stats, could you please help me figure out what results in the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that hash repartitioning doesn’t always distribute results evenly, which is expected and should have only a minor impact when the underlying dataset is large enough. I added a test for repartition to confirm this behavior. Note that the stats in repartition are marked as Inexact because the partitioning algorithm does not guarantee balanced output, and Interleave simply converges results from the child partitions.
Sorry for the late reply, I was out for the past 2 weeks
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THank you @liamzwbao
Which issue does this PR close?
partition_statisticsAPI for more operators #15873.Rationale for this change
What changes are included in this PR?
Implement
partition_statisticsAPI forInterleaveExecAre these changes tested?
Yes
Are there any user-facing changes?
No