@liamzwbao
Contributor

@liamzwbao liamzwbao commented Aug 6, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement partition_statistics API for InterleaveExec
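As a rough illustration of what per-partition statistics for an interleaving operator involve, the sketch below sums the row-count estimates of partition `i` across all children to obtain the estimate for output partition `i`. It assumes every child exposes the same number of partitions and that output partition `i` is fed only by partition `i` of each child. The `RowCount` enum and the `interleave_partition_rows` helper are hypothetical stand-ins written for this example; they are not DataFusion's types or the code in this PR.

```rust
/// Hypothetical stand-in for a row-count estimate (not DataFusion's actual type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RowCount {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl RowCount {
    /// Combine two estimates: the sum stays exact only if both inputs are exact,
    /// and becomes absent as soon as one side is unknown.
    fn combine(self, other: RowCount) -> RowCount {
        match (self, other) {
            (RowCount::Exact(a), RowCount::Exact(b)) => RowCount::Exact(a + b),
            (
                RowCount::Exact(a) | RowCount::Inexact(a),
                RowCount::Exact(b) | RowCount::Inexact(b),
            ) => RowCount::Inexact(a + b),
            _ => RowCount::Absent,
        }
    }
}

/// `children[c][p]` is the estimate for partition `p` of child `c`.
/// Returns the estimate for output partition `partition`.
fn interleave_partition_rows(children: &[Vec<RowCount>], partition: usize) -> RowCount {
    children
        .iter()
        // A child without that partition contributes an unknown estimate.
        .map(|child| child.get(partition).copied().unwrap_or(RowCount::Absent))
        .fold(RowCount::Exact(0), RowCount::combine)
}

fn main() {
    // Illustrative numbers only, not taken from the PR's tests.
    let children = vec![
        vec![RowCount::Exact(4), RowCount::Exact(4)],
        vec![RowCount::Inexact(1), RowCount::Inexact(3)],
    ];
    for partition in 0..2 {
        println!(
            "partition {}: {:?}",
            partition,
            interleave_partition_rows(&children, partition)
        );
    }
}
```

In this sketch a single inexact child estimate makes the combined estimate inexact, which is the conservative choice when inputs come from hash repartitioning.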

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Aug 6, 2025
@liamzwbao liamzwbao force-pushed the issue-15873-interleave branch from 9f4ddfe to e74a202 Compare August 27, 2025 22:27
@liamzwbao liamzwbao marked this pull request as ready for review August 27, 2025 22:28
@liamzwbao liamzwbao force-pushed the issue-15873-interleave branch from e74a202 to 58f51c7 Compare August 27, 2025 22:32
@liamzwbao
Contributor Author

Hi @xudong963, this PR is ready for review

@xudong963
Member

@liamzwbao Sorry, I missed the PR, will review later.

Comment on lines +443 to +444
assert_eq!(partition_row_counts[0], 2);
assert_eq!(partition_row_counts[1], 6);
Member

I see that the actual statistics differ from expected_stats. Could you please help me figure out what causes the difference?

Contributor Author

@liamzwbao liamzwbao Sep 24, 2025

The reason is that hash repartitioning doesn’t always distribute rows evenly across partitions. This is expected and should have only a minor impact when the underlying dataset is large enough. I added a test for repartition to confirm this behavior. Note that the stats in repartition are marked as Inexact because the partitioning algorithm does not guarantee balanced output, and Interleave simply combines the results from the child partitions.

Sorry for the late reply, I was out for the past 2 weeks
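The uneven split described in this reply can be reproduced with a small standalone sketch. It hashes eight keys into two buckets and compares the actual per-bucket counts with an even-split estimate. The helper names `partition_row_counts` and `even_split_estimate` are hypothetical, and the standard library's `DefaultHasher` is used only for illustration; it is not the hash function DataFusion uses, so the exact split will differ from the counts in the PR's test.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Count how many keys land in each of `num_partitions` buckets
/// when each key is assigned to `hash(key) % num_partitions`.
fn partition_row_counts(keys: &[i64], num_partitions: usize) -> Vec<usize> {
    let mut counts = vec![0usize; num_partitions];
    for key in keys {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() % num_partitions as u64) as usize;
        counts[idx] += 1;
    }
    counts
}

/// The estimate a planner could report without looking at the data:
/// total rows divided evenly across partitions.
fn even_split_estimate(total_rows: usize, num_partitions: usize) -> usize {
    total_rows / num_partitions
}

fn main() {
    let keys: Vec<i64> = (1..=8).collect();
    let actual = partition_row_counts(&keys, 2);
    let estimate = even_split_estimate(keys.len(), 2);

    // The per-partition counts always sum to the total row count,
    // but nothing forces each of them to equal the estimate.
    println!("estimated rows per partition: {}", estimate);
    println!("actual rows per partition:    {:?}", actual);
}
```

Because the assignment depends only on the hash values, the total is preserved while individual partitions may hold more or fewer rows than the estimate, which is why such an estimate is reported as inexact.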

@liamzwbao liamzwbao requested a review from xudong963 September 25, 2025 13:24
Member

@xudong963 xudong963 left a comment

Thank you @liamzwbao

@xudong963 xudong963 added this pull request to the merge queue Oct 2, 2025
Merged via the queue into apache:main with commit 9611ac8 Oct 2, 2025
28 checks passed
@liamzwbao liamzwbao deleted the issue-15873-interleave branch October 15, 2025 00:08