[BLAZE-975] Fix duplicated shuffle data fetch under AQE Rebalance when using Uniffle in Blaze #976

Merged: richox merged 2 commits into apache:master from work-space-station:blaze-975 on May 9, 2025

Conversation

merrily01 (Member) commented Apr 29, 2025

Which issue does this PR close?

Closes #975.

Rationale for this change

  • Under AQE Rebalance, passing startMapIndex and endMapIndex when invoking uniffleShuffleManager.getReader ensures the correct blocks bitmap is constructed, so each shuffle read task fetches only the data it is meant to read rather than duplicates.
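
To illustrate why the map index range matters, here is a minimal, self-contained sketch. It is not Blaze or Uniffle code: `Block`, `MapRangeReadSketch`, and `expectedBlocks` are hypothetical names, the block IDs are made up, and the half-open range `[startMapIndex, endMapIndex)` is an assumption of the sketch. It models one reduce partition that a rebalance has split into two read tasks by map range, and compares the blocks fetched with and without the range applied.

```scala
import scala.collection.immutable.BitSet

// Hypothetical model: each shuffle block is written by one map task
// for one reduce partition. Not the real Uniffle block layout.
final case class Block(blockId: Int, mapIndex: Int, partition: Int)

object MapRangeReadSketch {
  // Four map tasks each wrote one block for reduce partition 0.
  val blocks: Seq[Block] = Seq(
    Block(blockId = 0, mapIndex = 0, partition = 0),
    Block(blockId = 1, mapIndex = 1, partition = 0),
    Block(blockId = 2, mapIndex = 2, partition = 0),
    Block(blockId = 3, mapIndex = 3, partition = 0)
  )

  // Bitmap of block IDs a read task is expected to fetch.
  // Assumption: the map range is half-open, [startMapIndex, endMapIndex).
  def expectedBlocks(startMapIndex: Int, endMapIndex: Int, partition: Int): BitSet = {
    val ids = blocks.collect {
      case b if b.partition == partition &&
        b.mapIndex >= startMapIndex && b.mapIndex < endMapIndex => b.blockId
    }
    BitSet(ids: _*)
  }

  // A rebalance splits partition 0 into two read tasks by map range.
  val splits: Seq[(Int, Int)] = Seq((0, 2), (2, 4))

  // Map range ignored: every task reads the whole partition, so blocks repeat.
  val withoutRange: Seq[Int] =
    splits.flatMap(_ => expectedBlocks(0, Int.MaxValue, 0).toSeq)

  // Map range applied: each task reads only its own slice.
  val withRange: Seq[Int] =
    splits.flatMap { case (start, end) => expectedBlocks(start, end, 0).toSeq }
}
```

In this toy model, `withoutRange` contains every block twice, while `withRange` contains each block exactly once, which mirrors the duplication this PR addresses.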

What changes are included in this PR?

  • Pass the startMapIndex and endMapIndex parameters when invoking uniffleShuffleManager.getReader

  • Add log messages for better visibility

Are there any user-facing changes?

No

merrily01 (Member Author) commented:

@richox @zuston PTAL

zuston (Member) left a comment:

Maybe we don't need to retrieve the blocks bitmap for startMapIdx -> endMapIdx ourselves. Just do it like this:

val reader =
  uniffleShuffleManager.getReader(
    rssHandleWrapper.rssShuffleHandleInfo,
    startMapIndex,
    endMapIndex,
    startPartition,
    endPartition,
    context,
    metrics)

PTAL @merrily01

merrily01 (Member Author) commented:

Well done! Indeed, the existing implementation should be used.
The code has been updated. Please review. @zuston @richox

zuston (Member) commented May 9, 2025

cc @richox again

@richox richox merged commit 12d04af into apache:master May 9, 2025
619 checks passed

Development

Successfully merging this pull request may close these issues.

Incorrect shuffle data fetch under AQE Rebalance when using Uniffle in Blaze causes data duplication
