[SPARK-34534] Fix blockIds order when using FetchShuffleBlocks to fetch blocks
### What changes were proposed in this pull request?
Fix a problem that can lead to data correctness issues after a partial block retry in `OneForOneBlockFetcher` when `FetchShuffleBlocks` is used.
### Why are the changes needed?
This is a data correctness bug. There is no problem with the old protocol, which sends `OpenBlocks` before fetching chunks in `OneForOneBlockFetcher`.
In the latest branch, `OpenBlocks` has been replaced by `FetchShuffleBlocks`. However, the order in which `FetchShuffleBlocks` reads shuffle blocks is not the same as the order of `blockIds` in `OneForOneBlockFetcher`. `blockIds` is used to match each block ID to its shuffle data by index, so the two are now out of step.
As a result, the wrong block chunk can be read when some blocks fail to fetch in `OneForOneBlockFetcher`: the retry covers the remaining blocks in `blockIds`, selected according to the order of `blockIds`.
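
To illustrate the index mismatch described above, here is a minimal, self-contained sketch. It is not the code in this patch. It assumes block IDs of the form `shuffle_<shuffleId>_<mapId>_<reduceId>` and assumes that the server returns chunks grouped by map ID in first-seen order; the class `BlockOrderSketch` and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Spark.
final class BlockOrderSketch {

  // Assumption: chunks come back grouped by mapId, in the order each mapId
  // first appears in the request. Block IDs are "shuffle_<shuffle>_<map>_<reduce>".
  static List<String> groupedByMapId(List<String> blockIds) {
    Map<String, List<String>> byMap = new LinkedHashMap<>();
    for (String id : blockIds) {
      String mapId = id.split("_")[2];
      byMap.computeIfAbsent(mapId, k -> new ArrayList<>()).add(id);
    }
    List<String> ordered = new ArrayList<>();
    for (List<String> group : byMap.values()) {
      ordered.addAll(group);
    }
    return ordered;
  }

  // Chunk indices where the client-side blockIds entry differs from the
  // block that the chunk at that index actually carries.
  static List<Integer> mismatchedChunks(List<String> blockIds, List<String> served) {
    List<Integer> bad = new ArrayList<>();
    for (int i = 0; i < blockIds.size(); i++) {
      if (!blockIds.get(i).equals(served.get(i))) {
        bad.add(i);
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    List<String> requested =
        Arrays.asList("shuffle_0_1_0", "shuffle_0_2_0", "shuffle_0_1_1");
    List<String> served = groupedByMapId(requested);

    // With the original order, some chunk indices point at the wrong block ID.
    System.out.println("served order:      " + served);
    System.out.println("mismatched chunks: " + mismatchedChunks(requested, served));

    // Keeping blockIds in the same order as the served chunks removes the mismatch.
    System.out.println("after reordering:  " + mismatchedChunks(served, served));
  }
}
```

Under these assumptions, any retry that looks up the remaining blocks by their position in the original `blockIds` would pair a chunk with the wrong block ID, which is why `blockIds` needs to follow the same order as the blocks requested through `FetchShuffleBlocks`.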
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes apache#31643 from seayoun/yuhaiyang_fix_use_FetchShuffleBlocks_order.
Lead-authored-by: yuhaiyang <yuhaiyang@yuhaiyangs-MacBook-Pro.local>
Co-authored-by: yuhaiyang <yuhaiyang@172.19.25.126>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>