-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1373][FOLLOWUP] fix(spark): register with incorrect partitionRanges after reassign #1612
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1612 +/- ##
============================================
+ Coverage 53.92% 54.83% +0.91%
Complexity 2864 2864
============================================
Files 438 418 -20
Lines 24912 22559 -2353
Branches 2123 2123
============================================
- Hits 13433 12371 -1062
+ Misses 10636 9416 -1220
+ Partials 843 772 -71 ☔ View full report in Codecov by Sentry. |
#1609 has been merged to solve the similar problems. please see it in advance. BTW, the detailed description should be filled. |
proto/src/main/proto/Rss.proto
Outdated
@@ -592,6 +592,8 @@ message RssReassignFaultyShuffleServerRequest{ | |||
int32 shuffleId = 1; | |||
repeated string partitionIds = 2; | |||
string faultyShuffleServerId = 3; | |||
int32 stageId = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuston I have some concern about this. Will it affect the reuse of exchange, partial tasks execution of the shuffle and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the description, I didn't see any requirements for this PR, but I guess it wants to support different reassginment servers for different stage attempt.
For me, currently there is no need to support above stage attempt
… range error when reassign faulty shuffle server for tasks
0fc351c
to
09c63ef
Compare
I think the |
but the failed partitionId will not always be 1, and writing to the new assigned server will fail. |
Please fix the spotless @dingshun3016 |
I have checked this in 761dedf. So merge this. Thanks @dingshun3016 |
This implementation is great |
What changes were proposed in this pull request?
fix partition id inconsistency when reassign new shuffle server
For example:
when writing data on node a1, the registered partition id is 1003.
a1 node fails,and reassign node b1 and register shuffle server b1,but partitionNumPerRange is 1.
when writing data to node b1, NO_REGISTER exception will be thrown
Why are the changes needed?
Fix: (#1373)
Does this PR introduce any user-facing change?
No.
How was this patch tested?
(Please test your changes, and provide instructions on how to test it: