HDFS-17198. RBF: fix bug of getRepresentativeQuorum when records have same dateModified #6096
🎊 +1 overall
This message was automatically generated.
@goiri Could you please help review this? Thanks a lot.
...f/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MembershipStoreImpl.java
long dateModified = Time.now();
// Active - oldest
MembershipState report = createRegistration(
    ns, nn, ROUTERS[1], FederationNamenodeServiceState.ACTIVE);
Tweak the spacing to fit checkstyle.
done
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
@goiri The last "root in trunk failed." may not be a problem with my code; the penultimate run succeeded. If you have time, please help review again.
🎊 +1 overall
This message was automatically generated.
… same dateModified (apache#6096)
Description of PR
In the original implementation, when the routers report a NameNode's (nn) status at different times, the resulting nn status is the one reported by the majority of routers. For example:
router1 -> nn0:active dateModified:1
router2 -> nn0:active dateModified:2
router3 -> nn0:active dateModified:3
router0 -> nn0:standby dateModified:4
Then the status of nn0 is active, because the majority of routers report that nn0 is active.
If the majority of routers report the nn status at the same time, for example:
(record1) router1 -> nn0:active dateModified:1
(record2) router2 -> nn0:active dateModified:1
(record3) router3 -> nn0:active dateModified:1
(record4) router0 -> nn0:standby dateModified:2
Then the status of nn0 is standby, but we expect the status of nn0 to be active.
This bug occurs because the records above are put into a TreeSet in the method getRepresentativeQuorum. Since record1, record2, and record3 have the same dateModified, only one of them remains in the final TreeSet, so the method concludes that this nn is standby because record4 is newer.
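The collapse described above can be reproduced outside Hadoop with a small, self-contained sketch. It is a simplified model, not the actual MembershipStoreImpl code: the Report class, the BY_DATE comparator, and both quorum methods are hypothetical stand-ins, and the sketch assumes that records are ordered by dateModified only and that a tie in group size goes to the group with the newest record. The list-based variant is one possible way to keep the votes separate; it is not necessarily what this patch does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class QuorumSketch {

  /** Hypothetical stand-in for a membership record (not the real MembershipState). */
  static final class Report {
    final String router;
    final String state;
    final long dateModified;

    Report(String router, String state, long dateModified) {
      this.router = router;
      this.state = state;
      this.dateModified = dateModified;
    }
  }

  // Assumption: ordering looks at dateModified only, so two reports with the
  // same timestamp compare as equal and a TreeSet keeps just one of them.
  static final Comparator<Report> BY_DATE =
      Comparator.comparingLong((Report r) -> r.dateModified);

  /** Groups reports per state in TreeSets; largest group wins, ties go to the newest. */
  static String quorumWithTreeSet(List<Report> reports) {
    Map<String, TreeSet<Report>> groups = new HashMap<>();
    for (Report r : reports) {
      groups.computeIfAbsent(r.state, k -> new TreeSet<>(BY_DATE)).add(r);
    }
    TreeSet<Report> best = null;
    for (TreeSet<Report> g : groups.values()) {
      if (best == null
          || g.size() > best.size()
          || (g.size() == best.size()
              && g.last().dateModified > best.last().dateModified)) {
        best = g;
      }
    }
    return best == null ? null : best.last().state;
  }

  /** Same vote, but groups are lists, so equal timestamps still count separately. */
  static String quorumWithList(List<Report> reports) {
    Map<String, List<Report>> groups = new HashMap<>();
    for (Report r : reports) {
      groups.computeIfAbsent(r.state, k -> new ArrayList<>()).add(r);
    }
    List<Report> best = null;
    long bestNewest = Long.MIN_VALUE;
    for (List<Report> g : groups.values()) {
      long newest = Long.MIN_VALUE;
      for (Report r : g) {
        newest = Math.max(newest, r.dateModified);
      }
      if (best == null
          || g.size() > best.size()
          || (g.size() == best.size() && newest > bestNewest)) {
        best = g;
        bestNewest = newest;
      }
    }
    return best == null ? null : best.get(0).state;
  }

  /** Three routers report nn0 as active, router0 reports it as standby. */
  static List<Report> reports(long d1, long d2, long d3, long d0) {
    List<Report> list = new ArrayList<>();
    list.add(new Report("router1", "active", d1));
    list.add(new Report("router2", "active", d2));
    list.add(new Report("router3", "active", d3));
    list.add(new Report("router0", "standby", d0));
    return list;
  }

  public static void main(String[] args) {
    // Distinct timestamps: the active group keeps all three records.
    System.out.println(quorumWithTreeSet(reports(1, 2, 3, 4)));
    // Equal timestamps: the active group collapses to one record.
    System.out.println(quorumWithTreeSet(reports(1, 1, 1, 2)));
    // List-based grouping keeps all three active votes.
    System.out.println(quorumWithList(reports(1, 1, 1, 2)));
  }
}
```

Under these assumptions, the TreeSet variant returns active for the first scenario and standby for the second, while the list variant returns active for the second scenario as well.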
See: https://issues.apache.org/jira/browse/HDFS-17198
How was this patch tested?
With my new unit test, testRegistrationMajorityQuorumEqDateModified.