-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-22414 Interruption of moving regions in RSGroup will cause regi… #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ons on wrong rs
🎊 +1 overall
This message was automatically generated. |
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Outdated
Show resolved
Hide resolved
try { | ||
this.master.getAssignmentManager().move(region); | ||
}catch (IOException ioe){ | ||
LOG.error("Move region {} from group failed, will retry, current retry time is {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May get too verbose. How about log as debug on each retry, then if max number of reties has been reached, log as error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. But I think we need failed regions' names to move them to the target group servers after the failed call, because we can't call move tables or servers methods to move failed regions again.Maybe I can add a failed regions list in the ex message when max number of reties has been reached.What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed this log level to DEBUG, and added a failed regions list in the ex message when max number of reties has been reached.
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Show resolved
Hide resolved
@@ -459,4 +463,197 @@ public boolean evaluate() throws Exception { | |||
Assert.assertEquals(null, rsGroupAdmin.getRSGroupInfo(fooGroup.getName())); | |||
} | |||
|
|||
@Test | |||
public void testFailedMoveWhenMoveServer() throws Exception { | |||
final RSGroupInfo newGroup = addGroup(getGroupName(name.getMethodName()), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addGroup already invokes _ moveServerRegionsFromGroup_ indirectly, and adds an RS to it. We could simplify this test if we just create a new group from RSGroupAdminServer.addRSGroup, pick one of the RSes from assignMap, change one of its regions state, then call later call RSGroupAdminServer.moveServers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I have used rsGroupAdmin.addRSGroup to add a new group without adding server. Then I create a multi-region table, choose a region to change state, and record the server that the region we changed on. In the last call RSGroupAdminServer.moveServers to move the recorded server, and check if the error message contains the region name.
} | ||
|
||
@Test | ||
public void testFailedMoveWhenMoveTable() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as to the tested methods, we may apply some code reuse, given how similarly the two test structures are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I make a function named setARegionState to randomly choosing a region and set its state. The difference of the two UTs is at adding group and moving operation. Is this OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a new skeleton function recoverRegionStateThread at the newly attached patch.
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Show resolved
Hide resolved
@wchevreuil Thank you for reviewing, I'll change codes follow your comments. |
…ons on wrong rs
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Show resolved
Hide resolved
*/ | ||
private void moveServerRegionsFromGroup(Set<Address> servers, String targetGroupName) | ||
private <T> void moveRegionsToOrFromGroup(Set<T> set, String targetGroupName, MoveType type) | ||
throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a good idea to group "TO and FROM" into one method? It's an open question, just want to hear some reasoning from you. thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think added To/From to method name makes codes more easy to be understood.
And just like writing in the docs, TO means move tables regions, FROM means move server regions.
getRegionState(region).isFailedOpen()) { | ||
continue; | ||
for (Iterator<T> iter = newSet.iterator(); iter.hasNext(); ) { | ||
T el = iter.next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"el" this naming is not very good IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
/** | ||
* Move every region from servers which are currently located on these servers, | ||
* but should not be located there. | ||
* When move a table to a group, all regions of it must be moved to group servers (to the group). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should make this description clearer regarding FROM , TO, TABLE and GROUP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @xcangCRM , my opinion is that this single method resulting from a merge of previously two moveServerRegionsFromGroup and moveTableRegionsToGroup does not look easier to understand.
My earlier suggestion was just to define a common skeleton method that would be reused by both moveServerRegionsFromGroup and moveTableRegionsToGroup, given that the loop structures were basically the same, with variation only on Set types and validation logic. I believe this could be achieved by defining a common method using lambdas and generics, for example, something like:
...
private <T> void moveRegionsBetweenGroups(Set<T> regionsOwners, String targetGroupName, Function<T, List<RegionInfo>> getRegionsInfo, Function<RegionInfo,Boolean> validation, Function<T,String> getOwnerName) throws IOException
...
Then _ moveServerRegionsFromGroup_ and _ moveTableRegionsToGroup_ would reuse it, defining specific validation logic, for example moveServerRegionsFromGroup would something such as below:
moveRegionsBetweenGroups(servers, targetGroupName, rs -> getRegions(rs), info -> { return getRSGroupInfo(targetGroupName).containsTable(info.getTable());}, rs -> rs.getHostname());
Pushed a commit with an example for the skeleton mentioned previously. Let me know if you guys agree with this. |
💔 -1 overall
This message was automatically generated. |
@wchevreuil I agree with you. |
🎊 +1 overall
This message was automatically generated. |
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@wchevreuil I have updated the patch, please help to review. |
Thanks for the updates, @sunhelly ! LGTM for the latest commits, had already resolved the conversation about exception handler when retries exhausted. Can you share your thoughts on the remaining comments? Let me know if those make sense. |
message to contains region name.
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@wchevreuil I have replied on the remaining comments, and updated my patch again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sunhelly, latest changes look good. Had made some additional review comments, please share your thoughts on those once you have a chance.
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
@wchevreuil I have updated the codes as replied in all the comments. Please review again. |
🎊 +1 overall
This message was automatically generated. |
Pair<ServerName, RegionStateNode> gotPair = createTableAndSetARegionState(newGroup, 5); | ||
try{ | ||
rsGroupAdmin.moveTables(Sets.newHashSet(tableName), newGroup.getName()); | ||
fail("move tables to group should fail"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: let's mention the test expected a IOE that didn't happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll change the fail message to mention the expected exception.
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsAdmin2.java
Outdated
Show resolved
Hide resolved
Hi @sunhelly ! Latest changes looking great! Had just 3 more comments, all small nits. Let me know if you think those make sense, then I guess we'll be ready to push this PR. |
@wchevreuil Thanks for your comments, I have updated the patch. |
Squashed commits locally, then pushed to master. Closing this PR now, thanks for contributing @sunhelly ! |
…ons on wrong rs