-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-25835 Ignore duplicate split requests from regionserver reports #3218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tested in an integration cluster test scenario (see #3208) but let's see what the unit test results in the CR report looks like. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@@ -1126,7 +1126,13 @@ private void updateRegionSplitTransition(final ServerName serverName, final Tran | |||
LOG.debug("Split request from " + serverName + | |||
", parent=" + parent + " splitKey=" + Bytes.toStringBinary(splitKey)); | |||
} | |||
master.getMasterProcedureExecutor().submitProcedure(createSplitProcedure(parent, splitKey)); | |||
if (regionStates.getRegionState(parent).isOpened() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still on a best effort basis right? Theoretically duplicate requests are still possible because the region state in AM (from previous request) can be updated after a duplicate procedure is submitted thus passing this check, although that is less likely..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's add a comment here to say that this is not perfect, and is only used to reduce the concerns for operators. So later developers will know that the actual fencing is in other places.
And please get the RegionState once and store it a local variable, and then test isOpened and isSplitting on the same variable? And do we need to check whether it is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do we need to check whether it is null?
Just to understand better, while creating Split Procedure, regionStates.getRegionState(parent)
returning null
should not happen right because we are trying to split an existing region (must be present in AM)? Or did I miss some race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also not submit the request twice right? Race condition could happen everywhere. If the region has already been split and this is a delayed request then it could be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the request is delayed to the point where parent is not only just successfully split but also removed after no refCount existing on it, then yes you are right, this is possible. Good to cover null check also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need !regionStates.getRegionState(parent).isSplitting()
check here? When region state is OPEN it will be always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addresses a case I've seen in real life. It's meant to fix this corner case to make the logs and operation less messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now github refreshes and I see all the other comments. Ok, will improve this. Just a moment...
@@ -1126,7 +1126,13 @@ private void updateRegionSplitTransition(final ServerName serverName, final Tran | |||
LOG.debug("Split request from " + serverName + | |||
", parent=" + parent + " splitKey=" + Bytes.toStringBinary(splitKey)); | |||
} | |||
master.getMasterProcedureExecutor().submitProcedure(createSplitProcedure(parent, splitKey)); | |||
if (regionStates.getRegionState(parent).isOpened() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's add a comment here to say that this is not perfect, and is only used to reduce the concerns for operators. So later developers will know that the actual fencing is in other places.
And please get the RegionState once and store it a local variable, and then test isOpened and isSplitting on the same variable? And do we need to check whether it is null?
Processing of the RS report happens asynchronously from other activities which can mutate region state. For example, a split procedure may already be running. A split procedure cannot succeed if the parent region is no longer open, so we can ignore it in that case. Note that submitting more than one split procedure for a given region is harmless -- the split is fenced in the procedure handling -- but it would be noisy in the logs. Only one procedure can succeed. The other procedure(s) would abort during initialization and report failure with WARN level logging.
Updated after feedback. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Apache9 please let me know if the latest patch addresses your change requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There are approvals and I believe all feedback has been addressed. Merging. We can reopen/revert/improve if necessary. |
…#3218) Processing of the RS report happens asynchronously from other activities which can mutate region state. For example, a split procedure may already be running. A split procedure cannot succeed if the parent region is no longer open, so we can ignore it in that case. Note that submitting more than one split procedure for a given region is harmless -- the split is fenced in the procedure handling -- but it would be noisy in the logs. Only one procedure can succeed. The other procedure(s) would abort during initialization and report failure with WARN level logging. Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Pankaj <pankajkumar@apache.org>
…#3218) Processing of the RS report happens asynchronously from other activities which can mutate region state. For example, a split procedure may already be running. A split procedure cannot succeed if the parent region is no longer open, so we can ignore it in that case. Note that submitting more than one split procedure for a given region is harmless -- the split is fenced in the procedure handling -- but it would be noisy in the logs. Only one procedure can succeed. The other procedure(s) would abort during initialization and report failure with WARN level logging. Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Pankaj <pankajkumar@apache.org>
A bit late, it is still holiday in China, not always online outside... +1. Thanks @apurtell , the comment is great. |
A SplitTableRegionProcedure may already be running when a regionserver report is received that includes a split request. The outcome is multiple SplitTableRegionProcedure procedures scheduled for the split request, only one of which can succeed. The others error out.
Do not create a split procedure in response to a region state change report if the region is not open or already splitting.