
HBASE-25835 Ignore duplicate split requests from regionserver reports #3218

Merged: apurtell merged 1 commit into apache:master from the HBASE-25835 branch on May 4, 2021

apurtell
Contributor

@apurtell apurtell commented May 2, 2021

A SplitTableRegionProcedure may already be running when a regionserver report is received that includes a split request. The outcome is multiple SplitTableRegionProcedure procedures scheduled for the split request, only one of which can succeed. The others error out.

Do not create a split procedure in response to a region state change report if the region is not open or is already splitting.
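A minimal sketch of the behavior described above, using a hypothetical in-memory model rather than the real master classes. The names `DuplicateSplitReportSketch`, `onSplitReport`, and the two-value `State` enum are assumptions made for illustration, and the model assumes the parent moves to SPLITTING as soon as a procedure is submitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the master-side handling; none of these names are HBase APIs.
class DuplicateSplitReportSketch {
  enum State { OPEN, SPLITTING }

  final Map<String, State> regionStates = new HashMap<>();
  final List<String> submitted = new ArrayList<>();

  /** Handles one split request taken from a regionserver report. */
  boolean onSplitReport(String parent) {
    State state = regionStates.get(parent);
    if (state != State.OPEN) {
      // Unknown, not open, or already splitting: a new procedure could not succeed.
      return false;
    }
    submitted.add(parent);
    // Assumption of this model: the parent is marked SPLITTING on submission.
    regionStates.put(parent, State.SPLITTING);
    return true;
  }

  public static void main(String[] args) {
    DuplicateSplitReportSketch master = new DuplicateSplitReportSketch();
    master.regionStates.put("region-a", State.OPEN);
    System.out.println(master.onSplitReport("region-a")); // first report: true
    System.out.println(master.onSplitReport("region-a")); // repeated report: false
    System.out.println(master.submitted.size());          // 1
  }
}
```

In this model the repeated report is dropped before any procedure is created, which is the outcome the description asks for.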

@apurtell
Contributor Author

apurtell commented May 2, 2021

Tested in an integration cluster test scenario (see #3208), but let's see what the unit test results in the CR report look like.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 21s master passed
+1 💚 compile 3m 20s master passed
+1 💚 checkstyle 1m 11s master passed
+1 💚 spotbugs 2m 12s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 0s the patch passed
+1 💚 compile 3m 23s the patch passed
+1 💚 javac 3m 23s the patch passed
+1 💚 checkstyle 1m 9s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 19m 56s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 24s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
51m 28s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3218
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux 6e0fc5177d02 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 00fec24
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 36s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 52s master passed
+1 💚 compile 1m 19s master passed
+1 💚 shadedjars 9m 6s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 52s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 47s the patch passed
+1 💚 compile 1m 14s the patch passed
+1 💚 javac 1m 14s the patch passed
+1 💚 shadedjars 9m 2s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 44s the patch passed
_ Other Tests _
+1 💚 unit 155m 28s hbase-server in the patch passed.
191m 15s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3218
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8181e8108e6d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 00fec24
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/testReport/
Max. process+thread count 3819 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 2m 21s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 18s master passed
+1 💚 compile 1m 2s master passed
+1 💚 shadedjars 8m 54s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 5s the patch passed
+1 💚 compile 1m 2s the patch passed
+1 💚 javac 1m 2s the patch passed
+1 💚 shadedjars 8m 58s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
+1 💚 unit 214m 34s hbase-server in the patch passed.
248m 23s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3218
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4cdc4e13eb92 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 00fec24
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/testReport/
Max. process+thread count 3254 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -1126,7 +1126,13 @@ private void updateRegionSplitTransition(final ServerName serverName, final Tran
       LOG.debug("Split request from " + serverName +
           ", parent=" + parent + " splitKey=" + Bytes.toStringBinary(splitKey));
     }
-    master.getMasterProcedureExecutor().submitProcedure(createSplitProcedure(parent, splitKey));
+    if (regionStates.getRegionState(parent).isOpened() &&

Contributor

This is still on a best-effort basis, right? Theoretically, duplicate requests are still possible: the region state in AM (from the previous request) may be updated only after a duplicate procedure has been submitted, so the duplicate passes this check, although that is less likely.

Contributor

Agree. Let's add a comment here to say that this is not perfect, and is only used to reduce the concerns for operators. So later developers will know that the actual fencing is in other places.

And please get the RegionState once, store it in a local variable, and then test isOpened and isSplitting on that same variable. And do we need to check whether it is null?
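The request above could look roughly like the following sketch. It is not the merged patch: `RegionStateModel`, `shouldSubmit`, and the map-based lookup are hypothetical stand-ins, chosen only so the example compiles without HBase on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the region state lookup; not HBase classes.
class SingleLookupGuardSketch {
  enum State { OPEN, SPLITTING, SPLIT }

  static final class RegionStateModel {
    final State state;
    RegionStateModel(State state) { this.state = state; }
    boolean isOpened() { return state == State.OPEN; }
    boolean isSplitting() { return state == State.SPLITTING; }
  }

  /** Best-effort pre-check: decides whether a reported split is worth submitting. */
  static boolean shouldSubmit(Map<String, RegionStateModel> regionStates, String parent) {
    // One lookup, so both predicates are evaluated against the same snapshot.
    RegionStateModel parentState = regionStates.get(parent);
    // null covers a delayed report for a parent that is no longer tracked.
    return parentState != null && parentState.isOpened() && !parentState.isSplitting();
  }

  public static void main(String[] args) {
    Map<String, RegionStateModel> regionStates = new HashMap<>();
    regionStates.put("open-region", new RegionStateModel(State.OPEN));
    regionStates.put("splitting-region", new RegionStateModel(State.SPLITTING));

    System.out.println(shouldSubmit(regionStates, "open-region"));      // true
    System.out.println(shouldSubmit(regionStates, "splitting-region")); // false
    System.out.println(shouldSubmit(regionStates, "removed-region"));   // false, no NPE
  }
}
```

As noted in the thread, a check like this only reduces noise. It is not the fence that guarantees a single successful split.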

Contributor

> And do we need to check whether it is null?

Just to understand better: while creating the split procedure, regionStates.getRegionState(parent) should never return null, right, because we are trying to split an existing region (which must be present in AM)? Or did I miss some race condition?

Contributor

You should also not submit the request twice, right? Race conditions can happen anywhere. If the region has already been split and this is a delayed request, then it could be null?

Contributor

If the request is delayed to the point where the parent has not only been successfully split but also removed because no refCount remains on it, then yes, you are right, this is possible. Good to cover the null check as well.

Contributor

Do we need the !regionStates.getRegionState(parent).isSplitting() check here? When the region state is OPEN, it will always be true.
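The point about redundancy can be checked exhaustively in a small model. This sketch assumes both predicates are plain comparisons against a single state field. The `State` values listed are illustrative and are not claimed to match the real enum.

```java
// Hypothetical model: both predicates compare one field against one constant.
class RedundantCheckSketch {
  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT }

  static boolean isOpened(State s) { return s == State.OPEN; }
  static boolean isSplitting(State s) { return s == State.SPLITTING; }

  /** True if dropping the isSplitting test never changes the guard's result. */
  static boolean splittingCheckIsRedundant() {
    for (State s : State.values()) {
      boolean full = isOpened(s) && !isSplitting(s);
      boolean reduced = isOpened(s);
      if (full != reduced) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(splittingCheckIsRedundant()); // true under these assumptions
  }
}
```

Under those assumptions the extra test changes nothing logically, so keeping it is a readability choice rather than a correctness requirement.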

Contributor Author

This addresses a case I've seen in real life. It's meant to fix this corner case to make the logs and operation less messy.

Contributor Author

Ah, now github refreshes and I see all the other comments. Ok, will improve this. Just a moment...

Processing of the RS report happens asynchronously from other activities
which can mutate region state. For example, a split procedure may already
be running. A split procedure cannot succeed if the parent region is no
longer open, so we can ignore it in that case.

Note that submitting more than one split procedure for a given region is
harmless -- the split is fenced in the procedure handling -- but it would
be noisy in the logs. Only one procedure can succeed. The other
procedure(s) would abort during initialization and report failure with
WARN level logging.
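
To illustrate why extra procedures are harmless but noisy, here is a sketch of fencing at procedure initialization. It swaps in an atomic compare-and-set for whatever mechanism the procedure framework really uses, so `initializeSplit`, the `State` enum, and the log text are all assumptions of this example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

// Hypothetical fencing model; the real split procedure is not implemented this way.
class SplitFencingSketch {
  enum State { OPEN, SPLITTING }

  private static final Logger LOG = Logger.getLogger("SplitFencingSketch");

  /** Returns true if this procedure won the right to split the region. */
  static boolean initializeSplit(AtomicReference<State> regionState, long procId) {
    // Only one caller can observe OPEN and move the region to SPLITTING.
    if (regionState.compareAndSet(State.OPEN, State.SPLITTING)) {
      return true;
    }
    // Every other procedure aborts here; this is the log noise the pre-check avoids.
    LOG.warning("pid=" + procId + " aborting split, parent is " + regionState.get());
    return false;
  }

  public static void main(String[] args) {
    AtomicReference<State> parent = new AtomicReference<>(State.OPEN);
    int succeeded = 0;
    for (long procId = 1; procId <= 3; procId++) {
      if (initializeSplit(parent, procId)) {
        succeeded++;
      }
    }
    System.out.println("procedures that succeeded: " + succeeded); // 1
  }
}
```

Correctness comes from the fence, so the pre-check in the report handler only needs to be best effort.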
@apurtell
Contributor Author

apurtell commented May 3, 2021

Updated after feedback.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 59s master passed
+1 💚 compile 3m 17s master passed
+1 💚 checkstyle 1m 9s master passed
+1 💚 spotbugs 2m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 0s the patch passed
+1 💚 compile 3m 18s the patch passed
+1 💚 javac 3m 18s the patch passed
+1 💚 checkstyle 1m 10s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 20m 0s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 19s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
51m 2s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #3218
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile
uname Linux e2b6cf98bc37 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7640134
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 5s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 27s master passed
+1 💚 compile 1m 11s master passed
+1 💚 shadedjars 8m 7s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 17s the patch passed
+1 💚 compile 1m 10s the patch passed
+1 💚 javac 1m 10s the patch passed
+1 💚 shadedjars 8m 9s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s the patch passed
_ Other Tests _
+1 💚 unit 150m 53s hbase-server in the patch passed.
182m 48s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #3218
Optional Tests javac javadoc unit shadedjars compile
uname Linux ea2b47f7811d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7640134
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/testReport/
Max. process+thread count 3682 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 4m 28s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 24s master passed
+1 💚 compile 1m 4s master passed
+1 💚 shadedjars 8m 56s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 4s the patch passed
+1 💚 compile 1m 4s the patch passed
+1 💚 javac 1m 4s the patch passed
+1 💚 shadedjars 8m 58s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s the patch passed
_ Other Tests _
+1 💚 unit 219m 28s hbase-server in the patch passed.
255m 37s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #3218
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6f7f6083b1f1 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7640134
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/testReport/
Max. process+thread count 2818 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3218/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@apurtell
Contributor Author

apurtell commented May 4, 2021

@Apache9 please let me know if the latest patch addresses your change requests.

Contributor

@virajjasani virajjasani left a comment

+1

@apurtell
Contributor Author

apurtell commented May 4, 2021

There are approvals and I believe all feedback has been addressed. Merging. We can reopen/revert/improve if necessary.

@apurtell apurtell merged commit 432d141 into apache:master May 4, 2021
@apurtell apurtell deleted the HBASE-25835 branch May 4, 2021 17:05
asfgit pushed a commit that referenced this pull request May 4, 2021
…#3218)

Processing of the RS report happens asynchronously from other activities
which can mutate region state. For example, a split procedure may already
be running. A split procedure cannot succeed if the parent region is no
longer open, so we can ignore it in that case.

Note that submitting more than one split procedure for a given region is
harmless -- the split is fenced in the procedure handling -- but it would
be noisy in the logs. Only one procedure can succeed. The other
procedure(s) would abort during initialization and report failure with
WARN level logging.

Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Pankaj <pankajkumar@apache.org>
asfgit pushed a commit that referenced this pull request May 4, 2021
@Apache9
Contributor

Apache9 commented May 5, 2021

A bit late, it is still holiday in China, not always online outside...

+1.

Thanks @apurtell , the comment is great.
