
HBASE-29380 Two concurrent remove peer requests may hang #7077

Merged: Apache9 merged 1 commit into apache:master on Jun 12, 2025

Conversation

Apache9
Contributor

@Apache9 Apache9 commented Jun 7, 2025

No description provided.

@Apache9 Apache9 self-assigned this Jun 7, 2025
@Apache9
Contributor Author

Apache9 commented Jun 7, 2025

I've successfully reproduced the problem with a simple UT.

Let me think about how to fix it properly.


@Apache9
Contributor Author

Apache9 commented Jun 7, 2025

I refactored the MasterProcedureScheduler so that the cleanup work for all types of queues shares the same cleanup function, so I think it is enough to test just one queue type in the UT.

Let's wait for the pre-commit result.

@Apache9 Apache9 changed the title HBASE-29380 Add a UT to reproduce HBASE-29380 Two concurrent remove peer requests may hang Jun 7, 2025

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 32s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 6s | | master passed |
| +1 💚 | compile | 3m 22s | | master passed |
| +1 💚 | checkstyle | 0m 36s | | master passed |
| +1 💚 | spotbugs | 1m 32s | | master passed |
| +1 💚 | spotless | 0m 45s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 5s | | the patch passed |
| +1 💚 | compile | 3m 22s | | the patch passed |
| +1 💚 | javac | 3m 22s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 37s | | hbase-server: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) |
| +1 💚 | spotbugs | 1m 42s | | the patch passed |
| +1 💚 | hadoopcheck | 11m 58s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 45s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 9s | | The patch does not generate ASF License warnings. |
| | | 39m 16s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7077/2/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #7077 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 1de489362706 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 32d82b0 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 85 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7077/2/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 30s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 15s | | master passed |
| +1 💚 | compile | 0m 56s | | master passed |
| +1 💚 | javadoc | 0m 28s | | master passed |
| +1 💚 | shadedjars | 5m 58s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 2s | | the patch passed |
| +1 💚 | compile | 0m 57s | | the patch passed |
| +1 💚 | javac | 0m 57s | | the patch passed |
| +1 💚 | javadoc | 0m 27s | | the patch passed |
| +1 💚 | shadedjars | 6m 3s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 217m 26s | | hbase-server in the patch passed. |
| | | 243m 33s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7077/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #7077 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 4cf092f63ab1 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 32d82b0 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7077/2/testReport/ |
| Max. process+thread count | 5634 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7077/2/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache9 Apache9 requested a review from Copilot June 8, 2025 13:58

@Copilot Copilot AI left a comment


Pull Request Overview

This PR addresses HBASE-29380 by adding a test that reproduces the hang for concurrent peer-removal requests and by refactoring the scheduler’s cleanup logic into a single shared path that also covers global queues and table deletion.

  • Introduce TestProcedureWaitAndWake to simulate two concurrent remove-peer procedures and ensure they complete.
  • Consolidate per-entity cleanup (peer, server, global) into a generic tryCleanupQueue helper.
  • Add a markTableAsDeleted method for safely removing table queues under test restrictions.
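
The consolidation described in the list above can be illustrated with a minimal sketch. This is not the code from the PR: `QueueCleanupSketch`, `EntityQueue`, the two maps, and the "empty and unlocked" removal rule are all hypothetical names and assumptions chosen for illustration. Only the helper name `tryCleanupQueue` comes from the review text, and its real signature may differ.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for a scheduler; NOT the real MasterProcedureScheduler.
final class QueueCleanupSketch {

  // Hypothetical per-entity queue: pending work plus a flag for a held lock.
  static final class EntityQueue<T> {
    final T id;
    final ArrayDeque<String> pending = new ArrayDeque<>();
    boolean locked;

    EntityQueue(T id) {
      this.id = id;
    }
  }

  final Map<String, EntityQueue<String>> peerQueues = new HashMap<>();
  final Map<String, EntityQueue<String>> serverQueues = new HashMap<>();

  // One cleanup rule shared by every queue type. Assumption: a queue may be
  // removed only when it exists, has no pending work, and is not locked.
  synchronized <T> boolean tryCleanupQueue(T id, Function<T, EntityQueue<T>> lookup,
      Consumer<T> remove) {
    EntityQueue<T> queue = lookup.apply(id);
    if (queue == null || queue.locked || !queue.pending.isEmpty()) {
      return false; // nothing to remove, or the queue is still in use
    }
    remove.accept(id);
    return true;
  }

  // Per-type entry points only choose which map the shared rule operates on.
  boolean tryCleanupPeerQueue(String peerId) {
    return tryCleanupQueue(peerId, peerQueues::get, peerQueues::remove);
  }

  boolean tryCleanupServerQueue(String serverName) {
    return tryCleanupQueue(serverName, serverQueues::get, serverQueues::remove);
  }

  public static void main(String[] args) {
    QueueCleanupSketch scheduler = new QueueCleanupSketch();
    EntityQueue<String> queue = new EntityQueue<>("peer-1");
    queue.locked = true;
    scheduler.peerQueues.put("peer-1", queue);
    System.out.println(scheduler.tryCleanupPeerQueue("peer-1")); // locked, so kept
    queue.locked = false;
    System.out.println(scheduler.tryCleanupPeerQueue("peer-1")); // idle, so removed
  }
}
```

With a shared rule like this, a test that exercises one queue type also covers the removal condition used by the others, which is the reasoning the author gives for testing only one type.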

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureWaitAndWake.java | Adds a new test to simulate and verify that two concurrent remove-peer procedures do not hang. |
| hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java | Refactors cleanup methods into a generic tryCleanupQueue, adds global-queue cleanup, and exposes markTableAsDeleted for tests. |

Comments suppressed due to low confidence (3)

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureWaitAndWake.java:166

  • The test calls waitFor but does not assert its result. If the condition times out, the test will still pass. Consider wrapping with an assertTrue on the waitFor return value to ensure failures are detected.
UTIL.waitFor(10000, () -> procExec.isFinished(id1));

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureWaitAndWake.java:167

  • As above, add an assertion for the waitFor return value to ensure the second procedure also finishes within the timeout.
UTIL.waitFor(10000, () -> procExec.isFinished(id2));
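
The concern in the two comments above is a wait that can time out silently. Whether `UTIL.waitFor` already fails the test on timeout is not established in this thread, so the following is only a generic sketch of the "wait, then assert" pattern. `WaitAssertSketch`, its `waitFor`, and `assertFinished` are hypothetical helpers written for this example, not HBase APIs, and the `CompletableFuture` tasks merely stand in for the two procedures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helpers; these are NOT the HBase testing utility methods.
final class WaitAssertSketch {

  // Polls the condition until it holds or the timeout elapses.
  // Returns true if the condition became true, false on timeout or interrupt.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
    return true;
  }

  // Turns a timeout into a test failure instead of letting it pass silently.
  static void assertFinished(String what, long timeoutMs, BooleanSupplier finished) {
    if (!waitFor(timeoutMs, 10, finished)) {
      throw new AssertionError(what + " did not finish within " + timeoutMs + " ms");
    }
  }

  public static void main(String[] args) {
    // Two trivial asynchronous tasks stand in for the two procedures.
    CompletableFuture<Void> first = CompletableFuture.runAsync(() -> { });
    CompletableFuture<Void> second = CompletableFuture.runAsync(() -> { });
    assertFinished("first task", 10000, first::isDone);
    assertFinished("second task", 10000, second::isDone);
    System.out.println("both tasks finished");
  }
}
```

If the wait utility in use already throws on timeout, the extra assertion is redundant and the original calls are sufficient.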

hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java:608

  • [nitpick] The parameter getMap is a Supplier<TNode> but its name is ambiguous. Consider renaming to queueSupplier or mapSupplier for clarity.
private <T extends Comparable<T>, TNode extends Queue<T>> boolean tryCleanupQueue(T id,

@Apache9 Apache9 requested review from ndimiduk and NihalJain June 11, 2025 14:33
Member

@ndimiduk ndimiduk left a comment


Nice find. Nice test.

@Apache9 Apache9 merged commit 7cc2f54 into apache:master Jun 12, 2025
1 check passed
Apache9 added a commit that referenced this pull request Jun 12, 2025
Signed-off-by: Nihal Jain <nihaljain@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
(cherry picked from commit 7cc2f54)
Apache9 added a commit to Apache9/hbase that referenced this pull request Jun 12, 2025
Signed-off-by: Nihal Jain <nihaljain@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
(cherry picked from commit 7cc2f54)
Apache9 added a commit that referenced this pull request Jun 12, 2025
(cherry picked from commit 7cc2f54)

Signed-off-by: Nihal Jain <nihaljain@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Apache9 added a commit that referenced this pull request Jun 12, 2025
(cherry picked from commit 7cc2f54)

Signed-off-by: Nihal Jain <nihaljain@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
(cherry picked from commit 368386d)
Apache9 added a commit that referenced this pull request Jun 12, 2025
(cherry picked from commit 7cc2f54)

Signed-off-by: Nihal Jain <nihaljain@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
(cherry picked from commit 368386d)
4 participants