HBASE-27563 ChaosMonkey sometimes generates invalid boundaries for random item selection #4954

Merged

Conversation

ndimiduk
Member

Clamp the boundaries of the selected sublist according to the input list size.


- int startIndex = ThreadLocalRandom.current().nextInt(items.length - selectedNumber);
- return originalItems.subList(startIndex, startIndex + selectedNumber);
+ final int startIndex =
Contributor

So the problem here is that ratio could be greater than 1.0?

Member Author

Yes, it seems so... and it looks like this patch is not sufficient. Here's the stack trace from branch-2.5.

2023-01-10T19:33:35,651 WARN  [ChaosMonkey-0] policies.Policy: Exception occurred during performing action: java.lang.IllegalArgumentException: bound must be positive
        at java.base/java.util.Random.nextInt(Random.java:322)
        at java.base/java.util.concurrent.ThreadLocalRandom.nextInt(ThreadLocalRandom.java:449)
        at org.apache.hadoop.hbase.chaos.monkies.PolicyBasedChaosMonkey.selectRandomItems(PolicyBasedChaosMonkey.java:123)
        at org.apache.hadoop.hbase.chaos.actions.RollingBatchRestartRsAction.selectServers(RollingBatchRestartRsAction.java:130)
        at org.apache.hadoop.hbase.chaos.actions.RollingBatchRestartRsAction.perform(RollingBatchRestartRsAction.java:75)
        at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:48)
        at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
        at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

Member Author

I think it's more likely that we hit this when the selected count equals the array length, since the bound passed to nextInt is then 0. Let me add some debugging and see what I can learn.
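To make the failure mode concrete, here is a minimal sketch that mirrors the pre-patch arithmetic from the diff above. The class and method names (`BoundRepro`, `selectedNumber`, `startIndexThrows`) are hypothetical and exist only for illustration; they are not part of HBase.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical reproduction class, not HBase code.
final class BoundRepro {

  /** Mirrors the pre-patch computation of how many items to select. */
  static int selectedNumber(int length, float ratio) {
    return (int) Math.ceil(length * ratio);
  }

  /** Returns true if choosing a start index with the pre-patch bound throws. */
  static boolean startIndexThrows(int length, float ratio) {
    int bound = length - selectedNumber(length, ratio);
    try {
      ThreadLocalRandom.current().nextInt(bound);
      return false;
    } catch (IllegalArgumentException e) {
      // nextInt(bound) rejects any bound that is zero or negative.
      return true;
    }
  }
}
```

With 10 items, `startIndexThrows(10, 1.0f)` returns true because the bound is 0, while `startIndexThrows(10, 0.5f)` returns false because the bound is 5. A ratio above 1.0 makes the bound negative and fails the same way.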

Contributor

I think we should add a comment here explaining why we need these guards. Is it because of floating-point precision?

@Apache9
Contributor

Apache9 commented Jan 11, 2023

I checked the code: the ratio is normally the fraction of region servers we want to restart, so in general it should be a value between 0 and 1. But there is no sanity check in our code...
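One possible shape for such a sanity check, as a sketch only: the helper name `checkRatio` and the choice to throw rather than clamp are assumptions made for illustration, not something this PR adds.

```java
// Hypothetical helper, not part of HBase.
final class RatioCheck {

  /** Returns the ratio unchanged if it lies in [0, 1]; rejects anything else, including NaN. */
  static float checkRatio(float ratio) {
    // Written as a negated conjunction so that NaN (for which every comparison is false) is rejected.
    if (!(ratio >= 0.0f && ratio <= 1.0f)) {
      throw new IllegalArgumentException("ratio must be in [0, 1], got " + ratio);
    }
    return ratio;
  }
}
```

Failing fast surfaces a misconfigured ratio at the call site; clamping would instead keep the chaos run going with an adjusted value.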

@ndimiduk
Member Author

2023-01-11T15:39:46,370 DEBUG [ChaosMonkey-0] monkies.PolicyBasedChaosMonkey: selectRandomItems(10, 1.0) of type class [Lorg.apache.hadoop.hbase.ServerName;

@ndimiduk
Member Author

We have 1.0f as the default ratio value for a couple of constants in MonkeyConstants.java.

@ndimiduk
Member Author

This code handles the ratio=1.0 case correctly.


- List<T> originalItems = Arrays.asList(items);
+ final int selectedNumber = (int) Math.ceil(items.length * ratio);
+ final List<T> originalItems = Arrays.asList(items);
Contributor

Arrays.asList creates a fixed-size list that is backed by the given array, so the later shuffle will change the order of the items array that is passed in as a parameter. If we think this is acceptable, we can just shuffle the given array instead of wrapping it. If not, I think we should copy the array.
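The write-through behaviour can be seen with a small sketch. The class `AsListAliasing` and its methods are hypothetical, and `Collections.reverse` stands in for the shuffle so the effect is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demonstration class, not HBase code.
final class AsListAliasing {

  /** Reorders a view made by Arrays.asList; the change is visible in the caller's array. */
  static String[] reverseThroughView(String[] items) {
    List<String> view = Arrays.asList(items);
    Collections.reverse(view); // writes through to items
    return items;
  }

  /** Shuffles an independent copy, leaving the caller's array untouched. */
  static <T> List<T> shuffledCopy(T[] items) {
    List<T> copy = new ArrayList<>(Arrays.asList(items));
    Collections.shuffle(copy);
    return copy;
  }
}
```

After `reverseThroughView` on `{"a", "b", "c"}` the array itself starts with `"c"`, whereas `shuffledCopy` leaves its input in the original order.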


- int startIndex = ThreadLocalRandom.current().nextInt(items.length - selectedNumber);
- return originalItems.subList(startIndex, startIndex + selectedNumber);
+ if (selectedNumber == items.length) {
Contributor

Use >= for safety?
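A sketch of what a guarded selection could look like with the `>=` comparison, a clamped count, and a defensive copy. This is an illustrative reconstruction under those assumptions, not the code that was merged; the class name `GuardedSelection` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch, not the merged HBase implementation.
final class GuardedSelection {

  static <T> List<T> selectRandomItems(T[] items, float ratio) {
    // Clamp the count to [0, length] so an out-of-range ratio cannot produce bad indices.
    int selectedNumber = (int) Math.ceil(items.length * ratio);
    selectedNumber = Math.max(0, Math.min(items.length, selectedNumber));

    // Copy before shuffling so the caller's array keeps its order.
    List<T> shuffled = new ArrayList<>(Arrays.asList(items));
    Collections.shuffle(shuffled);

    // ">=" rather than "==": return everything and never call nextInt with a bound of 0.
    if (selectedNumber >= items.length) {
      return shuffled;
    }
    int startIndex = ThreadLocalRandom.current().nextInt(items.length - selectedNumber);
    return shuffled.subList(startIndex, startIndex + selectedNumber);
  }
}
```

With the clamp in place the `>=` branch is only ever reached by equality, so the wider comparison is purely defensive, which is the intent of the review comment.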


@ndimiduk
Member Author

Good points all around, thanks @Apache9

@@ -151,7 +157,9 @@ public boolean isStopped() {

   @Override
   public void waitForStop() throws InterruptedException {
-    monkeyThreadPool.awaitTermination(1, TimeUnit.MINUTES);
+    if (!monkeyThreadPool.awaitTermination(1, TimeUnit.MINUTES)) {
+      LOG.warn("Some pool threads failed to terminate, {}", monkeyThreadPool);
Member Author

Strange. monkeyThreadPool is not a daemon pool, so if this termination fails for some reason, the process can hang. Let me add a forced shutdown.
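A minimal sketch of the wait-then-force pattern described here, using only `java.util.concurrent`. The helper `waitOrForce` and the class `PoolShutdown` are hypothetical names, and the sketch assumes the caller has already requested an orderly shutdown of the pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not HBase code.
final class PoolShutdown {

  /** Returns true if the pool terminated within the timeout, false if it had to be forced. */
  static boolean waitOrForce(ExecutorService pool, long timeout, TimeUnit unit)
      throws InterruptedException {
    if (pool.awaitTermination(timeout, unit)) {
      return true;
    }
    // Interrupts running tasks and discards queued ones, so lingering
    // non-daemon threads do not keep the JVM alive.
    pool.shutdownNow();
    return false;
  }
}
```

`shutdownNow` can only interrupt; a task that ignores interruption will still keep its thread running.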


final int startIndex =
Math.max(0, ThreadLocalRandom.current().nextInt(items.length - selectedNumber));
final int endIndex = Math.min(items.length, startIndex + selectedNumber);
return shuffledItems.subList(startIndex, endIndex);
Contributor

Maybe here you could make use of org.apache.hadoop.hbase.util.ReservoirSample so we do not need to copy the whole array? Not a blocker issue anyway.

Member Author

Sure, that's simpler.
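For readers unfamiliar with the technique, here is a generic reservoir-sampling sketch (Algorithm R) in plain Java. It does not use or reproduce the API of `org.apache.hadoop.hbase.util.ReservoirSample`; the class `ReservoirSketch` and its `sample` method are hypothetical and only show why no full copy or shuffle is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of Algorithm R, not the HBase ReservoirSample class.
final class ReservoirSketch {

  /** Picks up to k items uniformly at random in one pass, without modifying the input array. */
  static <T> List<T> sample(T[] items, int k) {
    int size = Math.max(0, Math.min(k, items.length));
    List<T> reservoir = new ArrayList<>(size);
    if (size == 0) {
      return reservoir;
    }
    for (int i = 0; i < items.length; i++) {
      if (i < size) {
        // Fill the reservoir with the first `size` items.
        reservoir.add(items[i]);
      } else {
        // Item i replaces a random slot with probability size / (i + 1).
        int j = ThreadLocalRandom.current().nextInt(i + 1);
        if (j < size) {
          reservoir.set(j, items[i]);
        }
      }
    }
    return reservoir;
  }
}
```

Because the requested count is capped at the array length, asking for more items than exist returns all of them, so the invalid-boundary case from the stack trace cannot arise.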

@ndimiduk ndimiduk force-pushed the 27563-chaosmonkey-invalid-boundaries branch from 5ea901a to 00ca430 Compare January 12, 2023 14:44
…ndom item selection

Signed-off-by: Duo Zhang <zhangduo@apache.org>
@ndimiduk ndimiduk force-pushed the 27563-chaosmonkey-invalid-boundaries branch from 00ca430 to b02e312 Compare January 12, 2023 14:46
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 1m 58s master passed
+1 💚 compile 0m 14s master passed
+1 💚 shadedjars 3m 58s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 9s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 3s the patch passed
+1 💚 compile 0m 16s the patch passed
+1 💚 javac 0m 16s the patch passed
+1 💚 shadedjars 4m 1s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 9s the patch passed
_ Other Tests _
+1 💚 unit 0m 35s hbase-it in the patch passed.
15m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4954
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2c34d0b3d4f3 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/testReport/
Max. process+thread count 561 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 3s master passed
+1 💚 compile 0m 18s master passed
+1 💚 shadedjars 4m 15s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 37s the patch passed
+1 💚 compile 0m 18s the patch passed
+1 💚 javac 0m 18s the patch passed
+1 💚 shadedjars 4m 20s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s the patch passed
_ Other Tests _
+1 💚 unit 0m 36s hbase-it in the patch passed.
17m 32s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4954
Optional Tests javac javadoc unit shadedjars compile
uname Linux 23bfc1180327 5.4.0-1092-aws #100~18.04.2-Ubuntu SMP Tue Nov 29 08:39:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/testReport/
Max. process+thread count 584 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 32s master passed
+1 💚 compile 0m 27s master passed
+1 💚 checkstyle 0m 10s master passed
+1 💚 spotless 0m 40s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 22s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 32s the patch passed
+1 💚 compile 0m 26s the patch passed
+1 💚 javac 0m 26s the patch passed
+1 💚 checkstyle 0m 11s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 8m 44s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 30s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
24m 23s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4954
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 1bbcf7845206 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/4/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 35s master passed
+1 💚 compile 0m 17s master passed
+1 💚 shadedjars 4m 11s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 10s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 33s the patch passed
+1 💚 compile 0m 17s the patch passed
+1 💚 javac 0m 17s the patch passed
+1 💚 shadedjars 4m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s the patch passed
_ Other Tests _
+1 💚 unit 0m 36s hbase-it in the patch passed.
16m 52s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4954
Optional Tests javac javadoc unit shadedjars compile
uname Linux 17893de185d8 5.4.0-1092-aws #100~18.04.2-Ubuntu SMP Tue Nov 29 08:39:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/testReport/
Max. process+thread count 594 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 2s master passed
+1 💚 compile 0m 15s master passed
+1 💚 shadedjars 10m 12s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 10s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 45s the patch passed
+1 💚 compile 0m 15s the patch passed
+1 💚 javac 0m 15s the patch passed
+1 💚 shadedjars 4m 11s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s the patch passed
_ Other Tests _
+1 💚 unit 0m 43s hbase-it in the patch passed.
22m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4954
Optional Tests javac javadoc unit shadedjars compile
uname Linux 9f8ea032e44e 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/testReport/
Max. process+thread count 556 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 2s master passed
+1 💚 compile 0m 31s master passed
+1 💚 checkstyle 0m 12s master passed
+1 💚 spotless 0m 48s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 28s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 11s the patch passed
+1 💚 compile 0m 38s the patch passed
+1 💚 javac 0m 38s the patch passed
+1 💚 checkstyle 0m 12s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 8s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚 spotless 0m 55s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 38s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
32m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4954
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux b65c4417fd20 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / dff8e50
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-it U: hbase-it
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4954/5/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk ndimiduk merged commit 2a7c69d into apache:master Jan 12, 2023
@ndimiduk ndimiduk deleted the 27563-chaosmonkey-invalid-boundaries branch January 12, 2023 16:54