Skip to content

HBASE-28206 [JDK17] JVM crashes intermittently on aarch64 #5561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 6, 2023

Conversation

bbeaudreault
Copy link
Contributor

@bbeaudreault bbeaudreault commented Dec 5, 2023

We routinely see JVM crashes originating from our RegionServerMetricsWrapperRunnable in jdk17 on aarch64. Since that method is so large, it's hard to know exactly where or what the problem is. There is an issue submitted, but no indication of whether or when a fix will come.

The core dump indicates its a problem with On Stack Replacement (OSR). Reading up on that, it can kick in when long methods are suddenly deemed hot and attempt to be optimized in place. Taking a stab in the dark based on that, I saw two potential issues in RegionServerMetricsWrapperRunnable:

  1. The method is exceptionally long, especially when considering the loops
  2. There are over 50 volatile variables updated in the parent class

This PR attempts to solve both issues:

  1. Most of the volatile variables are replaced by a single volatile RegionMetricAggregate.
  2. I tried to break down the giant method into organized units:
    a. regionserver-level calculations are left in the run() method
    b. the main aggregate() method of RegionMetricAggregate loops and collects region-level metrics
    c. for each region, an aggregateStores() method is called which loops and collects store-level metrics.

This may not be perfect, but I think it's an improvement for three reasons:

  1. I've been running it internally for a couple weeks and have not seen a crash (previously multiple per day)
  2. It attempts to organize the code a little bit so its easier to manage, and is a net-reduction in code.
  3. If the crash were to resurface, since there are more methods now we'd be able to better pinpoint the problematic frame.

In order to ensure this refactor did not break any metrics, I wrote an exhaustive unit test. It validates each of the aggregated getters, and succeeds against both the old and new implementation.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 40s master passed
+1 💚 compile 2m 27s master passed
+1 💚 checkstyle 0m 34s master passed
+1 💚 spotless 0m 40s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 27s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 43s the patch passed
+1 💚 compile 2m 24s the patch passed
+1 💚 javac 2m 24s the patch passed
-0 ⚠️ checkstyle 0m 35s hbase-server: The patch generated 3 new + 0 unchanged - 1 fixed = 3 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 9m 15s Patch does not cause any errors with Hadoop 3.2.4 3.3.6.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 37s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
31m 46s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5561
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux ab603117c156 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Eclipse Adoptium-11.0.17+8
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 12s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 37s master passed
+1 💚 compile 0m 47s master passed
+1 💚 shadedjars 4m 50s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 42s the patch passed
+1 💚 compile 0m 48s the patch passed
+1 💚 javac 0m 48s the patch passed
+1 💚 shadedjars 4m 49s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s the patch passed
_ Other Tests _
+1 💚 unit 220m 48s hbase-server in the patch passed.
243m 4s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5561
Optional Tests javac javadoc unit shadedjars compile
uname Linux 44332dbc22ab 5.4.0-166-generic #183-Ubuntu SMP Mon Oct 2 11:28:33 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/testReport/
Max. process+thread count 4819 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 32s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 19s master passed
+1 💚 compile 0m 42s master passed
+1 💚 shadedjars 4m 48s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 27s the patch passed
+1 💚 compile 0m 43s the patch passed
+1 💚 javac 0m 43s the patch passed
+1 💚 shadedjars 4m 48s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
-1 ❌ unit 237m 0s hbase-server in the patch failed.
258m 36s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5561
Optional Tests javac javadoc unit shadedjars compile
uname Linux f79662dda915 5.4.0-166-generic #183-Ubuntu SMP Mon Oct 2 11:28:33 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/testReport/
Max. process+thread count 4889 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 51s master passed
+1 💚 compile 2m 29s master passed
+1 💚 checkstyle 0m 37s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 33s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 40s the patch passed
+1 💚 compile 2m 27s the patch passed
+1 💚 javac 2m 27s the patch passed
+1 💚 checkstyle 0m 36s hbase-server: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 10m 20s Patch does not cause any errors with Hadoop 3.2.4 3.3.6.
+1 💚 spotless 0m 41s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 38s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
33m 9s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5561
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 539b05235ba0 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@bbeaudreault bbeaudreault merged commit 6e421e9 into apache:master Dec 6, 2023
@bbeaudreault bbeaudreault deleted the HBASE-28206 branch December 6, 2023 17:53
bbeaudreault added a commit that referenced this pull request Dec 6, 2023
Signed-off-by: Duo Zhang <zhangduo@apache.org>
bbeaudreault added a commit that referenced this pull request Dec 6, 2023
Signed-off-by: Duo Zhang <zhangduo@apache.org>
bbeaudreault added a commit that referenced this pull request Dec 6, 2023
Signed-off-by: Duo Zhang <zhangduo@apache.org>
bbeaudreault added a commit that referenced this pull request Dec 6, 2023
Signed-off-by: Duo Zhang <zhangduo@apache.org>
bbeaudreault added a commit to HubSpot/hbase that referenced this pull request Dec 6, 2023
…arch64 (apache#5561)

Signed-off-by: Duo Zhang <zhangduo@apache.org>
@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 59s master passed
+1 💚 compile 0m 43s master passed
+1 💚 shadedjars 5m 10s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 44s the patch passed
+1 💚 compile 0m 43s the patch passed
+1 💚 javac 0m 43s the patch passed
+1 💚 shadedjars 5m 8s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s the patch passed
_ Other Tests _
-1 ❌ unit 215m 17s hbase-server in the patch failed.
237m 40s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5561
Optional Tests javac javadoc unit shadedjars compile
uname Linux a676a46a61ed 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/testReport/
Max. process+thread count 4734 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 32s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 39s master passed
+1 💚 compile 0m 42s master passed
+1 💚 shadedjars 4m 50s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 28s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 shadedjars 4m 48s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
-1 ❌ unit 236m 19s hbase-server in the patch failed.
257m 54s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5561
Optional Tests javac javadoc unit shadedjars compile
uname Linux 074794bd56f4 5.4.0-166-generic #183-Ubuntu SMP Mon Oct 2 11:28:33 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 25e9228
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/testReport/
Max. process+thread count 4892 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5561/3/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

kadirozde pushed a commit to kadirozde/hbase that referenced this pull request Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants