Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MauveMultiThrdLoad_5m crash vmstate=0x00040000 #19544

Closed
pshipton opened this issue May 23, 2024 · 10 comments
Closed

MauveMultiThrdLoad_5m crash vmstate=0x00040000 #19544

pshipton opened this issue May 23, 2024 · 10 comments
Labels
jdk23 segfault Issues that describe segfaults / JVM crashes test failure

Comments

@pshipton
Copy link
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/17 - ub20-aarch64-osu-6
MauveMultiThrdLoad_5m_1 -XX:-UseCompressedOops

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/17/system_test_output.tar.gz

21:59:55  STF 01:59:54.597 - Found dump at: /home/jenkins/workspace/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../TKG/output_17164292715531/MauveMultiThrdLoad_5m_1/20240523-015946-MauveMultiThrdLoad/generation/javacore.20240523.015950.3755623.0002.txt
21:59:55  STF 01:59:54.597 - Found dump at: /home/jenkins/workspace/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../TKG/output_17164292715531/MauveMultiThrdLoad_5m_1/20240523-015946-MauveMultiThrdLoad/generation/core.20240523.015950.3755623.0001.dmp
21:59:55  GEN stderr javacore file generated - /home/jenkins/workspace/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../TKG/output_17164292715531/MauveMultiThrdLoad_5m_1/20240523-015946-MauveMultiThrdLoad/generation/javacore.20240523.015950.3755623.0002.txt
21:59:55  GEN stderr core file generated - /home/jenkins/workspace/Test_openjdknext_j9_sanity.system_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../TKG/output_17164292715531/MauveMultiThrdLoad_5m_1/20240523-015946-MauveMultiThrdLoad/generation/core.20240523.015950.3755623.0001.dmp
1TISIGINFO     Dump Event "gpf" (00002000) received

1XHEXCPCODE    J9Generic_Signal_Number: 00000018
1XHEXCPCODE    Signal_Number: 0000000B
1XHEXCPCODE    Error_Value: 00000000
1XHEXCPCODE    Signal_Code: 00000001
1XHEXCPCODE    Handler1: 0000FFFFA6290B64
1XHEXCPCODE    Handler2: 0000FFFFA61F6BA0
1XHEXCPCODE    InaccessibleAddress: 00000000000004B0
NULL
1XHEXCPMODULE  Module: /lib/aarch64-linux-gnu/libpthread.so.0
1XHEXCPMODULE  Module_base_address: 0000FFFFA69BB000
1XHEXCPMODULE  Symbol: pthread_mutex_lock
1XHEXCPMODULE  Symbol_address: 0000FFFFA69C4BF0
NULL
1XHREGISTERS   Registers:
2XHREGISTER      R0: 00000000000004A0
2XHREGISTER      R1: 0000000000394E7C
2XHREGISTER      R2: 0000FFFFA61CE940
2XHREGISTER      R3: 0000FFFFA69E72F0
2XHREGISTER      R4: 0000000000000001
2XHREGISTER      R5: 00000000FFF605D8
2XHREGISTER      R6: 0000FFFFA00A0678
2XHREGISTER      R7: E8FCA7977295B400
2XHREGISTER      R8: 0000000000107C20
2XHREGISTER      R9: 0000FFFFA49D20E4
2XHREGISTER      R10: 0000000000000000
2XHREGISTER      R11: 0000000000000008
2XHREGISTER      R12: 0000000000000010
2XHREGISTER      R13: 0000000000000002
2XHREGISTER      R14: 0000FFFF7DE99C40
2XHREGISTER      R15: 0000000000000004
2XHREGISTER      R16: 0000FFFFA61CE058
2XHREGISTER      R17: 0000FFFFA69C4BF0
2XHREGISTER      R18: 0000000000000000
2XHREGISTER      R19: 0000000000000000
2XHREGISTER      R20: 0000FFFFA02503B0
2XHREGISTER      R21: 00000000000004B0
2XHREGISTER      R22: 0000FFFFA61CD000
2XHREGISTER      R23: 0000000000000001
2XHREGISTER      R24: 0000000000000000
2XHREGISTER      R25: 0000FFFF7DE99860
2XHREGISTER      R26: 0000FFFF7DE99B88
2XHREGISTER      R27: 0000FFFF7DE99C20
2XHREGISTER      R28: 0000000000000010
2XHREGISTER      R29: 0000FFFF7DE99880
2XHREGISTER      R30: 0000FFFFA61A85C0
2XHREGISTER      R31: 0000FFFF7DE99880
2XHREGISTER      V0: 3D2743162C00CC1F
2XHREGISTER      V1: 9B5A405CB75A8C43
2XHREGISTER      V2: 9B5ABFA311278F09
2XHREGISTER      V3: 402F1FBDA155C719
2XHREGISTER      V4: BFD00EA348B88334
2XHREGISTER      V5: 3FD5575B0BE00B6A
2XHREGISTER      V6: 3FE62E42FEFA39EF
2XHREGISTER      V7: 0000FFFFA5026BD4
2XHREGISTER      V8: 0000000000000000
2XHREGISTER      V9: 0000000000000000
2XHREGISTER      V10: 0000000000000000
2XHREGISTER      V11: 0000000000000000
2XHREGISTER      V12: 0000000000000000
2XHREGISTER      V13: 0000000000000000
2XHREGISTER      V14: 0000000000000000
2XHREGISTER      V15: 0000000000000000
2XHREGISTER      V16: 0000000040400000
2XHREGISTER      V17: AAA8AA00A8000000
2XHREGISTER      V18: 8020080200000000
2XHREGISTER      V19: 3F9EB851EB851EB8
2XHREGISTER      V20: 3FB1EB851EB851EC
2XHREGISTER      V21: 0000000000000002
2XHREGISTER      V22: 0000000000000000
2XHREGISTER      V23: 0000000000000008
2XHREGISTER      V24: 3FD6666666666666
2XHREGISTER      V25: 0000000000000000
2XHREGISTER      V26: 0000000000000001
2XHREGISTER      V27: 000000000000000A
2XHREGISTER      V28: 0000000000000001
2XHREGISTER      V29: 0000000000000000
2XHREGISTER      V30: 6E776F6474756873
2XHREGISTER      V31: 383031203D20734E
2XHREGISTER      PC: 0000FFFFA69C4C00
2XHREGISTER      SP: 0000FFFF7DE99880
2XHREGISTER      PSTATE: 0000000060000000
NULL
1XHFLAGS       VM flags:0000000000040000
NULL

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "Attach API teardown" J9VMThread:0x0000000000107B00, omrthread_t:0x0000FFFFA02503B0, java/lang/Thread:0x00000000FFF8A7E0, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x13, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000FFF18598)
3XMTHREADINFO1            (native thread ID:0x394E7C, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMTHREADINFO2            (native stack address range from:0x0000FFFF7DE5C000, to:0x0000FFFF7DE9C000, size:0x40000)
3XMCPUTIME               CPU usage total: 0.179275100 secs, current category="System-JVM"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=4792 (0x12B8)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/Thread.interruptImpl(Native Method)
4XESTACKTRACE                at java/lang/Thread.interrupt0(Thread.java:3016)
4XESTACKTRACE                at java/lang/Thread.interrupt(Thread.java:1728)
4XESTACKTRACE                at openj9/internal/tools/attach/target/AttachHandler.terminate(AttachHandler.java:474)
4XESTACKTRACE                at openj9/internal/tools/attach/target/AttachHandler$teardownHook.run(AttachHandler.java:668)
3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               protectedBacktrace+0x14 (0x0000FFFFA61F3384 [libj9prt29.so+0x24384])
4XENATIVESTACK               omrsig_protect+0x21c (0x0000FFFFA61F782C [libj9prt29.so+0x2882c])
4XENATIVESTACK               0x0000FFFFA61F3854
4XENATIVESTACK               0x0000FFFFA61F782C
@pshipton pshipton added test failure segfault Issues that describe segfaults / JVM crashes labels May 23, 2024
@pshipton
Copy link
Member Author

@knn-k fyi

@knn-k
Copy link
Contributor

knn-k commented May 24, 2024

There are some other failures in pthread_mutex_lock recently, such as #19366 and #19484. Possibly related.

@knn-k
Copy link
Contributor

knn-k commented May 24, 2024

I ran a 10x Grinder job for MauveMultiThrdLoad_5m_1, and it didn't fail.
https://openj9-jenkins.osuosl.org/job/Grinder/3623/

gdb output for the core file:

   0xffffa69c4bf0 <__GI___pthread_mutex_lock>:  stp     x29, x30, [sp, #-48]!
   0xffffa69c4bf4 <__GI___pthread_mutex_lock+4>:        mov     x29, sp
   0xffffa69c4bf8 <__GI___pthread_mutex_lock+8>:        stp     x21, x22, [sp, #32]
   0xffffa69c4bfc <__GI___pthread_mutex_lock+12>:       add     x21, x0, #0x10
   0xffffa69c4c00 <__GI___pthread_mutex_lock+16>:       ldr     w2, [x21] <-- PC points here: x21= 0x4B0 = x0 + 0x10
   0xffffa69c4c04 <__GI___pthread_mutex_lock+20>:       mov     w1, #0x17f                      // #383
   0xffffa69c4c08 <__GI___pthread_mutex_lock+24>:       and     w1, w2, w1
   0xffffa69c4c0c <__GI___pthread_mutex_lock+28>:       nop
   0xffffa69c4c10 <__GI___pthread_mutex_lock+32>:       tst     w2, #0x7c
   0xffffa69c4c14 <__GI___pthread_mutex_lock+36>:       b.ne        0xffffa69c4c70 <__GI___pthread_mutex_lock+128>  // b.any

The value 0x4A0 in x0 seems to come from the caller side, omrthread_interrupt().

   0xffffa61a8598:      ldr     x20, [x20, #3960]
   0xffffa61a859c:      add     x21, x19, #0x4a0 <- This is passed to pthread_mutex_lock as the argument
   0xffffa61a85a0:      ldr     w0, [x20, #112]
   0xffffa61a85a4:      bl      0xffffa61a2280 <- call pthread_getspecific
   0xffffa61a85a8:      mov     x20, x0
   0xffffa61a85ac:      ldr     x0, [x0]
   0xffffa61a85b0:      add     x0, x0, #0x78
   0xffffa61a85b4:      bl      0xffffa61a2100 <- call pthread_mutex_lock
   0xffffa61a85b8:      mov     x0, x21
   0xffffa61a85bc:      bl      0xffffa61a2100 <- call pthread_mutex_lock
   0xffffa61a85c0:      ldr     x0, [x19, #1056] <- LR points here
   0xffffa61a85c4:      tbnz    w0, #2, 0xffffa61a85f8

@knn-k
Copy link
Contributor

knn-k commented May 27, 2024

I ran MauveMultiThrdLoad_5m_1 60 times in total, but I got no failures.

It is likely that VM_VMHelpers::threadInterruptImpl() passed a NULL to omrthread_interrupt(). That means targetThread->osThread was a NULL there.

omrthread_interrupt(targetThread->osThread);

Then threadInterrupt() in omrthread.c called THREAD_LOCK() with the NULL pointer. The offset of mutex in J9Thread is 0x4a0 in this build, and that is the value passed to pthread_mutex_lock.

@babsingh
Copy link
Contributor

Looked at the core file in DDR, osThread is not NULL.

> !gpinfo
Failing Thread: !j9vmthread 0x107b00
Failing Thread ID: 0x394e7c (3755644)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000FFFFA6290B64 Handler2=0000FFFFA61F6BA0 InaccessibleAddress=00000000000004B0
...
Module=/lib/aarch64-linux-gnu/libpthread.so.0
Module_base_address=0000FFFFA69BB000 Symbol=pthread_mutex_lock
Symbol_address=0000FFFFA69C4BF0

> !stack 0x107b00
<107b00> 	!j9method 0x000000000005FCB8   java/lang/Thread.interruptImpl()V
<107b00> 	!j9method 0x000000000005FC18   java/lang/Thread.interrupt0()V
<107b00> 	!j9method 0x000000000005F498   java/lang/Thread.interrupt()V
<107b00> 	!j9method 0x000000000014D6E0   openj9/internal/tools/attach/target/AttachHandler.terminate(Z)Z
<107b00> 	!j9method 0x000000000019E398   openj9/internal/tools/attach/target/AttachHandler$teardownHook.run()V
<107b00> 	                        JNI call-in frame
<107b00> 	                        Native method frame

> !j9vmthread 0x107b00 | grep osThread
	0x140: struct J9Thread* osThread = !j9thread 0x0000FFFFA02503B0

@babsingh
Copy link
Contributor

babsingh commented May 27, 2024

All omrthread_interrupt crashes are associated to AttachHandler$teardownHook. They share the same Java stack which is mentioned in the above comment. This failure was also observed in the past; see #16659 (comment) and #16659 (comment). It might be associated to adopting RI's j.l.Thread in JDK19+. It has become more prevalent in JDK23.

I am unable to reproduce this failure locally. It requires more investigation, but we might need to update AttachHandler's teardown hook so that it works properly with RI's j.l.Thread.

fyi @tajila

@knn-k
Copy link
Contributor

knn-k commented May 28, 2024

I ran MauveMultiThrdLoad_5m_1 30 more times. No failures.

@babsingh
Copy link
Contributor

babsingh commented Jun 10, 2024

babsingh added a commit to babsingh/openj9-openjdk-jdk that referenced this issue Jun 11, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
@babsingh
Copy link
Contributor

babsingh commented Jun 11, 2024

This failure happened due to the fix (JDK23+) for https://bugs.openjdk.org/browse/JDK-8323782. This change moved the call to interrupt0 outside the synchronized(interruptLock) block.

Created ibmruntimes/openj9-openjdk-jdk#803 to resolve this issue. This PR acquires interruptLock for all calls of interruptImpl and isInterruptedImpl. This should also address failures previously seen in JDK19+: #16659 (comment) and #16659 (comment).

@babsingh babsingh modified the milestone: Java 23 (0.47) Jun 17, 2024
babsingh added a commit to babsingh/openj9-openjdk-jdk21 that referenced this issue Jun 18, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
babsingh added a commit to babsingh/openj9-openjdk-jdk21 that referenced this issue Jun 18, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598

Backport of ibmruntimes/openj9-openjdk-jdk#803

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
babsingh added a commit to babsingh/openj9-openjdk-jdk22 that referenced this issue Jun 18, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598

Backport of ibmruntimes/openj9-openjdk-jdk#803

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
babsingh added a commit to babsingh/openj9-openjdk-jdk21 that referenced this issue Jun 18, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598

Backport of ibmruntimes/openj9-openjdk-jdk#803

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
babsingh added a commit to babsingh/openj9-openjdk-jdk23 that referenced this issue Jul 8, 2024
interruptImpl and isInterruptedImpl use the eetop/threadRef value.
Acquiring interruptLock assures that the eetop/threadRef value won't
change during interruptImpl and isInterruptedImpl. This will prevent
crashes which happen when a stale eetop/threadRef value is used to
invoke OMR thread library functions.

Related: eclipse-openj9/openj9#19544
Related: eclipse-openj9/openj9#19598
Related: eclipse-openj9/openj9#19816

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jdk23 segfault Issues that describe segfaults / JVM crashes test failure
Projects
None yet
Development

No branches or pull requests

3 participants