AOT jsr292Test_1 crash in 2nd iteration, vmState=0x000501ff #15059

Closed
pshipton opened this issue May 12, 2022 · 32 comments · Fixed by #15131

@pshipton
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_extended.functional_aarch64_mac_aot_Personal_testList_1/12
jsr292Test_1 -Xshareclasses:name=test_aot -Xscmx400M -Xscmaxaot256m -Xjit -XX:RecreateClassfileOnload

No diagnostics captured.

06:27:28  jsr292Test_1_PASSED(ITER_1)
06:27:28  
06:27:28  [IncludeExcludeTestAnnotationTransformer] [INFO] EXCLUDE_FILE environment variable: /Users/jenkins/workspace/Test_openjdk11_j9_extended.functional_aarch64_mac_aot_Personal_testList_1/aqa-tests/TKG/../TestConfig/resources/excludes/latest_exclude_11.txt
06:27:28  [IncludeExcludeTestAnnotationTransformer] [INFO] Processing exclude file: /Users/jenkins/workspace/Test_openjdk11_j9_extended.functional_aarch64_mac_aot_Personal_testList_1/aqa-tests/TKG/../TestConfig/resources/excludes/latest_exclude_11.txt
06:27:28  ...
06:27:28  ... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
06:27:28  ...
06:27:28  
06:27:28  testCatchException.0
06:27:28  testCatchException.05
06:27:28  boolean was: true
06:27:28  testCatchException.1
06:27:28  testCatchException.2
06:27:28  testCatchException.3
06:27:28  testCatchException.4
06:27:28  testIdentity.1
06:27:28  testIdentity.2
06:27:28  testIdentity.3
06:27:28  testing asType
06:27:28  Testing implicit casting
06:27:28  	testing void
06:27:28  	testing reference
06:27:28  	testing byte
06:27:28  	testing boolean
06:27:28  	testing short
06:27:28  	testing char
06:27:28  	testing int
06:27:28  	testing long
06:27:28  	testing float
06:27:28  	testing double
06:27:28  Testing explicit casting
06:27:28  	testing void
06:27:28  	testing reference
06:27:28  	testing byte
06:27:28  	testing boolean
06:27:28  	testing short
06:27:28  	testing char
06:27:28  	testing int
06:27:28  	testing long
06:27:28  	testing float
06:27:28  	testing double
06:27:28  Unhandled exception
06:27:28  Type=Segmentation error vmState=0x000501ff
06:27:28  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
06:27:28  Handler1=00000001041A69F0 Handler2=0000000104372C14 InaccessibleAddress=00000003801CF9DF
06:27:28  x0=0000000000000008 x1=00000002801CF9D8 x2=00000000FFFFFFFF x3=00000001088921C4
06:27:28  x4=000000013D780CB0 x5=0000000000000035 x6=000000016BF96880 x7=0000000000000000
06:27:28  x8=0000000100000007 x9=0000000000000008 x10=000000013D780CB0 x11=00000000006AE7A0
06:27:28  x12=00000001468BB7A0 x13=0000000000000000 x14=0000000000000000 x15=0000000000000030
06:27:28  x16=00000001A10EA2A0 x17=000000020FA41648 x18=0000000000000000 x19=00000002801CF9D8
06:27:28  x20=00000000FFFFFFFF x21=00000001468A0100 x22=0000000108FCD130 x23=0000000000000001
06:27:28  x24=0000000125B2CD88 x25=000000016BF97020 x26=0000000108E19A7B x27=00000001259B5250
06:27:28  x28=0000000125B74ED0 x29(FP)=000000016BF96BC0 x30(LR)=0000000108892228 x31(SP)=000000016BF96B10
06:27:28  PC=000000010889222C SP=000000016BF96B10
06:27:28  v0 00000000000000ff (f: 255.000000, d: 1.259867e-321)
06:27:28  v1 ffffffffffffffff (f: 4294967296.000000, d: nan)
06:27:28  v2 0706050403020100 (f: 50462976.000000, d: 7.949929e-275)
06:27:28  v3 0000000125b71df0 (f: 632757760.000000, d: 2.434620e-314)
06:27:28  v4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v20 ffffffffffffffff (f: 4294967296.000000, d: nan)
06:27:28  v21 ffffffffffffffff (f: 4294967296.000000, d: nan)
06:27:28  v22 ffffffffffffffff (f: 4294967296.000000, d: nan)
06:27:28  v23 ffffffffffffffff (f: 4294967296.000000, d: nan)
06:27:28  v24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  v31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
06:27:28  Module=/Users/jenkins/workspace/Test_openjdk11_j9_extended.functional_aarch64_mac_aot_Personal_testList_1/openjdkbinary/j2sdk-image/lib/default/libj9jit29.dylib
06:27:28  Module_base_address=0000000108800000 Symbol=_ZN11TR_J9VMBase19getReferenceFieldAtEmm
06:27:28  Symbol_address=00000001088921C4
06:27:28  
06:27:28  Method_being_compiled=java/util/regex/Pattern.atom()Ljava/util/regex/Pattern$Node;
06:27:28  Target=2_90_20220512_13 (Mac OS X 11.4)
06:27:28  CPU=aarch64 (8 logical CPUs) (0x400000000 RAM)
06:27:28  ----------- Stack Backtrace -----------
06:27:28  ---------------------------------------

@knn-k fyi

@knn-k
Contributor

knn-k commented May 13, 2022

The location of the crash in the JIT:

   92218: e0 02 3f d6   blr     x23
   9221c: c8 02 40 f9   ldr     x8, [x22]
   92220: 00 61 03 91   add     x0, x8, #216
   92224: b2 d1 ff 97   bl      0x868ec <__ZN2J911ObjectModel23objectHeaderSizeInBytesEv>
   92228: 08 00 14 8b   add     x8, x0, x20
   9222c: 60 6a 68 f8   ldr     x0, [x19, x8] <- This instruction
   92230: fd 7b 43 a9   ldp     x29, x30, [sp, #48]
   92234: f4 4f 42 a9   ldp     x20, x19, [sp, #32]
   92238: f6 57 41 a9   ldp     x22, x21, [sp, #16]
   9223c: f8 5f c4 a8   ldp     x24, x23, [sp], #64
   92240: c0 03 5f d6   ret

The offset in register x8 (0x100000007) is wrong because x20 holds 0xFFFFFFFF, which is added to the object header size returned in x0 (8).

I cannot find the symbol getReferenceFieldAtEmm in the JIT source code, however. Where is it?

-> (Correction) The symbol is TR_J9VMBase::getReferenceFieldAt(). The value in x20 comes from the parameter fieldOffset of this function.
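
The register values in the crash dump are consistent with this reading. As a minimal sketch (the helper name effectiveAddress is hypothetical and is not an OpenJ9 function), the address used by the faulting ldr can be recomputed from the dumped x0, x19 and x20 values:

```cpp
#include <cstdint>

// Hypothetical helper (not an OpenJ9 function). It mirrors the two instructions
//   add x8, x0, x20      ; x8 = header size + field offset
//   ldr x0, [x19, x8]    ; load from object pointer + x8
constexpr std::uint64_t effectiveAddress(std::uint64_t objectPointer, // x19
                                         std::uint64_t headerSize,    // x0
                                         std::uint64_t fieldOffset)   // x20
{
    return objectPointer + (headerSize + fieldOffset);
}

// Values taken from the first crash dump in this issue.
static_assert(0x8ull + 0xFFFFFFFFull == 0x100000007ull,
              "matches x8 in the dump");
static_assert(effectiveAddress(0x2801CF9D8ull, 0x8ull, 0xFFFFFFFFull) == 0x3801CF9DFull,
              "matches InaccessibleAddress in the dump");
```

The static_assert lines only check the arithmetic at compile time; they say nothing about why the offset was 0xFFFFFFFF in the first place.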

@knn-k
Contributor

knn-k commented May 13, 2022

Reproduced in a Grinder job: https://openj9-jenkins.osuosl.org/job/Grinder_testList_0/110/

@knn-k
Contributor

knn-k commented May 13, 2022

I cannot reproduce the problem locally using the binary from https://openj9-jenkins.osuosl.org/job/Build_JDK11_aarch64_mac_aot_Personal/13/ .

What is special about the JDK11_aarch64_mac_aot configuration?

@knn-k
Contributor

knn-k commented May 13, 2022

I reproduced the segmentation fault by running with TEST_FLAG=AOT.

@knn-k
Contributor

knn-k commented May 17, 2022

I ran the testcase in the debugger, and got the following call stack:

(lldb) bt
* thread #5, stop reason = EXC_BAD_ACCESS (code=1, address=0x3801db61f)
  * frame #0: 0x000000010592f1b4 libj9jit29.dylib`TR_J9VMBase::getReferenceFieldAt(this=<unavailable>, objectPointer=10739365400, fieldOffset=4294967295) at VMJ9.cpp:1210:22 [opt]
    frame #1: 0x000000010593520c libj9jit29.dylib`TR_J9VMBase::lookupMethodHandleThunkArchetype(this=0x0000000100217860, methodHandle=10738706064) at VMJ9.cpp:4548:41 [opt]
    frame #2: 0x0000000105935644 libj9jit29.dylib`TR_J9VMBase::createMethodHandleArchetypeSpecimen(this=0x0000000100217860, trMemory=0x00000001700259a8, methodHandleLocation=0x000000011124bcd0, owningMethod=0x00000001349ccd30) at VMJ9.cpp:4608:38 [opt]
    frame #3: 0x0000000105a37524 libj9jit29.dylib`InterpreterEmulator::visitInvokedynamic(this=0x000000017001eee0) at InterpreterEmulator.cpp:1276:44 [opt]
    frame #4: 0x0000000105a37380 libj9jit29.dylib`InterpreterEmulator::findAndCreateCallsitesFromBytecodes(this=0x000000017001eee0, wasPeekingSuccessfull=<unavailable>, withState=<unavailable>) at InterpreterEmulator.cpp:1180:34 [opt]
    frame #5: 0x0000000105a2e1b0 libj9jit29.dylib`TR_J9EstimateCodeSize::realEstimateCodeSize(this=0x0000000134855220, calltarget=<unavailable>, prevCallStack=<unavailable>, recurseDown=true, cfgRegion=<unavailable>) at J9EstimateCodeSize.cpp:1335:16 [opt]
    frame #6: 0x0000000105a2d77c libj9jit29.dylib`TR_J9EstimateCodeSize::estimateCodeSize(this=0x0000000134855220, calltarget=<unavailable>, prevCallStack=<unavailable>, recurseDown=<unavailable>) at J9EstimateCodeSize.cpp:417:8 [opt]

TR_J9SharedCacheVM::getInstanceFieldOffset() is what returns 0xFFFFFFFF as the offset value. Its return type is uint32_t, while getReferenceFieldAt() takes the offset as a uintptr_t.

TR_J9VMBase::methodHandle_thunkableSignature() fails to find the thunkableSignature field.
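
To illustrate the type mismatch, here is a small sketch assuming a 64-bit target. The names kFieldNotFound and widenOffset are made up for this example and do not appear in the OpenJ9 source. A uint32_t holding ~0 is zero-extended when it is passed as a uintptr_t, which gives the 0x00000000FFFFFFFF seen in x20 rather than an all-ones pointer-sized value:

```cpp
#include <cstdint>

// Hypothetical name for the ~0 "field not found" value discussed in this issue.
constexpr std::uint32_t kFieldNotFound = ~static_cast<std::uint32_t>(0);

// Converting an unsigned 32-bit value to uintptr_t zero-extends it.
constexpr std::uintptr_t widenOffset(std::uint32_t offset)
{
    return static_cast<std::uintptr_t>(offset);
}

static_assert(widenOffset(kFieldNotFound) == 0xFFFFFFFFu,
              "zero-extended value, as seen in x20");
static_assert(sizeof(std::uintptr_t) < 8 ||
              widenOffset(kFieldNotFound) != ~static_cast<std::uintptr_t>(0),
              "on a 64-bit target the widened value is not an all-ones uintptr_t");
```

A consequence is that a caller comparing the widened offset against a pointer-sized ~0 would not detect the failure on a 64-bit target.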

@knn-k
Contributor

knn-k commented May 17, 2022

#13096 is an old AArch64 issue that failed in methodHandle_thunkableSignature() -> getReferenceField() -> getReferenceFieldAt().
However, the old issue failed in getReferenceFieldAt() for the thunks field, whereas this issue fails for the thunkableSignature field.

@knn-k
Contributor

knn-k commented May 18, 2022

I reverted the change from PR #15037 locally, and it seems to resolve the crash on AArch64 macOS: no failures in 50 runs, versus 3 failures in 10 runs with #15037 applied.

I have no idea how #15037 relates to the crash in looking up the thunkableSignature field.
I also wonder why we see failures only on AArch64 macOS as #15037 looks like a platform-independent change.

FYI. @dsouzai

@dsouzai
Contributor

dsouzai commented May 18, 2022

Because the crashing compile seems to be dealing with method handles, it must not be an AOT compile (though it would be good to confirm this). As such, I don't know what the change in #15037 (an AOT compile specific change) has to do with a non-AOT compilation.

The test is running with -XX:RecreateClassfileOnload, so that sounds like the kind of thing that would cause a difference in behaviour with the change in #15037. If I were to guess, it's possible that there are more class chain invalidations now, and so the timing changed enough to expose an existing issue.

@pshipton
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_aarch64_mac_aot_Personal_testList_1/14
JCL_Test_SM_1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_sanity.functional_aarch64_mac_aot_Personal_testList_1/14/functional_test_output.tar.gz

JCL_Test_SM_1_PASSED(ITER_1)

WARNING: package com.ibm.jit.crypto not in java.base
...
... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
...

PASSED: test_getParent1
PASSED: test_getParent2
PASSED: test_getParent3
PASSED: test_getParent4

===============================================
    JCL_TEST_Java-Lang_ClassLoader_SM
    Tests run: 4, Failures: 0, Skips: 0
===============================================

[Support_Exec] [INFO] ***11.0.16-internal+0-adhoc.jenkins.BuildJDK11aarch64macaotPersonal
JRE 11 Mac OS X aarch64-64-Bit 20220519_14 (JIT enabled, AOT enabled)
OpenJ9   - 57daad735b0
OMR      - 42677ec346e
JCL      - 216af100725 based on jdk-11.0.16+2***
Unhandled exception
Type=Segmentation error vmState=0x000501ff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00000001050229C0 Handler2=0000000100DFAC14 InaccessibleAddress=000000038013F1A7
x0=0000000000000008 x1=000000028013F1A0 x2=00000000FFFFFFFF x3=0000000105331980
x4=000000014EEF4D04 x5=0000000000000038 x6=000000016F352880 x7=0000000000000000
x8=0000000100000007 x9=0000000000000008 x10=000000014EEF4D04 x11=0000000000661690
x12=000000012D0509A0 x13=0000000000000000 x14=0000000000000000 x15=0000000000000030
x16=0000000188D1E2A0 x17=00000001F77A1648 x18=0000000000000000 x19=000000028013F1A0
x20=00000000FFFFFFFF x21=000000012C8AA500 x22=0000000105A6D140 x23=0000000000000001
x24=000000012276CD88 x25=000000016F353020 x26=00000001058B996B x27=00000001225F5250
x28=00000001227B4ED0 x29(FP)=000000016F352BC0 x30(LR)=00000001053319E4 x31(SP)=000000016F352B10
PC=00000001053319E8 SP=000000016F352B10
v0 00000000000000ff (f: 255.000000, d: 1.259867e-321)
v1 ffffffffffffffff (f: 4294967296.000000, d: nan)
v2 0706050403020100 (f: 50462976.000000, d: 7.949929e-275)
v3 00000001227b1df0 (f: 578493952.000000, d: 2.407810e-314)
v4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v20 ffffffffffffffff (f: 4294967296.000000, d: nan)
v21 ffffffffffffffff (f: 4294967296.000000, d: nan)
v22 ffffffffffffffff (f: 4294967296.000000, d: nan)
v23 ffffffffffffffff (f: 4294967296.000000, d: nan)
v24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/Users/jenkins/workspace/Test_openjdk11_j9_sanity.functional_aarch64_mac_aot_Personal_testList_1/openjdkbinary/j2sdk-image/lib/default/libj9jit29.dylib
Module_base_address=00000001052A0000 Symbol=_ZN11TR_J9VMBase19getReferenceFieldAtEmm
Symbol_address=0000000105331980

Method_being_compiled=java/util/regex/Pattern.atom()Ljava/util/regex/Pattern$Node;
Target=2_90_20220519_14 (Mac OS X 11.5.2)
CPU=aarch64 (8 logical CPUs) (0x400000000 RAM)
----------- Stack Backtrace -----------
---------------------------------------

@knn-k
Contributor

knn-k commented May 24, 2022

The native call stack following estimateCodeSize() looks like this. It is not an AOT compilation.

    frame #6: 0x000000010622d004 libj9jit29.dylib`TR_J9EstimateCodeSize::estimateCodeSize(this=0x0000000134859220, calltarget=<unavailable>, prevCallStack=<unavailable>, recurseDown=<unavailable>) at J9EstimateCodeSize.cpp:417:8 [opt]
    frame #7: 0x000000010622e594 libj9jit29.dylib`TR_J9EstimateCodeSize::realEstimateCodeSize(this=0x0000000134859220, calltarget=<unavailable>, prevCallStack=<unavailable>, recurseDown=false, cfgRegion=<unavailable>) at J9EstimateCodeSize.cpp:1556:39 [opt]
    frame #8: 0x000000010622d004 libj9jit29.dylib`TR_J9EstimateCodeSize::estimateCodeSize(this=0x0000000134859220, calltarget=<unavailable>, prevCallStack=<unavailable>, recurseDown=<unavailable>) at J9EstimateCodeSize.cpp:417:8 [opt]
    frame #9: 0x00000001061a797c libj9jit29.dylib`TR_EstimateCodeSize::calculateCodeSize(this=0x0000000134859220, calltarget=0x0000000134993c20, callStack=0x00000001700217e0, recurseDown=true) at EstimateCodeSize.cpp:92:13 [opt]
    frame #10: 0x000000010621e98c libj9jit29.dylib`TR_MultipleCallTargetInliner::weighCallSite(this=<unavailable>, callStack=<unavailable>, callsite=0x00000001348f0310, currentBlockHasExceptionSuccessors=false, dontAddCalls=false) at InlinerTempForJ9.cpp:3576:31 [opt]
    frame #11: 0x0000000106220520 libj9jit29.dylib`TR_MultipleCallTargetInliner::inlineCallTargets(this=0x0000000170022978, callerSymbol=0x0000000134825240, prevCallStack=0x0000000000000000, innerPrexInfo=<unavailable>) at InlinerTempForJ9.cpp:3297:25 [opt]
    frame #12: 0x0000000106434894 libj9jit29.dylib`TR_InlinerBase::performInlining(this=0x0000000170022978, callerSymbol=0x0000000134825240) at Inliner.cpp:453:23 [opt]
    frame #13: 0x000000010621cc54 libj9jit29.dylib`TR_Inliner::perform(this=0x0000000134830600) at InlinerTempForJ9.cpp:2560:15 [opt]
    frame #14: 0x0000000106501c18 libj9jit29.dylib`OMR::Optimizer::performOptimization(this=0x00000001348b07d0, optimization=0x00000001348215ac, firstOptIndex=<unavailable>, lastOptIndex=<unavailable>, doTiming=<unavailable>) at OMROptimizer.cpp:2053:29 [opt]
    frame #15: 0x00000001064ffea0 libj9jit29.dylib`OMR::Optimizer::optimize(this=0x00000001348b07d0) at OMROptimizer.cpp:1128:28 [opt]
    frame #16: 0x0000000106374210 libj9jit29.dylib`OMR::Compilation::compile() [inlined] OMR::Compilation::performOptimizations(this=<unavailable>) at OMRCompilation.cpp:1267:19 [opt]
    frame #17: 0x00000001063741f4 libj9jit29.dylib`OMR::Compilation::compile(this=0x0000000134820000) at OMRCompilation.cpp:1064 [opt]
    frame #18: 0x00000001060eee28 libj9jit29.dylib`TR::CompilationInfoPerThreadBase::compile(this=0x0000000110991030, vmThread=<unavailable>, compiler=0x0000000134820000, compilee=0x0000000170025b48, vm=0x000000010041e4d0, optimizationPlan=0x0000000110a61350, scratchSegmentProvider=<unavailable>) at CompilationThread.cpp:9583:26 [opt]
    frame #19: 0x00000001060eca90 libj9jit29.dylib`TR::CompilationInfoPerThreadBase::wrappedCompile(portLib=<unavailable>, opaqueParameters=0x0000000170025958) at CompilationThread.cpp:9077:24 [opt]
    frame #20: 0x00000001006e6de0 libj9prt29.dylib`omrsig_protect(portLibrary=0x00000001001cdde8, fn=(libj9jit29.dylib`TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) at CompilationThread.cpp:8135), fn_arg=0x0000000170025958, handler=(libj9jit29.dylib`jitSignalHandler(J9PortLibrary*, unsigned int, void*, void*) at CompilationThread.cpp:205), handler_arg=0x0000000105887f00, flags=505, result=<unavailable>) at omrsignal.c:425:12 [opt]
    frame #21: 0x00000001060e8050 libj9jit29.dylib`TR::CompilationInfoPerThreadBase::compile(this=0x0000000110991030, vmThread=0x0000000105887f00, entry=0x000000010042fa80, scratchSegmentProvider=<unavailable>) at CompilationThread.cpp:8066:31 [opt]
    frame #22: 0x00000001060e7760 libj9jit29.dylib`TR::CompilationInfoPerThread::processEntry(this=0x0000000110991030, entry=0x000000010042fa80, scratchSegmentProvider=0x0000000170026ca0) at CompilationThread.cpp:4394:20 [opt]
    frame #23: 0x00000001060e6adc libj9jit29.dylib`TR::CompilationInfoPerThread::processEntries(this=0x0000000110991030) at CompilationThread.cpp:4099:13 [opt]
    frame #24: 0x00000001060e689c libj9jit29.dylib`TR::CompilationInfoPerThread::run(this=0x0000000110991030) at CompilationThread.cpp:3944:13 [opt]
    frame #25: 0x00000001060e6694 libj9jit29.dylib`protectedCompilationThreadProc((null)=<unavailable>, compInfoPT=0x0000000110991030) at CompilationThread.cpp:3878:16 [opt]
    frame #26: 0x00000001006e6de0 libj9prt29.dylib`omrsig_protect(portLibrary=0x00000001001cdde8, fn=(libj9jit29.dylib`protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*) at CompilationThread.cpp:3805), fn_arg=0x0000000110991030, handler=(libj9vm29.dylib`structuredSignalHandler at gphandle.c:698), handler_arg=0x0000000105887f00, flags=506, result=<unavailable>) at omrsignal.c:425:12 [opt]
    frame #27: 0x00000001060e4b68 libj9jit29.dylib`compilationThreadProc(entryarg=0x0000000110991030) at CompilationThread.cpp:3783:25 [opt]
    frame #28: 0x00000001007626e8 libj9thr29.dylib`thread_wrapper(arg=0x0000000101013460) at omrthread.c:1733:2 [opt]
    frame #29: 0x000000018d69b878 libsystem_pthread.dylib`_pthread_start + 320

@knn-k
Contributor

knn-k commented May 24, 2022

jdmpview shows the following. The object at 0x02801BA320 looks like a correct ThunkTuple instance that has the thunkableSignature field.

> !j9object 0x02801BA320
!J9Object 0x00000002801BA320 {
        struct J9Class* clazz = !j9class 0x14808FB00 // java/lang/invoke/ThunkTuple
        Object flags = 0x00000000;
        Ljava/lang/String; thunkableSignature = !fj9object 0x2801ba3f0 (offset = 24) (java/lang/invoke/ThunkTuple)
        I invocationCount = 0x000003B7 (offset = 32) (java/lang/invoke/ThunkTuple)
        J invokeExactThunk = 0x0000000113A2476C (offset = 0) (java/lang/invoke/ThunkTuple)
        J i2jInvokeExactThunk = 0x0000000113A24764 (offset = 8) (java/lang/invoke/ThunkTuple)
        J finalizeLink = 0x00000002801BA518 (offset = 16) (java/lang/invoke/ThunkTuple) <hidden>
}
> !fj9object 0x2801ba3f0
J9VMJavaLangString at 0x00000002801BA3F0 {
struct J9Class* clazz = !j9class 0x14801D100 // java/lang/String
Object flags = 0x00000000;
[B value = !fj9object 0x2801c33b8 (offset = 0) (java/lang/String)
B coder = 0x00000001 (offset = 8) (java/lang/String)
I hash = 0x00000000 (offset = 12) (java/lang/String)
"(I)Ljava/lang/Object;"
}
> !j9classshape 0x14808FB00
Instance fields in java/lang/invoke/ThunkTuple:

offset     name signature       (declaring class)
24      thunkableSignature      Ljava/lang/String;      (java/lang/invoke/ThunkTuple)
32      invocationCount I       (java/lang/invoke/ThunkTuple)
0       invokeExactThunk        J       (java/lang/invoke/ThunkTuple)
8       i2jInvokeExactThunk     J       (java/lang/invoke/ThunkTuple)
16      finalizeLink    J       (java/lang/invoke/ThunkTuple) <hidden>

Total instance size: 40

The next question is: Why does getInstanceFieldOffset() fail to find the thunkableSignature field?

@knn-k
Contributor

knn-k commented May 25, 2022

I found I was able to reproduce the crash with jsr292Test_1 on AArch64 Linux by running with TEST_FLAG=AOT.
Now I assume this is not a macOS-specific problem.

TR_J9SharedCacheVM::getInstanceFieldOffset() calls TR_ResolvedRelocatableJ9Method::validateArbitraryClass(), and it returns false. That is the reason why TR_J9SharedCacheVM::getInstanceFieldOffset() returns ~0.

@knn-k knn-k removed the os:macos label May 25, 2022
@pshipton
Member Author

pshipton commented May 25, 2022

We are short on aarch64 linux machines so we don't run the AOT tests on this platform. We test amac, plinux, xlinux, zlinux.

@knn-k
Contributor

knn-k commented May 25, 2022

TR_ResolvedRelocatableJ9Method::storeValidationRecordIfNecessary() calls fej9->sharedCache()->rememberClass(), which returns NULL for the ThunkTuple class in question.

classChain = fej9->sharedCache()->rememberClass(definingClass);

TR_J9VMBase::methodHandle_thunkableSignature()
↓
TR_J9VMBase::getReferenceField()
↓
TR_J9SharedCacheVM::getInstanceFieldOffset()
↓
TR_ResolvedRelocatableJ9Method::validateArbitraryClass()
↓
TR_ResolvedRelocatableJ9Method::storeValidationRecordIfNecessary()
↓
fej9->sharedCache()->rememberClass() -- This returns a NULL for ThunkTuple
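
The way the failure propagates up this chain can be sketched with stand-in functions. Everything below is hypothetical: the *Stub names, the canRemember flag and kFieldNotFound are inventions for illustration, and the real functions take different arguments and do much more work.

```cpp
#include <cstdint>

// Hypothetical name for the ~0 error value returned by getInstanceFieldOffset().
constexpr std::uint32_t kFieldNotFound = ~static_cast<std::uint32_t>(0);

// Stand-in for sharedCache()->rememberClass(): returns a non-null
// "class chain" only when the class can be remembered.
inline const void *rememberClassStub(bool canRemember)
{
    static const int dummyChain = 0;
    return canRemember ? &dummyChain : nullptr;
}

// Stand-in for validateArbitraryClass() / storeValidationRecordIfNecessary():
// validation fails when no class chain is available.
inline bool validateClassStub(bool canRemember)
{
    return rememberClassStub(canRemember) != nullptr;
}

// Stand-in for TR_J9SharedCacheVM::getInstanceFieldOffset(): a failed
// validation turns into the ~0 error value instead of the real offset.
inline std::uint32_t instanceFieldOffsetStub(bool canRemember, std::uint32_t realOffset)
{
    return validateClassStub(canRemember) ? realOffset : kFieldNotFound;
}
```

With canRemember set to false, instanceFieldOffsetStub(false, 24) yields 0xFFFFFFFF even though the field exists at offset 24, which is the situation jdmpview showed for ThunkTuple.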

I guess PR #15037 may have changed the result from fej9->sharedCache()->rememberClass(). @dsouzai

@knn-k
Contributor

knn-k commented May 25, 2022

TR_J9VMBase::getInstanceFieldOffset() and TR_J9SharedCacheVM::getInstanceFieldOffset() can return ~0 as an error code when they fail to find the field offset.
On the other hand, TR_J9VMBase::getReferenceField() does not handle the error case, which resulted in the crash in this issue.

uintptr_t getReferenceField(uintptr_t objectPointer, char *fieldName, char *fieldSignature)
   {
   return getReferenceFieldAt(objectPointer, getInstanceFieldOffset(getObjectClass(objectPointer), fieldName, fieldSignature));
   }

Other functions that call getInstanceFieldOffset() don't seem to care about the error case, either.
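
One way a caller could handle the error case is sketched below. This is only an illustration under assumed names (kFieldNotFound and tryFieldAddress are hypothetical), it computes an address rather than reading the heap, and it is not the fix that was eventually merged for this issue.

```cpp
#include <cstdint>

// Hypothetical name for the ~0 error value returned by getInstanceFieldOffset().
constexpr std::uint32_t kFieldNotFound = ~static_cast<std::uint32_t>(0);

// Hypothetical guarded variant of the address computation. It returns false
// and leaves *address untouched when the offset is the error value, so the
// caller never forms a pointer from a bogus offset.
inline bool tryFieldAddress(std::uintptr_t objectPointer,
                            std::uintptr_t headerSize,
                            std::uint32_t fieldOffset,
                            std::uintptr_t *address)
{
    if (fieldOffset == kFieldNotFound)
        return false; // lookup failed; nothing safe to dereference

    *address = objectPointer + headerSize + fieldOffset;
    return true;
}
```

The comparison is done while the offset is still a uint32_t, before any widening to uintptr_t, so the check does not depend on the pointer width of the target.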

@Akira1Saitoh
Contributor

I found I was able to reproduce the crash with jsr292Test_1 on AArch64 Linux by running with TEST_FLAG=AOT. Now I assume this is not a macOS-specific problem.

TR_J9SharedCacheVM::getInstanceFieldOffset() calls TR_ResolvedRelocatableJ9Method::validateArbitraryClass(), and it returns false. That is the reason why TR_J9SharedCacheVM::getInstanceFieldOffset() returns ~0.

I think it is an AOT compilation because the VM instance is TR_J9SharedCacheVM. Am I right? But there are method handles in the compiled method?

@knn-k
Contributor

knn-k commented May 25, 2022

Correction: Yesterday I wrote in my comment #15059 (comment) that it was not an AOT compilation. That was wrong.

I made sure the crash is caused by an AOT compilation by using -Xjit:verbose={compileStart}. The JIT verbose log shows the following line at the end, and the method name appears in the sighandler output as Method_being_compiled.

 (AOT warm) Compiling java/util/regex/Pattern.sequence(Ljava/util/regex/Pattern$Node;)Ljava/util/regex/Pattern$Node;  JitDumpMethod j9m=0000000141C0B490  t=14131 compThreadID=7 memLimit=4194303 KB freePhysicalMemory=1607 MB

@dsouzai
Contributor

dsouzai commented May 25, 2022

I guess the confusing thing here is how we got to the point where we're running the InterpreterEmulator. Normally in Walker.cpp there's a bunch of AOT exceptions for when the ILGenerator comes across MH-related bytecodes. However, here we're not using the ILGenerator, but the InterpreterEmulator.

@hzongaro do we always use the IntEmulator when estimating the code size? If so, how do we ensure that we don't start looking at MH related bytecodes during an AOT compile? From the looks of it, there don't seem to be any guards for this, but at the same time this only happens on aarch64.

@hzongaro
Member

do we always use the IntEmulator when estimating the code size? If so, how do we ensure that we don't start looking at MH related bytecodes during an AOT compile?

Sorry, Irwin - I haven't looked at the InterpreterEmulator before, so I'm not able to quickly answer your questions. Perhaps Nazim @nbhuiyan would be better able to answer?

@nbhuiyan
Member

@dsouzai

do we always use the IntEmulator when estimating the code size?

Yes

If so, how do we ensure that we don't start looking at MH related bytecodes during an AOT compile? From the looks of it, there don't seem to be any guards for this

For the OpenJDK MH implementation at least, we do have checks in place to abort during AOT for both invokehandle and invokedynamic, but they are guarded by J9VM_OPT_OPENJDK_METHODHANDLE. Only the invokedynamic bytecode visitor is used for both the old and new MH implementations, and there were no existing checks in that function to abort for AOT compilations. If we were supposed to check for AOT compilation when we land in the invokedynamic visitor for the old MH implementation, then we simply did not do that, and may have proceeded with the callsite creation if the callsite table entry corresponding to that bytecode was resolved.

@jdmpapin
Contributor

visitInvokedynamic() only calls createMethodHandleArchetypeSpecimen() when a known object table exists. I was under the impression that AOT compilations would not have a known object table. It looks like the TR_DisableKnownObjectTable option is what prevents it from being created:

TR::KnownObjectTable *
J9::Compilation::getOrCreateKnownObjectTable()
   {
   if (!_knownObjectTable && !self()->getOption(TR_DisableKnownObjectTable))
      {
      _knownObjectTable = new (self()->trHeapMemory()) TR::KnownObjectTable(self());
      }
   return _knownObjectTable;
   }

I can see that this option is set when running with SVM:

if (vm->canUseSymbolValidationManager() && options->getOption(TR_EnableSymbolValidationManager))
   {
   options->setOption(TR_UseSymbolValidationManager);
   options->setOption(TR_DisableKnownObjectTable);
   }

But I don't see where this is set for AOT without SVM.

@knn-k
Contributor

knn-k commented May 26, 2022

SVM is not enabled on AArch64. This is the reason why TR_J9SharedCacheVM::getInstanceFieldOffset() takes the non-SVM path.

#if (defined(TR_HOST_X86) || defined(TR_HOST_S390) || defined(TR_HOST_POWER)) && defined(TR_TARGET_64BIT)
   self()->setOption(TR_EnableSymbolValidationManager);
#endif

@knn-k
Contributor

knn-k commented May 26, 2022

I opened PR #15121 for enabling SVM on AArch64.
I applied it locally and ran jsr292Test_1 more than 30 times. It seems to work at least for the testcase.

@jdmpapin
Contributor

We should probably also set this option in non-SVM AOT. Do you agree, @dsouzai?

@dsouzai
Contributor

dsouzai commented May 26, 2022

We should probably also set this option in non-SVM AOT. Do you agree, @dsouzai?

Yeah, I guess we should hoist the code that sets TR_DisableKnownObjectTable and do it whenever we're doing a relocatable compile, regardless of whether the SVM is enabled or not.
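
The hoisting idea can be sketched with a toy option set. This is an assumption-laden illustration, not the change made in #15131: the enum, the SketchOptions type and both configure functions are hypothetical stand-ins for the real TR_* options and the code that sets them.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical miniature of the option flags; the real ones are TR_* options.
enum SketchOption : std::size_t
{
    EnableSymbolValidationManager,
    UseSymbolValidationManager,
    DisableKnownObjectTable,
    SketchOptionCount
};

using SketchOptions = std::bitset<SketchOptionCount>;

// Shape of the code quoted earlier in this thread: the known object table
// is disabled only as a side effect of enabling SVM.
inline void configureSvmOnly(SketchOptions &options, bool canUseSvm)
{
    if (canUseSvm && options.test(EnableSymbolValidationManager))
    {
        options.set(UseSymbolValidationManager);
        options.set(DisableKnownObjectTable);
    }
}

// Hoisted variant: the known object table is disabled for every relocatable
// (AOT) compile, whether or not SVM ends up being used.
inline void configureHoisted(SketchOptions &options, bool canUseSvm, bool isRelocatableCompile)
{
    if (isRelocatableCompile)
        options.set(DisableKnownObjectTable);

    if (canUseSvm && options.test(EnableSymbolValidationManager))
        options.set(UseSymbolValidationManager);
}
```

On a platform where EnableSymbolValidationManager is not set (as described for AArch64 above), configureSvmOnly leaves DisableKnownObjectTable clear, while configureHoisted sets it for a relocatable compile.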

knn-k added a commit to knn-k/openj9 that referenced this issue May 27, 2022
This commit disables known object table in AOT compilation.

Fixes: eclipse-openj9#15059

Signed-off-by: KONNO Kazuhiro <konno@jp.ibm.com>
@knn-k
Contributor

knn-k commented May 27, 2022

I opened PR #15131 for disabling known object table in AOT compilation, and started test jobs.
The fix works fine for jsr292Test_1 in my local testing.

@knn-k
Contributor

knn-k commented May 27, 2022

Test jobs on AArch64 Linux/macOS in #15131 finished successfully.

@pshipton
Member Author

Re-opening and assigning to the 0.33 milestone as per #15131 (comment)

@knn-k
Contributor

knn-k commented May 27, 2022

I opened #15138 for v0.33.

@dsouzai
Contributor

dsouzai commented May 30, 2022

Closing as I just merged #15138

@dsouzai dsouzai closed this as completed May 30, 2022