[unittests][ExecutionEngine][Orc] The JITLinkRedirectionManagerTest test case segfaults on older linux distributions #119404

@pawosm-arm

I'm not sure how to raise this issue properly. RHEL8 is quite old by now, but it is still widely used and it provides sufficiently recent versions of GCC, CMake and Python to build LLVM. There is a need to build LLVM for it with all of the check-all tests passing, and most of them do. One of the failing ones is ExecutionEngine/Orc/./OrcJITTests/80/82. It fails as follows:

Note: This is test shard 81 of 82.
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from ObjectLinkingLayerTest
[ RUN      ] ObjectLinkingLayerTest.ClaimLateDefinedWeakSymbols
[       OK ] ObjectLinkingLayerTest.ClaimLateDefinedWeakSymbols (0 ms)
[----------] 1 test from ObjectLinkingLayerTest (0 ms total)

[----------] 1 test from JITLinkRedirectionManagerTest
[ RUN      ] JITLinkRedirectionManagerTest.BasicRedirectionOperation
 #0 0x0000000000e0e990 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0xe0e990)
 #1 0x0000000000e0c8a0 llvm::sys::RunSignalHandlers() (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0xe0c8a0)
 #2 0x0000000000e0c9ec SignalHandler(int) Signals.cpp:0:0
 #3 0x0000ffff95af07a0 (linux-vdso.so.1+0x7a0)
 #4 0x0000000000cbaf3c llvm::orc::InProcessMemoryAccess::writePointersAsync(llvm::ArrayRef<llvm::orc::tpctypes::PointerWrite>, llvm::unique_function<void (llvm::Error)>) (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0xcbaf3c)
 #5 0x0000000000cbff38 llvm::orc::JITLinkRedirectableSymbolManager::redirect(llvm::orc::JITDylib&, llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef>> const&) (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0xcbff38)
 #6 0x0000000000802d48 JITLinkRedirectionManagerTest_BasicRedirectionOperation_Test::TestBody() (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x802d48)
 #7 0x00000000011b7dfc testing::Test::Run() (.part.1161) gtest-all.cc:0:0
 #8 0x00000000011c7ca4 testing::TestInfo::Run() (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x11c7ca4)
 #9 0x00000000011cce34 testing::TestSuite::Run() (.part.1163) gtest-all.cc:0:0
#10 0x00000000011cd5ec testing::internal::UnitTestImpl::RunAllTests() (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x11cd5ec)
#11 0x00000000011cdc84 testing::UnitTest::Run() (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x11cdc84)
#12 0x000000000070e998 main (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x70e998)
#13 0x0000ffff955f0d64 __libc_start_main (/lib64/libc.so.6+0x20d64)
#14 0x0000000000757770 _start (toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests+0x757770)

--
exit: -11
--
shard JSON output does not exist: toolchain/build/stage/bootstrap_compiler/unittests/ExecutionEngine/Orc/./OrcJITTests-LLVM-Unit-2503282-80-82.json
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

1 warning(s) in tests
********************
Failed Tests (1):
  LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/80/82


Testing Time: 230.30s

Total Discovered Tests: 111932
  Skipped          :   209 (0.19%)
  Unsupported      : 40103 (35.83%)
  Passed           : 71547 (63.92%)
  Expectedly Failed:    72 (0.06%)
  Failed           :     1 (0.00%)
FAILED: CMakeFiles/check-all

I've started OrcJITTests in gdb:

[----------] 1 test from JITLinkRedirectionManagerTest
[ RUN      ] JITLinkRedirectionManagerTest.BasicRedirectionOperation

Thread 1 "OrcJITTests" received signal SIGSEGV, Segmentation fault.
0x0000aaaad29ad650 in llvm::orc::InProcessMemoryAccess::writePointersAsync(llvm::ArrayRef<llvm::orc::tpctypes::PointerWrite>, llvm::unique_function<void (llvm::Error)>) ()
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-251.el8_10.5.aarch64
(gdb) bt
#0  0x0000aaaad29ad650 in llvm::orc::InProcessMemoryAccess::writePointersAsync(llvm::ArrayRef<llvm::orc::tpctypes::PointerWrite>, llvm::unique_function<void (llvm::Error)>) ()
#1  0x0000aaaad247042c in llvm::orc::ExecutorProcessControl::MemoryAccess::writePointers(llvm::ArrayRef<llvm::orc::tpctypes::PointerWrite>) ()
#2  0x0000aaaad29b0d00 in llvm::orc::JITLinkRedirectableSymbolManager::redirect(llvm::orc::JITDylib&, llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::orc::ExecutorSymbolDef> > const&) ()
#3  0x0000aaaad24c3ff4 in JITLinkRedirectionManagerTest_BasicRedirectionOperation_Test::TestBody() ()
#4  0x0000aaaad2e31fa0 in testing::Test::Run() ()
#5  0x0000aaaad2e333f4 in testing::TestInfo::Run() ()
#6  0x0000aaaad2e340b8 in testing::TestSuite::Run() ()
#7  0x0000aaaad2e43f20 in testing::internal::UnitTestImpl::RunAllTests() ()
#8  0x0000aaaad2e438c4 in testing::UnitTest::Run() ()
#9  0x0000aaaad2e18ccc in main ()
(gdb)

The llvm::orc::InProcessMemoryAccess::writePointersAsync function isn't very long:

void InProcessMemoryAccess::writePointersAsync(
    ArrayRef<tpctypes::PointerWrite> Ws, WriteResultFn OnWriteComplete) {
  if (IsArch64Bit) {
    for (auto &W : Ws)
      *W.Addr.toPtr<uint64_t *>() = W.Value.getValue();
  } else {
    for (auto &W : Ws)
      *W.Addr.toPtr<uint32_t *>() = static_cast<uint32_t>(W.Value.getValue());
  }

  OnWriteComplete(Error::success());
}

The failing instruction on AArch64 is marked with =>:

Dump of assembler code for function _ZN4llvm3orc21InProcessMemoryAccess18writePointersAsyncENS_8ArrayRefINS0_8tpctypes12PointerWriteEEENS_15unique_functionIFvNS_5ErrorEEEE:
   0x0000aaaad29ad620 <+0>:     sub     sp, sp, #0x20
   0x0000aaaad29ad624 <+4>:     stp     x29, x30, [sp, #16]
   0x0000aaaad29ad628 <+8>:     add     x29, sp, #0x10
   0x0000aaaad29ad62c <+12>:    ldrb    w8, [x0, #8]
   0x0000aaaad29ad630 <+16>:    cmp     w8, #0x1
   0x0000aaaad29ad634 <+20>:    b.ne    0xaaaad29ad65c <_ZN4llvm3orc21InProcessMemoryAccess18writePointersAsyncENS_8ArrayRefINS0_8tpctypes12PointerWriteEEENS_15unique_functionIFvNS_5ErrorEEEE+60>  // b.any
   0x0000aaaad29ad638 <+24>:    cbz     x2, 0xaaaad29ad6c4 <_ZN4llvm3orc21InProcessMemoryAccess18writePointersAsyncENS_8ArrayRefINS0_8tpctypes12PointerWriteEEENS_15unique_functionIFvNS_5ErrorEEEE+164>
   0x0000aaaad29ad63c <+28>:    lsl     x8, x2, #4
   0x0000aaaad29ad640 <+32>:    add     x9, x1, #0x8
   0x0000aaaad29ad644 <+36>:    ldp     x11, x10, [x9, #-8]
   0x0000aaaad29ad648 <+40>:    subs    x8, x8, #0x10
   0x0000aaaad29ad64c <+44>:    add     x9, x9, #0x10
=> 0x0000aaaad29ad650 <+48>:    str     x10, [x11]

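The faulting str corresponds to the 64-bit store *W.Addr.toPtr<uint64_t *>() = W.Value.getValue(): the preceding ldp loads the destination address into x11 and the value into x10. Here x11 holds the non-null value 0xffffba960000.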

I've tried two different RHEL8 machines and a RHEL8 docker container.
EDIT: my initial observation on docker containers was wrong. A more systematic investigation led me to the following conclusions:

  • the test fails if the RHEL8 docker container is started on a RHEL8 or older (e.g. RHEL7) host system
  • the test passes (!) if the RHEL8 docker container is started on a newer (e.g. Ubuntu 22.04) host system (on which the test itself also passes)

Since docker containers use the host OS kernel, it seems that this test depends on the Linux kernel version, which must be new enough for it to pass.
