Skip to content

segfault error 4 in libcoreclr.so #42885

Closed
@gregolsky

Description

@gregolsky

Description

Process crashes on a regular basis (few times a week).

Raven.Server[19552]: segfault at 7fb366943030 ip 00007fb6d0863a7a sp 00007fb6cdd93ea0 error 4 in libcoreclr.so[7fb6d0794000+2f0000]

We were able to take a coredump and this is what we got from lldb-3.9 on thread 19552:

(lldb) register read
General Purpose Registers:
       rax = 0x00000000012afc00
       rbx = 0x0000000001222da0
       rcx = 0x00007f8a77555720
       rdx = 0x0000000000000001
       rdi = 0x0000000001222da0
       rsi = 0x00007fb6cdd93f28
       rbp = 0x00007fb6cdd93f00
       rsp = 0x00007fb6cdd93ea0
        r8 = 0x0000000000000015
        r9 = 0x00007fb660206e48
       r10 = 0x0000060c183060c1
       r11 = 0x0000000000000000
       r12 = 0x0000000001222da0
       r13 = 0x0000000000000000
       r14 = 0x00007fb366943030
       r15 = 0x0000000000000001
       rip = 0x00007fb6d0863a7a  libcoreclr.so`SVR::gc_heap::mark_object_simple(unsigned char**, int) + 26
    rflags = 0x0000000000010206
        cs = 0x0000000000000033
        fs = 0x0000000000000000
        gs = 0x0000000000000000
        ss = 0x000000000000002b
        ds = 0x0000000000000000
        es = 0x0000000000000000

(lldb) bt
* thread #1: tid = 19552, 0x00007fb6d0863a7a libcoreclr.so`SVR::gc_heap::mark_object_simple(unsigned char**, int) + 26, name = 'Raven.Server', stop reason = signal SIGSEGV
  * frame #0: 0x00007fb6d0863a7a libcoreclr.so`SVR::gc_heap::mark_object_simple(unsigned char**, int) + 26
    frame #1: 0x00007fb6d0866d73 libcoreclr.so`SVR::GCHeap::Promote(Object**, ScanContext*, unsigned int) + 355
    frame #2: 0x00007fb6d078d21e libcoreclr.so`GcInfoDecoder::ReportUntrackedSlots(GcSlotDecoder&, REGDISPLAY*, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 238
    frame #3: 0x00007fb6d078b816 libcoreclr.so`GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, Object**, unsigned int), void*) + 2598
    frame #4: 0x00007fb6d0618c43 libcoreclr.so`EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, Object**, unsigned int), void*, unsigned int) + 291
    frame #5: 0x00007fb6d06f43ce libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) + 590
    frame #6: 0x00007fb6d06835ed libcoreclr.so`Thread::MakeStackwalkerCallback(CrawlFrame*, StackWalkAction (*)(CrawlFrame*, void*), void*) + 157
    frame #7: 0x00007fb6d0683821 libcoreclr.so`Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 433
    frame #8: 0x00007fb6d0683bfc libcoreclr.so`Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*) + 204
    frame #9: 0x00007fb6d08b32cc libcoreclr.so`ScanStackRoots(Thread*, void (*)(Object**, ScanContext*, unsigned int), ScanContext*) + 364
    frame #10: 0x00007fb6d08b30f5 libcoreclr.so`GCToEEInterface::GcScanRoots(void (*)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) + 261
    frame #11: 0x00007fb6d085b46f libcoreclr.so`SVR::gc_heap::mark_phase(int, int) + 943
    frame #12: 0x00007fb6d085881d libcoreclr.so`SVR::gc_heap::gc1() + 461
    frame #13: 0x00007fb6d084af43 libcoreclr.so`SVR::gc_heap::garbage_collect(int) + 2659
    frame #14: 0x00007fb6d0849fe2 libcoreclr.so`SVR::gc_heap::gc_thread_function() + 674
    frame #15: 0x00007fb6d0849d36 libcoreclr.so`SVR::gc_heap::gc_thread_stub(void*) + 54
    frame #16: 0x00007fb6d08b5698 libcoreclr.so`(anonymous namespace)::CreateNonSuspendableThread(void (*)(void*), void*, char16_t const*)::$_1::__invoke(void*) + 168
    frame #17: 0x00007fb6d09c771d libcoreclr.so`CorUnix::CPalThread::ThreadEntry(void*) + 349
    frame #18: 0x00007fb6d20c26db libpthread.so.0`start_thread + 219
    frame #19: 0x00007fb6d12a8a3f libc.so.6`clone + 63
(lldb) setthread 1
(lldb) t 1
(lldb) * thread #1: tid = 19552, 0x00007fb6d0863a7a libcoreclr.so`SVR::gc_heap::mark_object_simple(unsigned char**, int) + 26, name = 'Raven.Server', stop reason = signal SIGSEGV
    frame #0: 0x00007fb6d0863a7a libcoreclr.so`SVR::gc_heap::mark_object_simple(unsigned char**, int) + 26
libcoreclr.so`SVR::gc_heap::mark_object_simple:
->  0x7fb6d0863a7a <+26>: movq   (%r14), %rcx
    0x7fb6d0863a7d <+29>: movq   %rcx, %rax
    0x7fb6d0863a80 <+32>: orq    $0x1, %rax
    0x7fb6d0863a84 <+36>: movq   %rax, (%r14)

We've run dotnet-dump analyze verifyheap command and it did not detect any corruption:

No heap corruption detected.
<END_COMMAND_OUTPUT>

Calculated the failed method offset 0x00007fb6d0863a7a - 0x7fb6d0794000 = 0xCFA7A and did addr2line:

$ addr2line -e libcoreclr.so -fCi 0xCFA7A
NibbleWriter::WriteEncodedU32(unsigned int)
??:?

Configuration

  • Which version of .NET is the code running on? 3.1.7
  • What OS and version, and what distro if applicable? Ubuntu 18.04
  • What is the architecture (x64, x86, ARM, ARM64)? x64
  • Do you know whether it is specific to that configuration? no

Other information

Metadata

Metadata

Assignees

Labels

area-VM-coreclros-linuxLinux OS (any supported distro)tenet-reliabilityReliability/stability related issue (stress, load problems, etc.)

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions