Reproducible segfaults during GC on .NET 8/9 Linux x64 #115815

@zengandrew

Description

I've observed a number of segfaults in the runtime that share a common theme: a garbage collection appears to encounter an Object whose method table pointer is null. The original applications and cores are not shareable, but the included example is a much more consistent and minimal reproduction with similar backtraces. The reproduction reliably segfaults within seconds of starting on a suite of Linux x64 machines running .NET 8 and 9.

I can no longer reproduce the issue on the .NET 10 preview using this provided example, so it does look like the issue was fixed or avoided in main at some point between .NET 9 and 10. A backport to the older supported .NET versions would be much appreciated if possible.

Reproduction Steps

The following program consistently segfaults on a number of Linux x64 hosts I've tested with .NET 8/9:

using System.Threading;
using System.Collections.Generic;

internal sealed class Program
{
    private struct Container
    {
        public string Name;
        public Container(string name) { this.Name = name; }
    }

    public static void Main(string[] args)
    {
        var structChurnThread = new Thread(ChurnThread) { Name="StructChurn", IsBackground=true };
        structChurnThread.Start();

        while (true)
        {
            var allocation = new byte[100_000];
            Thread.Sleep(2);
        }
    }

    private static void ChurnThread()
    {
        var source = new KeyValuePair<Container, double>[1_000_000];
        for (var i = 0; i < source.Length; i++)
        {
            source[i] = new KeyValuePair<Container, double>(new Container(i.ToString()), i);
        }

        var destination = new KeyValuePair<Container, double>[source.Length];
        while (true)
        {
            var i = 0;
            foreach (var kvp in (IReadOnlyList<KeyValuePair<Container, double>>) source)
            {
                destination[i++] = kvp;
            }

            Thread.Sleep(1);
        }
    }
}

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
</Project>
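
For readers less familiar with how `foreach` compiles: because `source` is cast to `IReadOnlyList<KeyValuePair<Container, double>>`, the loop in `ChurnThread` goes through the `IEnumerator<T>` interface, so each element is returned by value from an interface call to `Current`. A rough equivalent of that loop is sketched below. `ForeachSketch` and `CopyViaEnumerator` are hypothetical names used only for illustration, and the generic `T` stands in for `KeyValuePair<Container, double>`; this is not code from the original application.

```csharp
using System.Collections.Generic;

internal static class ForeachSketch
{
    // Hypothetical helper: copies every element of source into destination
    // through the enumerator interface, and returns the number of elements copied.
    public static int CopyViaEnumerator<T>(IReadOnlyList<T> source, T[] destination)
    {
        var i = 0;

        // foreach over an interface-typed sequence obtains an IEnumerator<T>
        // and disposes it when the loop ends.
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                // Current returns the struct by value from an interface call.
                destination[i++] = enumerator.Current;
            }
        }

        return i;
    }
}
```

Calling `ForeachSketch.CopyViaEnumerator(source, destination)` in place of the `foreach` body should exercise the same enumerator path, although whether it reproduces the crash equally well has not been checked.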

Expected behavior

The example is not expected to cause the GC to segfault.

Actual behavior

Shortly after startup, the program will segfault during a GC with a backtrace along the lines of:

(lldb) bt
* thread #9, name = '.NET BGC', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
  * frame #0: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] MethodTable::GetFlag(this=0x0000000000000000, flag=enum_flag_HasComponentSize) const at methodtable.h:3511:16
    frame #1: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] MethodTable::HasComponentSize(this=0x0000000000000000) const at methodtable.h:1532:16
    frame #2: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] WKS::my_get_size(ob=0x00007fbf66d8b380) at gc.cpp:11581:18
    frame #3: 0x00007ffff747ae32 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] WKS::gc_heap::background_mark_simple(o="") at gc.cpp:27817:24
    frame #4: 0x00007ffff747ae0b libcoreclr.so`WKS::gc_heap::background_promote(ppObject=0x00007fbecaccc980, sc=<unavailable>, flags=<unavailable>) at gc.cpp:27904:5
    frame #5: 0x00007ffff7215514 libcoreclr.so`HijackFrame::GcScanRoots(this=0x00007fbecaccc8f0, fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), sc=0x00007fbeca4cbc28) at frames.cpp:1173:13
    frame #6: 0x00007ffff731a5bf libcoreclr.so`GcStackCrawlCallBack(pCF=0x00007fbeca4c9060, pData=0x00007fbeca4cbaf8) at gcenv.ee.common.cpp:297:21
    frame #7: 0x00007ffff7293e22 libcoreclr.so`Thread::MakeStackwalkerCallback(this=0x000055555561fb40, pCF=0x00007fbeca4c9060, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8) at stackwalk.cpp:846:27
    frame #8: 0x00007ffff72940ad libcoreclr.so`Thread::StackWalkFramesEx(this=0x000055555561fb40, pRD=0x00007fbeca4c9400, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:926:26
    frame #9: 0x00007ffff72943ff libcoreclr.so`Thread::StackWalkFrames(this=0x000055555561fb40, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:1009:12
    frame #10: 0x00007ffff7316fea libcoreclr.so`ScanStackRoots(pThread=0x000055555561fb40, fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), sc=0x00007fbeca4cbc28) at gcenv.ee.cpp:182:18
    frame #11: 0x00007ffff7316dd6 libcoreclr.so`GCToEEInterface::GcScanRoots(fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), condemned=2, max_gen=2, sc=0x00007fbeca4cbc28) at gcenv.ee.cpp:281:13
    frame #12: 0x00007ffff746adea libcoreclr.so`WKS::gc_heap::background_mark_phase() at gc.cpp:38204:5
    frame #13: 0x00007ffff74695f2 libcoreclr.so`WKS::gc_heap::gc1() at gc.cpp:22320:13
    frame #14: 0x00007ffff748eb5c libcoreclr.so`WKS::gc_heap::bgc_thread_function() at gc.cpp:39246:9
    frame #15: 0x00007ffff748ea71 libcoreclr.so`WKS::gc_heap::bgc_thread_stub(arg=<unavailable>) at gc.cpp:37174:5 [artificial]
    frame #16: 0x00007ffff731a0b4 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) [inlined] (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::operator()(this=<unavailable>, argument=<unavailable>) const at gcenv.ee.cpp:1440:17
    frame #17: 0x00007ffff731a051 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(argument=<unavailable>) at gcenv.ee.cpp:1420:27
    frame #18: 0x00007ffff76171ae libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055555558e2b0) at thread.cpp:1760:16
    frame #19: 0x00007ffff788a0ea libc.so.6`start_thread + 794
    frame #20: 0x00007ffff790f150 libc.so.6`__clone3 + 48

Or,

(lldb) bt
* thread #9, name = '.NET BGC', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
  * frame #0: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] MethodTable::GetFlag(this=0x0000000000000000, flag=enum_flag_HasComponentSize) const at methodtable.h:3511:16
    frame #1: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] MethodTable::HasComponentSize(this=0x0000000000000000) const at methodtable.h:1532:16
    frame #2: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::my_get_size(ob=0x00007fbf66d9e582) at gc.cpp:11581:18
    frame #3: 0x00007ffff748eddf libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::gc_heap::background_mark_simple(o="") at gc.cpp:27817:24
    frame #4: 0x00007ffff748edb5 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::gc_heap::background_mark_object(o="") at gc.cpp:27834:9
    frame #5: 0x00007ffff748eda3 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(thread=<unavailable>) at gc.cpp:37196:9
    frame #6: 0x00007ffff746ab79 libcoreclr.so`WKS::gc_heap::background_mark_phase() at gc.cpp:38031:5
    frame #7: 0x00007ffff74695f2 libcoreclr.so`WKS::gc_heap::gc1() at gc.cpp:22320:13
    frame #8: 0x00007ffff748eb5c libcoreclr.so`WKS::gc_heap::bgc_thread_function() at gc.cpp:39246:9
    frame #9: 0x00007ffff748ea71 libcoreclr.so`WKS::gc_heap::bgc_thread_stub(arg=<unavailable>) at gc.cpp:37174:5 [artificial]
    frame #10: 0x00007ffff731a0b4 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) [inlined] (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::operator()(this=<unavailable>, argument=<unavailable>) const at gcenv.ee.cpp:1440:17
    frame #11: 0x00007ffff731a051 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(argument=<unavailable>) at gcenv.ee.cpp:1420:27
    frame #12: 0x00007ffff76171ae libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055555558e2b0) at thread.cpp:1760:16
    frame #13: 0x00007ffff788a0ea libc.so.6`start_thread + 794
    frame #14: 0x00007ffff790f150 libc.so.6`__clone3 + 48

Regression?

The included example reliably crashes on Linux x64 with .NET 8 and 9 (and, for completeness, with .NET 6 and 7 as well).

Windows x64 builds for the same .NET versions do not appear to be impacted.

Known Workarounds

No response

Configuration

.NET SDKs: 6.0.428, 7.0.410, 8.0.409, 9.0.300
.NET Runtimes: 6.0.36, 7.0.20, 8.0.16, 9.0.5

Runtime Environment:
 OS Name:     almalinux
 OS Version:  9.6
 OS Platform: Linux
 RID:         linux-x64

Other Linux configurations (e.g., ARM) haven't been tested. The issue does not appear to be present on Windows.

Other information

Originally it looked like this could be related to MULTIREG_RETURN on Linux x64, but this hasn't been fully confirmed.
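
As background for that hypothesis: under the System V AMD64 calling convention used on Linux x64, a struct of up to 16 bytes can be returned in two registers, whereas the Windows x64 convention returns structs larger than 8 bytes through a hidden return buffer. The struct in the reproduction holds one object reference and one `double`, so it would fall into the two-register case on Linux. The sketch below only checks the size of that struct; `LayoutProbe` and `PairSize` are hypothetical names, and nothing here confirms that multi-register return is the actual cause.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

internal static class LayoutProbe
{
    // Same shape as the Container struct in the reproduction:
    // a struct wrapping a single object reference.
    private struct Container
    {
        public string Name;
        public Container(string name) { Name = name; }
    }

    // Size in bytes of the struct that the enumerator returns by value.
    public static int PairSize() => Unsafe.SizeOf<KeyValuePair<Container, double>>();
}
```

Printing `LayoutProbe.PairSize()` alongside `System.IntPtr.Size` on the affected hosts would show whether the struct fits within the 16-byte limit for register return.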

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
