Description
I've observed a number of segfaults in the runtime that share a common theme: a garbage collection appears to encounter an Object with a null method table pointer. The original applications and core dumps are not shareable, but the included example is a minimal and much more consistent reproduction that produces similar backtraces. It reliably segfaults within seconds across a suite of Linux x64 machines running .NET 8 and 9.
I can no longer reproduce the issue on the .NET 10 preview with this example, so it appears to have been fixed or avoided in main at some point between .NET 9 and 10. A backport to the older supported .NET versions would be much appreciated if possible.
Reproduction Steps
The following program consistently segfaults on a number of Linux x64 hosts I've tested with .NET 8/9:
using System.Threading;
using System.Collections.Generic;

internal sealed class Program
{
    private struct Container
    {
        public string Name;
        public Container(string name) { this.Name = name; }
    }

    public static void Main(string[] args)
    {
        var structChurnThread = new Thread(ChurnThread) { Name = "StructChurn", IsBackground = true };
        structChurnThread.Start();
        while (true)
        {
            var allocation = new byte[100_000];
            Thread.Sleep(2);
        }
    }

    private static void ChurnThread()
    {
        var source = new KeyValuePair<Container, double>[1_000_000];
        for (var i = 0; i < source.Length; i++)
        {
            source[i] = new KeyValuePair<Container, double>(new Container(i.ToString()), i);
        }
        var destination = new KeyValuePair<Container, double>[source.Length];
        while (true)
        {
            var i = 0;
            foreach (var kvp in (IReadOnlyList<KeyValuePair<Container, double>>) source)
            {
                destination[i++] = kvp;
            }
            Thread.Sleep(1);
        }
    }
}
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
</Project>
Expected behavior
The example is not expected to cause the GC to segfault.
Actual behavior
Shortly after startup, the program will segfault during a GC with a backtrace along the lines of:
(lldb) bt
* thread #9, name = '.NET BGC', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
* frame #0: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] MethodTable::GetFlag(this=0x0000000000000000, flag=enum_flag_HasComponentSize) const at methodtable.h:3511:16
frame #1: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] MethodTable::HasComponentSize(this=0x0000000000000000) const at methodtable.h:1532:16
frame #2: 0x00007ffff747ae39 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] WKS::my_get_size(ob=0x00007fbf66d8b380) at gc.cpp:11581:18
frame #3: 0x00007ffff747ae32 libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) [inlined] WKS::gc_heap::background_mark_simple(o="") at gc.cpp:27817:24
frame #4: 0x00007ffff747ae0b libcoreclr.so`WKS::gc_heap::background_promote(ppObject=0x00007fbecaccc980, sc=<unavailable>, flags=<unavailable>) at gc.cpp:27904:5
frame #5: 0x00007ffff7215514 libcoreclr.so`HijackFrame::GcScanRoots(this=0x00007fbecaccc8f0, fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), sc=0x00007fbeca4cbc28) at frames.cpp:1173:13
frame #6: 0x00007ffff731a5bf libcoreclr.so`GcStackCrawlCallBack(pCF=0x00007fbeca4c9060, pData=0x00007fbeca4cbaf8) at gcenv.ee.common.cpp:297:21
frame #7: 0x00007ffff7293e22 libcoreclr.so`Thread::MakeStackwalkerCallback(this=0x000055555561fb40, pCF=0x00007fbeca4c9060, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8) at stackwalk.cpp:846:27
frame #8: 0x00007ffff72940ad libcoreclr.so`Thread::StackWalkFramesEx(this=0x000055555561fb40, pRD=0x00007fbeca4c9400, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:926:26
frame #9: 0x00007ffff72943ff libcoreclr.so`Thread::StackWalkFrames(this=0x000055555561fb40, pCallback=(libcoreclr.so`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007fbeca4cbaf8, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:1009:12
frame #10: 0x00007ffff7316fea libcoreclr.so`ScanStackRoots(pThread=0x000055555561fb40, fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), sc=0x00007fbeca4cbc28) at gcenv.ee.cpp:182:18
frame #11: 0x00007ffff7316dd6 libcoreclr.so`GCToEEInterface::GcScanRoots(fn=(libcoreclr.so`WKS::gc_heap::background_promote(Object**, ScanContext*, unsigned int) at gc.cpp:27847), condemned=2, max_gen=2, sc=0x00007fbeca4cbc28) at gcenv.ee.cpp:281:13
frame #12: 0x00007ffff746adea libcoreclr.so`WKS::gc_heap::background_mark_phase() at gc.cpp:38204:5
frame #13: 0x00007ffff74695f2 libcoreclr.so`WKS::gc_heap::gc1() at gc.cpp:22320:13
frame #14: 0x00007ffff748eb5c libcoreclr.so`WKS::gc_heap::bgc_thread_function() at gc.cpp:39246:9
frame #15: 0x00007ffff748ea71 libcoreclr.so`WKS::gc_heap::bgc_thread_stub(arg=<unavailable>) at gc.cpp:37174:5 [artificial]
frame #16: 0x00007ffff731a0b4 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) [inlined] (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::operator()(this=<unavailable>, argument=<unavailable>) const at gcenv.ee.cpp:1440:17
frame #17: 0x00007ffff731a051 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(argument=<unavailable>) at gcenv.ee.cpp:1420:27
frame #18: 0x00007ffff76171ae libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055555558e2b0) at thread.cpp:1760:16
frame #19: 0x00007ffff788a0ea libc.so.6`start_thread + 794
frame #20: 0x00007ffff790f150 libc.so.6`__clone3 + 48
Or,
(lldb) bt
* thread #9, name = '.NET BGC', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
* frame #0: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] MethodTable::GetFlag(this=0x0000000000000000, flag=enum_flag_HasComponentSize) const at methodtable.h:3511:16
frame #1: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] MethodTable::HasComponentSize(this=0x0000000000000000) const at methodtable.h:1532:16
frame #2: 0x00007ffff748ede6 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::my_get_size(ob=0x00007fbf66d9e582) at gc.cpp:11581:18
frame #3: 0x00007ffff748eddf libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::gc_heap::background_mark_simple(o="") at gc.cpp:27817:24
frame #4: 0x00007ffff748edb5 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(int) [inlined] WKS::gc_heap::background_mark_object(o="") at gc.cpp:27834:9
frame #5: 0x00007ffff748eda3 libcoreclr.so`WKS::gc_heap::background_drain_mark_list(thread=<unavailable>) at gc.cpp:37196:9
frame #6: 0x00007ffff746ab79 libcoreclr.so`WKS::gc_heap::background_mark_phase() at gc.cpp:38031:5
frame #7: 0x00007ffff74695f2 libcoreclr.so`WKS::gc_heap::gc1() at gc.cpp:22320:13
frame #8: 0x00007ffff748eb5c libcoreclr.so`WKS::gc_heap::bgc_thread_function() at gc.cpp:39246:9
frame #9: 0x00007ffff748ea71 libcoreclr.so`WKS::gc_heap::bgc_thread_stub(arg=<unavailable>) at gc.cpp:37174:5 [artificial]
frame #10: 0x00007ffff731a0b4 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) [inlined] (anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::operator()(this=<unavailable>, argument=<unavailable>) const at gcenv.ee.cpp:1440:17
frame #11: 0x00007ffff731a051 libcoreclr.so`(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(argument=<unavailable>) at gcenv.ee.cpp:1420:27
frame #12: 0x00007ffff76171ae libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x000055555558e2b0) at thread.cpp:1760:16
frame #13: 0x00007ffff788a0ea libc.so.6`start_thread + 794
frame #14: 0x00007ffff790f150 libc.so.6`__clone3 + 48
Regression?
The included example reliably crashes on Linux x64 with .NET 8 and 9 (and, for completeness, .NET 6 and 7).
Windows x64 builds for the same .NET versions do not appear to be impacted.
Known Workarounds
No response
Configuration
.NET SDKs: 6.0.428, 7.0.410, 8.0.409, 9.0.300
.NET Runtimes: 6.0.36, 7.0.20, 8.0.16, 9.0.5
Runtime Environment:
OS Name: almalinux
OS Version: 9.6
OS Platform: Linux
RID: linux-x64
Other Linux configurations (e.g., ARM) haven't been tested. The issue does not appear to be present on Windows.
Other information
Originally it looked like this could be related to MULTIREG_RETURN on Linux x64, but this hasn't been fully confirmed.
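Both backtraces above are on the '.NET BGC' thread inside WKS::gc_heap::background_mark_phase, so one diagnostic that might help narrow this down is to run the same reproduction with background (concurrent) GC turned off. The fragment below is only a sketch of that experiment, using the documented ConcurrentGarbageCollection MSBuild property on the same project file. Whether it changes the crash behavior is an assumption that has not been verified in this report.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <!-- Assumption: diagnostic only, not a confirmed workaround.
         Disables background (concurrent) GC for this application. -->
    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
  </PropertyGroup>
</Project>
```

The same setting can be applied without rebuilding by setting the environment variable DOTNET_gcConcurrent=0 before launching the program. If the crash still occurs with background GC disabled, that would suggest the problem is not specific to background marking.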