Skip to content

[Mono] test failure: SIGABRT in nativeruntimeeventsource.cs #68032

@josalem

Description

@josalem

Discovered as part of #67150

Copy/paste of details from that comment chain:

I can repro the Mono failure locally. It looks like there's a SIGABRT when trying to get a wait handle from the GC:

=================================================================
Managed Stacktrace:
=================================================================
  at <unknown> <0xffffffff>
  at System.Runtime.InteropServices.GCHandle:InternalGet <0x00079>
  at System.Runtime.InteropServices.GCHandle:get_Target <0x00028>
  at HandleManager:FromHandle <0x00077>
  at System.Threading.WaitSubsystem:SetEvent <0x0001f>
  at System.Threading.EventWaitHandle:Set <0x0005a>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:StopDispatchTask <0x00042>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:CommitDispatchConfiguration <0x0008b>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:SendCommand <0x000dc>
  at System.Diagnostics.Tracing.EventListener:EnableEvents <0x000c2>
  at System.Diagnostics.Tracing.EventListener:EnableEvents <0x00023>
  at Tracing.Tests.SimpleEventListener:OnEventSourceCreated <0x00061>
  at System.Diagnostics.Tracing.EventSource:AddListener <0x00194>
  at <>c:<.ctor>b__10_0 <0x00044>
  at System.Diagnostics.Tracing.EventListener:CallBackForExistingEventSources <0x00268>
  at System.Diagnostics.Tracing.EventListener:.ctor <0x000b3>
  at Tracing.Tests.SimpleEventListener:.ctor <0x0001e>
  at Tracing.Tests.NativeRuntimeEventSourceTest:Main <0x000ab>
  at <Module>:runtime_invoke_int_object <0x00069>
=================================================================
Details

=================================================================
External Debugger Dump:
=================================================================
[New LWP 20477]
[New LWP 20478]
[New LWP 20479]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: the debug information found in "/datadisks/disk1/work/B9570A9B/p/libcoreclr.so.dbg" does not match "/datadisks/disk1/work/B9570A9B/p/libcoreclr.so" (CRC mismatch).

0x00007f27c9e3a32a in __waitpid (pid=20595, stat_loc=0x7fff6d150ec0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
30../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
  Id   Target Id         Frame 
* 1    Thread 0x7f27ca461740 (LWP 20476) "corerun" 0x00007f27c9e3a32a in __waitpid (pid=20595, stat_loc=0x7fff6d150ec0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
  2    Thread 0x7f27c63ff700 (LWP 20477) "SGen worker" 0x00007f27c9e35ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f27c90c16b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7f27c4871700 (LWP 20478) "corerun" 0x00007f27c920cbb9 in __GI___poll (fds=0x7f27c0002d50, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7f27c4670700 (LWP 20479) "Finalizer" 0x00007f27c9e387c6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f27c90b2560) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

Thread 4 (Thread 0x7f27c4670700 (LWP 20479)):
#0  0x00007f27c9e387c6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f27c90b2560) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f27c90b2560, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f27c9e388b8 in __new_sem_wait_slow (sem=0x7f27c90b2560, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f27c78e64b6 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#4  0x00007f27c78bf6ea in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#5  0x00007f27c78bf579 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#6  0x00007f27c9e2f6db in start_thread (arg=0x7f27c4670700) at pthread_create.c:463
#7  0x00007f27c921961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f27c4871700 (LWP 20478)):
#0  0x00007f27c920cbb9 in __GI___poll (fds=0x7f27c0002d50, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f27c7b4811a in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#2  0x00007f27c7b45605 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#3  0x00007f27c7b43e69 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#4  0x00007f27c7b47531 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#5  0x00007f27c9e2f6db in start_thread (arg=0x7f27c4871700) at pthread_create.c:463
#6  0x00007f27c921961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f27c63ff700 (LWP 20477)):
#0  0x00007f27c9e35ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f27c90c16b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f27c90c1668, cond=0x7f27c90c1690) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f27c90c1690, mutex=0x7f27c90c1668) at pthread_cond_wait.c:655
#3  0x00007f27c7969143 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#4  0x00007f27c9e2f6db in start_thread (arg=0x7f27c63ff700) at pthread_create.c:463
#5  0x00007f27c921961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f27ca461740 (LWP 20476)):
#0  0x00007f27c9e3a32a in __waitpid (pid=20595, stat_loc=0x7fff6d150ec0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f27c7a818b7 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#2  0x00007f27c7a2372e in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#3  0x00007f27c7a81048 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#4  <signal handler called>
#5  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#6  0x00007f27c91387f1 in __GI_abort () at abort.c:79
#7  0x00007f27c7afb045 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#8  0x00007f27c79115a3 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#9  0x00007f27c7afb405 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#10 0x00007f27c7afb53e in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#11 0x00007f27c7938c86 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#12 0x00007f27c78c2fda in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#13 0x00007f27c78756d1 in ?? () from /datadisks/disk1/work/B9570A9B/p/libcoreclr.so
#14 0x00000000418eacea in ?? ()
#15 0x000056390eb8c0c8 in ?? ()
#16 0x00007f27c53ff000 in ?? ()
#17 0x0000000000000000 in ?? ()
/datadisks/disk1/work/B9570A9B/w/AA38092D/e/tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.sh: line 395: 20476 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"

Return code:      1
Raw output file:      /datadisks/disk1/work/B9570A9B/w/AA38092D/uploads/Reports/tracing.runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/B9570A9B/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false nativeruntimeeventsource.dll ''
Why are we accessing an entry that is not allocated

=================================================================
Native Crash Reporting
=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
Native stacktrace:
=================================================================
0x7f27c7a81772 - Unknown
0x7f27c7a2372e - Unknown
0x7f27c7a81048 - Unknown
0x7f27c9e3a980 - Unknown
0x7f27c9136e87 - Unknown
0x7f27c91387f1 - Unknown
0x7f27c7afb045 - Unknown
0x7f27c79115a3 - Unknown
0x7f27c7afb405 - Unknown
0x7f27c7afb53e - Unknown
0x7f27c7938c86 - Unknown
0x7f27c78c2fda - Unknown
0x7f27c78756d1 - Unknown
0x418eacea - Unknown

=================================================================
Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x7f27c9136e87):0x7f27c9136e77  d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05  .L..............
0x7f27c9136e87  48 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00  H..$....dH3.%(..
0x7f27c9136e97  00 44 89 c0 75 1f 48 81 c4 18 01 00 00 c3 0f 1f  .D..u.H.........
0x7f27c9136ea7  00 48 8b 15 b9 bf 3a 00 f7 d8 41 b8 ff ff ff ff  .H....:...A.....

=================================================================
Managed Stacktrace:
=================================================================
  at <unknown> <0xffffffff>
  at System.Runtime.InteropServices.GCHandle:InternalGet <0x00079>
  at System.Runtime.InteropServices.GCHandle:get_Target <0x00028>
  at HandleManager:FromHandle <0x00077>
  at System.Threading.WaitSubsystem:SetEvent <0x0001f>
  at System.Threading.EventWaitHandle:Set <0x0005a>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:StopDispatchTask <0x00042>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:CommitDispatchConfiguration <0x0008b>
  at System.Diagnostics.Tracing.EventPipeEventDispatcher:SendCommand <0x000dc>
  at System.Diagnostics.Tracing.EventListener:EnableEvents <0x000c2>
  at System.Diagnostics.Tracing.EventListener:EnableEvents <0x00023>
  at Tracing.Tests.SimpleEventListener:OnEventSourceCreated <0x00061>
  at System.Diagnostics.Tracing.EventSource:AddListener <0x00194>
  at <>c:<.ctor>b__10_0 <0x00044>
  at System.Diagnostics.Tracing.EventListener:CallBackForExistingEventSources <0x00268>
  at System.Diagnostics.Tracing.EventListener:.ctor <0x000b3>
  at Tracing.Tests.SimpleEventListener:.ctor <0x0001e>
  at Tracing.Tests.NativeRuntimeEventSourceTest:Main <0x000ab>
  at <Module>:runtime_invoke_int_object <0x00069>
=================================================================
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/datadisks/disk1/work/B9570A9B/p
> /datadisks/disk1/work/B9570A9B/w/AA38092D/e/tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.sh

Having some difficulties running the test under a debugger, but I was able to glean that we are hitting this assert in sgen-array-list.h:

static inline volatile gpointer *
sgen_array_list_get_slot (SgenArrayList *array, guint32 index)
{
	guint32 bucket, offset;

	SGEN_ASSERT (0, index < array->capacity, "Why are we accessing an entry that is not allocated"); // <----

	sgen_array_list_bucketize (index, &bucket, &offset);
	return &(array->entries [bucket] [offset]);
}

Which we seem to be getting to from this chain:

sgen_gchandle_get_target
mono_gchandle_get_target_handle
ves_icall_System_GCHandle_InternalGet_raw
...

Looking at sgen_gchandle_get_target:

GCObject*
sgen_gchandle_get_target (guint32 gchandle)
{
	guint index = MONO_GC_HANDLE_SLOT (gchandle);
	GCHandleType type = MONO_GC_HANDLE_TYPE (gchandle);
	HandleData *handles = gc_handles_for_type (type);
	/* Invalid handles are possible; accessing one should produce NULL. (#34276) */
	if (!handles)
		return NULL;
	return link_get (sgen_array_list_get_slot (&handles->entries_array, index), MONO_GC_HANDLE_TYPE_IS_WEAK (type));
}

It looks like we are getting to the return statement and the value of index is greater than the size of &handles->entries_array.

Hopefully this becomes painfully obvious once I can get it running under LLDB.


Got it under a debugger finally. Turns out you can't use LLDB in Codespaces -_-.

Anyway, I managed to peak at the frames leading up to the assert on Mono. Looking at the data shows that we're getting an invalid index into the HandleData->entries_array.entries array when the gchandle is masked:

(lldb) f 11
frame #11: 0x000000010312c0f0 libcoreclr.dylib`sgen_gchandle_get_target(gchandle=2151731204) at sgen-gchandles.c:305:19
   302 		/* Invalid handles are possible; accessing one should produce NULL. (#34276) */
   303 		if (!handles)
   304 			return NULL;
-> 305 		return link_get (sgen_array_list_get_slot (&handles->entries_array, index), MONO_GC_HANDLE_TYPE_IS_WEAK (type));
   306 	}
   307
   308 	void
(lldb) fr v
(guint32) gchandle = 2151731204
(guint) index = 268966400 // <---------
(GCHandleType) type = HANDLE_PINNED
(HandleData *) handles = 0x0000000103532c48
(lldb) p handles
(HandleData *) $1 = 0x0000000103532c48
(lldb) p *handles
(HandleData) $2 = {
  entries_array = {
    entries = {
      [0] = 0x0000600003e04300
      [1] = 0x0000000000000000
      [2] = 0x0000000000000000
      [3] = 0x0000000000000000
      ...
    }
    capacity = 32
    slot_hint = 0
    next_slot = 1
    bucket_alloc_callback = 0x000000010312d2a0 (libcoreclr.dylib`bucket_alloc_callback at sgen-gchandles.c:106)
    is_slot_set_func = 0x000000010312d190 (libcoreclr.dylib`is_slot_set at sgen-gchandles.c:88)
    set_slot_func = 0x000000010312d1d0 (libcoreclr.dylib`try_occupy_slot at sgen-gchandles.c:98)
    mem_type = -1
  }
  type = '\x03'
}

The code is supposed to be looking up a SafeWaitHandle for the wait handle on the buffer manager of the session. Poking at the session shows that the underlying handle matches (0x000000010081a008):

(lldb) p *_ep_sessions[0]->buffer_manager
(EventPipeBufferManager) $10 = {
  thread_session_state_list = {
    list = 0x0000000000000000
  }
  sequence_points = {
    list = 0x0000000000000000
  }
  rt_wait_event = (event = 0x000000010081a008) // <---------------
  rt_lock = {
    lock = 0x0000600001718d80
    owning_thread_id = 0x0000000000000000
  }
  ...
}
    frame #10: 0x000000010312c256 libcoreclr.dylib`sgen_array_list_get_slot(array=0x0000000103532c48, index=268966400) at sgen-array-list.h:112:2
  * frame #11: 0x000000010312c0f0 libcoreclr.dylib`sgen_gchandle_get_target(gchandle=2151731204) at sgen-gchandles.c:305:19
    frame #12: 0x00000001030c723d libcoreclr.dylib`mono_gchandle_get_target_internal(gchandle=0x000000010081a008) at sgen-mono.c:2782:9
    frame #13: 0x000000010306dad5 libcoreclr.dylib`mono_gchandle_get_target_handle(gchandle=0x000000010081a008) at handle.c:414:9
    frame #14: 0x00000001030acea9 libcoreclr.dylib`ves_icall_System_GCHandle_InternalGet(handle=0x000000010081a008, error=0x00007ff7bfefde78) at gc.c:676:9
    frame #15: 0x0000000102ffcada libcoreclr.dylib`ves_icall_System_GCHandle_InternalGet_raw(a0=0x000000010081a008) at icall-def.h:430:1  <------

The faulting managed code from EventPipeEventDispatcher.cs:

        private void StopDispatchTask()
        {
            Debug.Assert(Monitor.IsEntered(m_dispatchControlLock));

            if (m_dispatchTask != null)
            {
                m_stopDispatchTask = true;
                Debug.Assert(!m_dispatchTaskWaitHandle.SafeWaitHandle.IsInvalid);
                EventWaitHandle.Set(m_dispatchTaskWaitHandle.SafeWaitHandle); // <-------
                m_dispatchTask.Wait();
                m_dispatchTask = null;
            }
        }

We don't hit the Debug.Assert before the call to EventWaitHandle.Set().

@lambdageek or @lateralusX do you have any intuition about what might be going on here? I went digging into the wait handle alloc logic in Mono, but it might take me a while to pinpoint what's going on.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions