Address sanitizer: Crash/deadlock in inject signal handler on Linux with glibc >= 2.40 #121581

@gbalykov

Description

CoreCLR crashes in the inject_activation_handler signal handler because that handler calls code that is not async-signal-safe.

This happens at least with glibc >= 2.40, which changed how thread-local storage is managed: GetThread/GetThreadNULLOk, when called from the signal handler, can now end up in realloc (through __tls_get_addr, as in the backtrace below) and crash. In some cases the result is a deadlock instead; #121345 is related. The issue occurs both with and without ASan.
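The rule being broken is POSIX async-signal safety: a signal handler may only call functions on the async-signal-safe list (see signal-safety(7)), and malloc/realloc are not on it. In the backtrace below the handler never calls realloc directly; it reaches it through the thread-local lookup in __tls_get_addr, which is why an apparently harmless GetThread call becomes unsafe. The following generic C sketch is not CoreCLR code (on_signal, install_handler and deliver_once are hypothetical names); it only shows the conventional safe pattern, where the handler sets a volatile sig_atomic_t flag and calls write(2), and anything that may allocate or take locks runs after the handler returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set only from the handler; volatile sig_atomic_t is the object type the
   C standard guarantees a handler may assign to. */
static volatile sig_atomic_t g_pending = 0;

/* Async-signal-safe handler: no malloc/realloc, no stdio, no lazily
   allocated thread-local data. It records the event and uses write(2),
   which POSIX lists as async-signal-safe. */
static void on_signal(int signo)
{
    (void)signo;
    g_pending = 1;
    static const char msg[] = "signal received\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Installs on_signal for SIGUSR1. Returns 0 on success, -1 on failure. */
int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Raises SIGUSR1 on the calling thread and reports whether the handler ran.
   Any follow-up work that may allocate or lock belongs here, outside the
   handler. Returns 1 if the handler ran, 0 otherwise. */
int deliver_once(void)
{
    g_pending = 0;
    if (raise(SIGUSR1) != 0)
        return 0;
    /* POSIX: raise() does not return until the handler has run. */
    int handled = g_pending;
    g_pending = 0;
    return handled;
}
```

After install_handler() succeeds, each deliver_once() call returns 1, because the flag is already set by the time raise() returns. CoreCLR's activation handler has to do real work inside the handler, so this pattern is an illustration of the constraint rather than a drop-in fix.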

Example crash backtrace (also from #121345):

0xf51a9340 is located 0 bytes inside of 320-byte region [0xf51a9340,0xf51a9480)
freed by thread T11 here:
    #0 0xf75fa26e in realloc.part.0 (/usr/lib/libasan.so+0x9e26e) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #1 0xf7b11ae4 in _dl_resize_dtv (/lib/ld-linux.so.3+0x41010ae4) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #2 0xf7b12500 in _dl_update_slotinfo (/lib/ld-linux.so.3+0x41011500) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #3 0xf7b1265c in update_get_addr (/lib/ld-linux.so.3+0x4101165c) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #4 0xf75dd5ea in __tls_get_addr (/usr/lib/libasan.so+0x815ea) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #5 0xf24062da in CheckActivationSafePoint(unsigned int) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0x1792da) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #6 0xf2571cc8 in inject_activation_handler(int, siginfo_t*, void*) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0x2e4cc8) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #7 0xf71ace0c  (/lib/libc.so.6+0x41242e0c) (BuildId: 4d66a597c3674cb64087a6587522a00c688b8037)
    #8 0xf7605700 in __sanitizer::BufferedStackTrace::UnwindImpl(unsigned int, unsigned int, void*, bool, unsigned int) (/usr/lib/libasan.so+0xa9700) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #9 0xf75fa298 in realloc.part.0 (/usr/lib/libasan.so+0x9e298) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #10 0xf7b11ae4 in _dl_resize_dtv (/lib/ld-linux.so.3+0x41010ae4) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #11 0xf7b12500 in _dl_update_slotinfo (/lib/ld-linux.so.3+0x41011500) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #12 0xf7b1265c in update_get_addr (/lib/ld-linux.so.3+0x4101165c) (BuildId: 09e97ca6a7629ff5a7bcdd346a4f7a5508203c59)
    #13 0xf75dd5ea in __tls_get_addr (/usr/lib/libasan.so+0x815ea) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #14 0xf235d62a in ManagedThreadBase_DispatchMiddle(ManagedThreadCallState*)::Cleanup::~Cleanup() (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0xd062a) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #15 0xf235c926 in ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0xcf926) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #16 0xf235cb60 in ManagedThreadBase::KickOff(void (*)(void*), void*) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0xcfb60) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #17 0xf238a274 in ThreadNative::KickOffThread(void*) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0xfd274) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #18 0xf25912d8 in CorUnix::CPalThread::ThreadEntry(void*) (/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.11/libcoreclr.so+0x3042d8) (BuildId: 78a8db7ede0a62b8ff150e5a58e4c5dad06019e3)
    #19 0xf75a1eda in asan_thread_start(void*) (/usr/lib/libasan.so+0x45eda) (BuildId: 9bba7c7c1d333d26085dc332318addcdddefc51d)
    #20 0xf71f0cb0 in start_thread (/lib/libc.so.6+0x41286cb0) (BuildId: 4d66a597c3674cb64087a6587522a00c688b8037)

Should hijacking be disabled on Linux for now as a quick fix (e.g. by disabling FEATURE_THREAD_ACTIVATION) until the work described in #121345 (comment) is completed? This appears to affect all Linux platforms with glibc >= 2.40 (e.g. Ubuntu 25.04 and later).

cc @dotnet/samsung
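For context on the signal mechanics behind thread activation: one thread sends a signal to a specific target thread with pthread_kill, and the handler then runs on the target thread at whatever instruction it was executing. Everything reachable from the handler, including the thread-local lookup behind GetThread, therefore has to be async-signal-safe. The C sketch below is a hypothetical, simplified illustration (inject_activations, activation_handler, worker and the choice of SIGUSR2 are assumptions, not CoreCLR's implementation, whose inject_activation_handler does much more, e.g. the CheckActivationSafePoint call in the backtrace above). Each injected activation only posts a semaphore, since sem_post is async-signal-safe, and the injecting thread waits for that acknowledgement.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Posted from the handler to acknowledge one activation. */
static sem_t g_ack;
/* Tells the worker thread to exit; only touched outside the handler. */
static atomic_bool g_stop;

/* Runs on the target thread, interrupting whatever it was doing.
   It only calls sem_post, which POSIX lists as async-signal-safe:
   no malloc, no locks, no lazily allocated thread-local data. */
static void activation_handler(int signo)
{
    (void)signo;
    sem_post(&g_ack);
}

/* Simulated target thread: spins until told to stop. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&g_stop)) {
        /* busy loop; the signal can arrive at any instruction here */
    }
    return NULL;
}

/* Hypothetical helper: interrupts one worker thread n times via pthread_kill
   and returns how many activations were acknowledged (-1 on setup failure). */
int inject_activations(int n)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = activation_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0 || sem_init(&g_ack, 0, 0) != 0)
        return -1;

    atomic_store(&g_stop, false);
    pthread_t target;
    if (pthread_create(&target, NULL, worker, NULL) != 0) {
        sem_destroy(&g_ack);
        return -1;
    }

    int acked = 0;
    for (int i = 0; i < n; i++) {
        if (pthread_kill(target, SIGUSR2) != 0)
            break;
        /* Wait for the handler to run on the target; retry if interrupted. */
        int rc;
        do {
            rc = sem_wait(&g_ack);
        } while (rc != 0 && errno == EINTR);
        if (rc != 0)
            break;
        acked++;
    }

    atomic_store(&g_stop, true);
    pthread_join(target, NULL);
    sem_destroy(&g_ack);
    return acked;
}
```

Calling inject_activations(5) returns 5. If activation_handler instead called malloc, or touched thread-local data the loader might still need to allocate, the signal could land while the target thread is itself inside the allocator. That is the situation in the ASan report above: frames #9 to #13 are a realloc from __tls_get_addr on the same thread that the handler interrupted.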

Reproduction Steps

Some reproduction cases are mentioned in #121345

Expected behavior

No crash/deadlock

Actual behavior

Crash/deadlock

Regression?

Seems to be present in all .NET versions (at least since .NET Core 3.1).

Known Workarounds

Disabling FEATURE_THREAD_ACTIVATION?

Configuration

The crash backtrace above is from .NET 8.0.11 on arm32 Tizen, but the bug is independent of architecture and .NET version.

Metadata

Labels

area-VM-coreclr; tenet-reliability (reliability/stability related issue: stress, load problems, etc.)