Description
All `Interlocked` functions from gcenv.interlocked.inl seem to insert full memory barriers on arm64. For example, consider this usage (the GC uses them heavily under BACKGROUND_GC):
```cpp
void GCHeap::SetSuspensionPending(bool fSuspensionPending)
{
    if (fSuspensionPending)
    {
        Interlocked::Increment(&g_fSuspensionPending);
    }
    else
    {
        Interlocked::Decrement(&g_fSuspensionPending);
    }
}
```
`Interlocked::Increment` is implemented like this:
```cpp
template <typename T>
__forceinline T Interlocked::Increment(T volatile *addend)
{
#ifdef _MSC_VER
    static_assert(sizeof(long) == sizeof(T), "Size of long must be the same as size of T");
    return _InterlockedIncrement((long*)addend);
#else
    T result = __sync_add_and_fetch(addend, 1);
    ArmInterlockedOperationBarrier();
    return result;
#endif
}
```
see godbolt: https://godbolt.org/z/3jPx3Mz14
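For comparison, a rough stand-alone sketch of the non-MSVC path using the `__atomic` builtins instead of `__sync` (`interlocked_increment` is my hypothetical stand-in, not the runtime's code):

```cpp
// Sketch only: interlocked_increment is a hypothetical stand-in for
// Interlocked::Increment above. With __atomic_add_fetch and
// __ATOMIC_SEQ_CST, GCC/Clang pick the lowering themselves: a single
// ldaddal on ARMv8.1+ targets, or an ldaxr/stlxr retry loop on ARMv8.0.
template <typename T>
inline T interlocked_increment(T volatile* addend)
{
    return __atomic_add_fetch(addend, 1, __ATOMIC_SEQ_CST);
}
```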
From my understanding, we don't need these full memory barriers when we use the 8.1 atomics. The same applies to all `Interlocked` functions in gcenv.interlocked.inl.
These barriers are only needed on 8.0, where C++ compilers lower the builtin atomic intrinsics without them (for the background, see https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/).
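One possible shape of the fix, purely illustrative (the feature-macro check is mine, not the actual runtime code): only emit the trailing barrier when the compile target does not guarantee the v8.1 LSE atomics. GCC/Clang define `__ARM_FEATURE_ATOMICS` when the target baseline includes them (e.g. `-march=armv8.1-a` or `-mcpu=apple-m1`).

```cpp
// Illustrative sketch, not the actual runtime code: skip the trailing
// full barrier when the target baseline guarantees ARMv8.1 LSE atomics,
// since ldaddal & friends already have full acquire/release semantics.
inline void ArmInterlockedOperationBarrier()
{
#if defined(__aarch64__) && !defined(__ARM_FEATURE_ATOMICS)
    // ARMv8.0 lowering: the __sync_* builtin may be an LL/SC loop
    // without a trailing full barrier, so add one here.
    __sync_synchronize();
#endif
}
```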
The JIT does the same. E.g., for this C#:

```csharp
void Foo(ref int x) => Interlocked.Increment(ref x);
```
it produces this on arm64 targeting 8.0:
```asm
885FFC23 ldaxr w3, [x1]
11000462 add w2, w3, #1
8800FC22 stlxr w0, w2, [x1]
35FFFFA0 cbnz w0, G_M24917_IG02
D5033BBF dmb ish ;; <----
```
and this on >=8.1:

```asm
52800020 mov w0, #1
B8E00020 ldaddal w0, w0, [x1]
```
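The same split is visible from plain C++ (my example, in the spirit of the godbolt link above): a seq_cst `fetch_add` lowers to an `ldaxr`/`stlxr` retry loop when targeting 8.0 and to a single `ldaddal` when the compiler may assume the 8.1 atomics.

```cpp
#include <atomic>

// Illustration only, not runtime code: on arm64 this lowers to an
// ldaxr/stlxr retry loop for an ARMv8.0 baseline and to a single
// ldaddal when the target includes the ARMv8.1 LSE atomics.
int increment(std::atomic<int>& x)
{
    return x.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```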
cc @VSadov
PS: Yes, when we compile CoreCLR for Apple M1 we unintentionally use -mcpu=apple-m1
and get all the new shiny instructions, e.g. arm v8.3's ldapr, the 8.1 atomics, etc. (well, it makes sense 🤷♂️)