Description
All `Interlocked` functions from gcenv.interlocked.inl seem to insert full memory barriers on arm64. For example, consider this usage (the GC uses them heavily under BACKGROUND_GC):
```cpp
void GCHeap::SetSuspensionPending(bool fSuspensionPending)
{
    if (fSuspensionPending)
    {
        Interlocked::Increment(&g_fSuspensionPending);
    }
    else
    {
        Interlocked::Decrement(&g_fSuspensionPending);
    }
}
```
`Interlocked::Increment` is implemented like this:
```cpp
template <typename T>
__forceinline T Interlocked::Increment(T volatile *addend)
{
#ifdef _MSC_VER
    static_assert(sizeof(long) == sizeof(T), "Size of long must be the same as size of T");
    return _InterlockedIncrement((long*)addend);
#else
    T result = __sync_add_and_fetch(addend, 1);
    ArmInterlockedOperationBarrier();
    return result;
#endif
}
```
see godbolt: https://godbolt.org/z/3jPx3Mz14
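For comparison, a rough stand-alone sketch of the non-MSVC path using the `__atomic` builtins instead of `__sync` (`interlocked_increment` is my hypothetical stand-in, not the runtime's code):

```cpp
// Sketch only: interlocked_increment is a hypothetical stand-in for
// Interlocked::Increment above. With __atomic_add_fetch and
// __ATOMIC_SEQ_CST, GCC/Clang pick the lowering themselves: a single
// ldaddal on ARMv8.1+ targets, or an ldaxr/stlxr retry loop on ARMv8.0.
template <typename T>
inline T interlocked_increment(T volatile* addend)
{
    return __atomic_add_fetch(addend, 1, __ATOMIC_SEQ_CST);
}
```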
From my understanding, we don't need these full memory barriers when we use the 8.1 atomics. The same applies to all `Interlocked` functions in gcenv.interlocked.inl.
These barriers are only needed on 8.0, where C++ compilers lower the builtin atomic intrinsics without them (for the background, see https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/).
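One possible shape of the fix, purely illustrative (the feature-macro check is mine, not the actual runtime code): only emit the trailing barrier when the compile target does not guarantee the v8.1 LSE atomics. GCC/Clang define `__ARM_FEATURE_ATOMICS` when the target baseline includes them (e.g. `-march=armv8.1-a` or `-mcpu=apple-m1`).

```cpp
// Illustrative sketch, not the actual runtime code: skip the trailing
// full barrier when the target baseline guarantees ARMv8.1 LSE atomics,
// since ldaddal & friends already have full acquire/release semantics.
inline void ArmInterlockedOperationBarrier()
{
#if defined(__aarch64__) && !defined(__ARM_FEATURE_ATOMICS)
    // ARMv8.0 lowering: the __sync_* builtin may be an LL/SC loop
    // without a trailing full barrier, so add one here.
    __sync_synchronize();
#endif
}
```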
The JIT does the same. E.g., for this C#:

```csharp
void Foo(ref int x) => Interlocked.Increment(ref x);
```
it produces this on arm64 targeting 8.0:
```asm
885FFC23 ldaxr w3, [x1]
11000462 add w2, w3, #1
8800FC22 stlxr w0, w2, [x1]
35FFFFA0 cbnz w0, G_M24917_IG02
D5033BBF dmb ish ;; <----
```
and this on >=8.1:

```asm
52800020 mov w0, #1
B8E00020 ldaddal w0, w0, [x1]
```
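The same split is visible from plain C++ (my example, in the spirit of the godbolt link above): a seq_cst `fetch_add` lowers to an `ldaxr`/`stlxr` retry loop when targeting 8.0 and to a single `ldaddal` when the compiler may assume the 8.1 atomics.

```cpp
#include <atomic>

// Illustration only, not runtime code: on arm64 this lowers to an
// ldaxr/stlxr retry loop for an ARMv8.0 baseline and to a single
// ldaddal when the target includes the ARMv8.1 LSE atomics.
int increment(std::atomic<int>& x)
{
    return x.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```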
cc @VSadov
PS: Yes, when we compile CoreCLR for Apple M1 we unintentionally use -mcpu=apple-m1
and get all the new shiny instructions, e.g. arm v8.3's ldapr, the 8.1 atomics, etc. (well, it makes sense 🤷♂️)