Skip to content

[LoongArch64] Add atomic instruction implementation for LoongArch64#129683

Open
lawn123 wants to merge 10 commits into
dotnet:mainfrom
lawn123:LoongArch64_Atomic
Open

[LoongArch64] Add atomic instruction implementation for LoongArch64#129683
lawn123 wants to merge 10 commits into
dotnet:mainfrom
lawn123:LoongArch64_Atomic

Conversation

@lawn123

@lawn123 lawn123 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Add atomic instruction for LoongArch64. Supports atomic instructions on ISA1.0 and ISA1.1
Implementation of #122745.

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 22, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jun 22, 2026
@lawn123

lawn123 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Hi @shushanhf @LuckyXu-HF @jakobbotsch @jkotas could you please review this PR? Thanks.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int Exchange(ref int location1, int value)
{
#if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and all other cases) should explicitly list supported architectures. Other new architectures (for example ppc64le) are not expanding the intrinsics by default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and all other cases) should explicitly list supported architectures. Other new architectures (for example ppc64le) are not expanding the intrinsics by default.

This is for architectures where we want to always use the intrinsic impl and not fallback to runtime impl.

Comment thread src/coreclr/inc/clrconfigvalues.h Outdated
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableRiscV64Zbs, W("EnableRiscV64Zbs"), 1, "Allows RiscV64 Zbs hardware intrinsics to be disabled")
#elif defined(TARGET_LOONGARCH64)
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_BH, W("EnableLoongArch64LAM_BH"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_CAS, W("EnableLoongArch64LAM_CAS"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here should be LAMCAS, right? Not LAM_BH+ a second time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will make the changes later.

Comment thread src/coreclr/jit/codegenloongarch64.cpp Outdated
BasicBlock* labelRetry = genCreateTempLabel();
BasicBlock* fail = genCreateTempLabel();
unsigned size = 0;
emit->emitIns_R_R_I(INS_addi_w, dataSize, tmpReg, REG_R0, -4);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those explicit small type emulation fallbacks noticeably better than the managed one when it comes to performance?

@lawn123 lawn123 Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I conducted a performance comparison test on CompareExchange(ref byte location1, byte value, byte compare) using performance. Artifacts_Atomic is based on simulation fallbacks, while Artifacts_noAtomic is built on a managed implementation. CompareExchange_byte is when the comparison results are the same, CompareExchange_byte_nomatch is different.The results are as follows:

| Method                       | Job        | Toolchain                                                                           | Mean     | Error    | StdDev   | Median   | Min      | Max      | Ratio | Allocated | Alloc Ratio |
|----------------------------- |----------- |------------------------------------------------------------------------------------ |---------:|---------:|---------:|---------:|---------:|---------:|------:|----------:|------------:|
| CompareExchange_byte         | Job-AZJEDS | /artifacts_Atomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun   | 22.46 ns | 0.005 ns | 0.004 ns | 22.45 ns | 22.45 ns | 22.46 ns |  1.00 |         - |          NA |
| CompareExchange_byte         | Job-NEIHQV | /artifacts_noAtomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun | 27.43 ns | 0.004 ns | 0.004 ns | 27.43 ns | 27.42 ns | 27.44 ns |  1.22 |         - |          NA |
|                              |            |                                                                                     |          |          |          |          |          |          |       |           |             |
| CompareExchange_byte_nomatch | Job-AZJEDS | /artifacts_Atomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun   | 22.82 ns | 0.005 ns | 0.004 ns | 22.82 ns | 22.82 ns | 22.83 ns |  1.00 |         - |          NA |
| CompareExchange_byte_nomatch | Job-NEIHQV | /artifacts_noAtomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun | 28.16 ns | 0.022 ns | 0.020 ns | 28.16 ns | 28.14 ns | 28.20 ns |  1.23 |         - |          NA |

The corresponding machine instruction sequence is as follows:

managed implementation machine instruction sequence
3ae8e0:  addi.d  $sp, $sp, -80
3ae8e4:  st.d    $fp, $sp, 0
3ae8e8:  st.d    $ra, $sp, 8
3ae8ec:  st.d    $s0, $sp, 16
3ae8f0:  st.d    $s1, $sp, 24
3ae8f4:  st.d    $s2, $sp, 32
3ae8f8:  st.d    $s3, $sp, 40
3ae8fc:  st.d    $s4, $sp, 48
3ae900:  st.d    $s5, $sp, 56
3ae904:  st.d    $s6, $sp, 64
3ae908:  st.d    $s7, $sp, 72
3ae90c:  ori     $fp, $sp, 0
3ae910:  ori     $t8, $a0, 0
         A0 is live
3ae914:  andi    $t8, $t8, 3
3ae918:  sub.d   $s0, $a0, $t8
3ae91c:  slli.w  $a0, $t8, 3
3ae920:  andi    $s1, $a0, 31
3ae924:  addi.w  $a0, $zero, 255
3ae928:  sll.w   $a0, $a0, $s1
         A0 is dead
3ae92c:  nor     $s2, $a0, $zero
         A0 is live
3ae930:  bstrpick.d $a0, $a1, 7, 0
         S0 is dead
3ae934:  sll.w   $s3, $a0, $s1
3ae938:  bstrpick.d $a2, $a2, 7, 0
3ae93c:  sll.w   $s4, $a2, $s1
3ae940:  ld.w    $s5, $s0, 0
3ae944:  and     $s6, $s5, $s2
3ae948:  slli.w  $s6, $s6, 0
3ae94c:  or      $a2, $s6, $s4
3ae950:  slli.w  $a2, $a2, 0
3ae954:  or      $a1, $s6, $s3
3ae958:  slli.w  $a1, $a1, 0
3ae95c:  b       0x3ae97c
3ae960:  and     $s6, $s7, $s2
3ae964:  slli.w  $s6, $s6, 0
3ae968:  or      $a2, $s6, $s4
3ae96c:  slli.w  $a2, $a2, 0
3ae970:  or      $a1, $s6, $s3
3ae974:  slli.w  $a1, $a1, 0
3ae978:  slli.w  $s5, $s7, 0
3ae97c:  ori     $a0, $s0, 0
3ae980:  pcalau12i $t8, 3336
3ae984:  addi.d  $t8, $t8, 1128
3ae988:  ld.d    $a3, $t8, 0
3ae98c:  jirl    $ra, $a3, 0 // int System.Threading.Interlocked.CompareExchange32(ref int, int, int) (METHOD_ENTRY_DEF_TOKEN)
3ae990:  slli.w  $s7, $a0, 0
3ae994:  slli.w  $ra, $s5, 0
3ae998:  slli.w  $r21, $s7, 0
3ae99c:  bne     $r21, $ra, 0x3ae960
3ae9a0:  srl.w   $a0, $a0, $s1
3ae9a4:  bstrpick.d $a0, $a0, 7, 0
3ae9a8:  ld.d    $s7, $sp, 72
3ae9ac:  ld.d    $s6, $sp, 64
3ae9b0:  ld.d    $s5, $sp, 56
3ae9b4:  ld.d    $s4, $sp, 48
3ae9b8:  ld.d    $s3, $sp, 40
3ae9bc:  ld.d    $s2, $sp, 32
3ae9c0:  ld.d    $s1, $sp, 24
3ae9c4:  ld.d    $s0, $sp, 16
3ae9c8:  ld.d    $ra, $sp, 8
3ae9cc:  ld.d    $fp, $sp, 0
3ae9d0:  addi.d  $sp, $sp, 80
3ae9d4:  ret
simulation fallbacks machine instruction sequence
3c7270: addi.d  $sp, $sp, -16
3c7274: st.d    $fp, $sp, 0
3c7278: st.d    $ra, $sp, 8
3c727c: ori     $fp, $sp, 0
3c7280: bstrpick.d $a1, $a1, 7, 0
3c7284: bstrpick.d $a2, $a2, 7, 0
3c7288: addi.w  $a7, $zero, -4
3c728c: and     $a6, $a0, $a7
3c7290: andi    $a7, $a0, 3
3c7294: slli.d  $a7, $a7, 3
3c7298: addi.w  $a3, $zero, 255
3c729c: sll.w   $a4, $a3, $a7
3c72a0: nor     $a5, $zero, $a4
3c72a4: bstrpick.w $a3, $a2, 7, 0
3c72a8: sll.w   $a2, $a3, $a7
3c72ac: bstrpick.w $a3, $a1, 7, 0
3c72b0: sll.w   $a7, $a3, $a7
3c72b4: dbar    0
3c72b8: ll.w    $a3, $a6, 0
3c72bc: and     $t0, $a3, $a4
3c72c0: bne     $t0, $a2, 0x3c72d4
3c72c4: and     $a3, $a3, $a5
3c72c8: or      $a3, $a3, $a7
3c72cc: sc.w    $a3, $a6, 0
3c72d0: beq     $a3, $zero, 0x3c72b8
3c72d4: dbar    0
3c72d8: andi    $a7, $a0, 3
3c72dc: slli.d  $a7, $a7, 3
3c72e0: srl.w   $t0, $t0, $a7
3c72e4: ext.w.b $t0, $t0
3c72e8: bstrpick.d $t0, $t0, 7, 0
3c72ec: slli.w  $a0, $t0, 0
3c72f0: ld.d    $ra, $sp, 8
3c72f4: ld.d    $fp, $sp, 0
3c72f8: addi.d  $sp, $sp, 16 
3c72fc: ret
So I think using simulation fallbacks has some improvement in explicit small type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3ae98c: jirl $ra, $a3, 0 // int System.Threading.Interlocked.CompareExchange32(ref int, int, int) (METHOD_ENTRY_DEF_TOKEN)

It seems that the CompareExchange isn't expanded properly in the managed fallback, you should probably fix that and benchmark again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the CompareExchange isn't expanded properly in the managed fallback, you should probably fix that and benchmark again.

For LA, atomic instructions related to CompareExchange are not supported on v1.0, so only internal CompareExchange32 can be called here.

117         public static int CompareExchange(ref int location1, int value, int comparand)
118         {
119 #if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64
120             return CompareExchange(ref location1, value, comparand); // Must expand intrinsic
121 #else
122             if (Unsafe.IsNullRef(ref location1))
123                 ThrowHelper.ThrowNullReferenceException();
124             return CompareExchange32(ref location1, value, comparand);
125 #endif
126         }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the CompareExchange isn't expanded properly in the managed fallback, you should probably fix that and benchmark again.

For LA, atomic instructions related to CompareExchange are not supported on v1.0, so only internal CompareExchange32 can be called here.

117         public static int CompareExchange(ref int location1, int value, int comparand)
118         {
119 #if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64
120             return CompareExchange(ref location1, value, comparand); // Must expand intrinsic
121 #else
122             if (Unsafe.IsNullRef(ref location1))
123                 ThrowHelper.ThrowNullReferenceException();
124             return CompareExchange32(ref location1, value, comparand);
125 #endif
126         }

@jkotas Does the VM not require lock free cmpxchg for correctness?

@jkotas jkotas Jun 23, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For LA, atomic instructions related to CompareExchange are not supported on v1.0, so only internal CompareExchange32 can be called here.

So why is CompareExchange32 not expanded using ll.w/sc.w in your managed implementation machine instruction sequence?

Does the VM not require lock free cmpxchg for correctness?

Yes. I do not see any problems. LA 1.0 supports 32-bit cmpxchg using ll.w/sc.w.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why is CompareExchange32 not expanded using ll.w/sc.w in your managed implementation machine instruction sequence?

This is an issue I encountered during testing, where I completely removed the code from the PR for comparison.Should maintain the LA support added in CoreCLR.cs

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I expect that the software fallback for byte is going to be about as good as the manual expansion in the JIT once you do that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after adding LA support in CoreCLR.cs, the test results were almost consistent, and the test results are as follows:

| Method                       | Job        | Toolchain                                                                           | Mean     | Error    | StdDev   | Median   | Min      | Max      | Ratio | Allocated | Alloc Ratio |
|----------------------------- |----------- |------------------------------------------------------------------------------------ |---------:|---------:|---------:|---------:|---------:|---------:|------:|----------:|------------:|
| CompareExchange_byte         | Job-AZJEDS | /artifacts_Atomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun   | 22.44 ns | 0.003 ns | 0.002 ns | 22.44 ns | 22.44 ns | 22.45 ns |  1.00 |         - |          NA |
| CompareExchange_byte         | Job-NEIHQV | /artifacts_noAtomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun | 22.44 ns | 0.002 ns | 0.002 ns | 22.44 ns | 22.44 ns | 22.45 ns |  1.00 |         - |          NA |
|                              |            |                                                                                     |          |          |          |          |          |          |       |           |             |
| CompareExchange_byte_nomatch | Job-AZJEDS | /artifacts_Atomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun   | 22.82 ns | 0.002 ns | 0.001 ns | 22.82 ns | 22.82 ns | 22.82 ns |  1.00 |         - |          NA |
| CompareExchange_byte_nomatch | Job-NEIHQV | /artifacts_noAtomic/tests/coreclr/linux.loongarch64.Release/Tests/Core_Root/corerun | 22.83 ns | 0.014 ns | 0.011 ns | 22.82 ns | 22.82 ns | 22.86 ns |  1.00 |         - |          NA |
managed implementation machine instruction sequence
3ad900: addi.d  $sp, $sp, -16
3ad904: st.d    $fp, $sp, 0
3ad908: st.d    $ra, $sp, 8
3ad90c: ori     $fp, $sp, 0
3ad910: ori     $a3, $a0, 0
3ad914: andi    $a3, $a3, 3
3ad918: sub.d   $a0, $a0, $a3
3ad91c: slli.w  $a3, $a3, 3
3ad920: andi    $a3, $a3, 31 
3ad924: addi.w  $a4, $zero, 255
3ad928: sll.w   $a4, $a4, $a3
3ad92c: nor     $a4, $a4, $zero
3ad930: bstrpick.d $a1, $a1, 7, 0
3ad934: sll.w   $a1, $a1, $a3
3ad938: bstrpick.d $a2, $a2, 7, 0
3ad93c: sll.w   $a2, $a2, $a3
3ad940: ld.w    $a5, $a0, 0
3ad944: b       0x3ad94c
3ad948: slli.w  $a5, $a6, 0
3ad94c: and     $a6, $a5, $a4
3ad950: slli.w  $a6, $a6, 0
3ad954: or      $a7, $a6, $a1
3ad958: slli.w  $a7, $a7, 0
3ad95c: or      $a6, $a6, $a2
3ad960: slli.w  $a6, $a6, 0
3ad964: slli.w  $a6, $a6, 0
3ad968: dbar    0
3ad96c: ll.w    $t1, $a0, 0
3ad970: bne     $t1, $a6, 0x3ad980
3ad974: ori     $t0, $a7, 0
3ad978: sc.w    $t0, $a0, 0
3ad97c: beq     $t0, $zero, 0x3ad96c
3ad980: dbar    0
3ad984: slli.w  $a6, $t1, 0
3ad988: slli.w  $ra, $a5, 0
3ad98c: slli.w  $r21, $a6, 0
3ad990: bne     $r21, $ra, 0x3ad948
3ad994: srl.w   $a0, $t1, $a3
3ad998: bstrpick.d $a0, $a0, 7, 0
3ad99c: ld.d    $ra, $sp, 8
3ad9a0: ld.d    $fp, $sp, 0
3ad9a4: addi.d  $sp, $sp, 16
3ad9a8: ret

Comment thread src/coreclr/jit/codegenloongarch64.cpp Outdated
@dotnet-policy-service dotnet-policy-service Bot added the linkable-framework Issues associated with delivering a linker friendly framework label Jul 2, 2026
Comment thread src/coreclr/inc/clrconfigvalues.h Outdated
Comment on lines +720 to +721
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_BH, W("EnableLoongArch64LAM_BH"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_CAS, W("EnableLoongArch64LAM_CAS"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")

@am11 am11 Jul 2, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we aren't turning it on by default?

Suggested change
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_BH, W("EnableLoongArch64LAM_BH"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_CAS, W("EnableLoongArch64LAM_CAS"), 0, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_BH, W("EnableLoongArch64LAM_BH"), 1, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableLoongArch64LAM_CAS, W("EnableLoongArch64LAM_CAS"), 1, "Allows LoongArch64 LAM_BH+ hardware intrinsics to be disabled")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we aren't turning it on by default?

There may be a situation on LA where the system CPU supports LAM, but the virtual machine simulated on that machine does not support LAM.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those situations more common, making the parity favor that direction? i.e., would you rather have the regular user with "desired system state" know about DOTNET_EnableLoongArch64LAM_CAS=1, or would you rather make it disablable so people running "unusual" configurations can tweak the environment (like turning off TieredCompilation to work around situations, which we sometimes do)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a situation on LA where the system CPU supports LAM, but the virtual machine simulated on that machine does not support LAM.

If the virtual machine does not support LAM, LAM should be reported as unsupported by minipal_getcpufeatures.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LAM should be reported as unsupported by minipal_getcpufeatures

The implementation of LA in the minipal_getcpufeatures is read through the cpucfg instruction, not a system function, so I am not sure if it can be correctly distinguished on the virtual machine because it has not been verified. To ensure correctness, the default switch is turned off.

Are those situations more common

This is not common, we have not currently used such an environment. But I can't be sure if there are other customers using it.

@jkotas jkotas Jul 3, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you cannot reliably detect whether CAS is supported by the machine you are running on, the opportunistic light-up for native aot does not make sense and it should be reverted. The opportunistic light-up for native aot makes sense only if this switch is on by default for regular CoreCLR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just asked my colleague about the virtual machine situation, which is also a hypothetical situation, so here we still default to turning on the switch.

/// <summary>Provides access to the LoongArch64 Atomic hardware instructions via intrinsics.</summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class LAM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper types should be internal. I do not think we need to expose them as public APIs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have already made the modifications.

StoreL3NonTemporal = 13
};
}
namespace System.Runtime.Intrinsics.LoongArch

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file should be reverted.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this PR should not be changing this file at all.


result = NI_IsSupported_Type;
}
#if defined(TARGET_LOONGARCH64)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be done by enabling FEATURE_HW_INTRINSICS for loongarch64 instead of this one-off check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be done by enabling FEATURE_HW_INTRINSICS for loongarch64 instead of this one-off check.

I remember there is a strong coupling between FEATURE-HW-INTRINSICS and FEATURE_SIMD. Opening FEATURE-HW-INTRINSICS separately may introduce more problems. This is just a solution. After supporting SIMD on LA, we will modify it to directly use FEATURE-HW-INRINSICS.

Comment thread src/coreclr/inc/corinfoinstructionset.h Outdated
#ifdef TARGET_LOONGARCH64
InstructionSet_LAM_BH=1,
InstructionSet_LAM_CAS=2,
// TODO-LA-SIMD-other: add other ISA-features.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be exactly what the auto-generator produced, without any manual changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/// <para>AMSWAP[_DB].H rd, rk, rj</para>
/// <para>This is the newly added atomic instruction on ISA1.1</para>
/// </summary>
internal static ushort Exchange(ref ushort location1, ushort value) => Exchange(ref location1, value);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does Exchange work for int and long that it is missing here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For int and long types Exchange supports both V1.0 and V1.1.

lawn123 added 9 commits July 3, 2026 16:28
Change-Id: I3b7757ecfeff9aba28407508b196d55b4e21b69d
Change-Id: Ib826fc3ffb9057835357b68a51e20da66f9cebd5
Change-Id: Id3be930d58440bbc39b307a211a9fcb444f1e3ca
…on and opportunistics light-up for NAOT

Change-Id: I419341c499d0c4e955a60e5f4ad6fdeef74f9277
…rmatting issues

Change-Id: I1b3d1010ac3642624ec1f66eb26dc11220460d7a
…efault and revert System.Runtime.Intrinsics.cs changes.

Change-Id: If9f10ec7908aaf706b1d2ad9f2e34db073543d36
Change-Id: I2c1e8d43106616da47e231670e557f508330e3d8
…n.sh

Change-Id: I3240c5d2413950981b4f1446d837d9c0de4e9a65
Change-Id: I51fe7faac05749bc38344e2068bb2efccc1ce99a
@lawn123 lawn123 reopened this Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-loongarch64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member linkable-framework Issues associated with delivering a linker friendly framework

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants