Convert STORE_BLK into STORIND for SIMD #116265

Merged
EgorBo merged 6 commits into dotnet:main from merge-struct-stores on Jun 5, 2025

Conversation

EgorBo
Member

@EgorBo EgorBo commented Jun 3, 2025

Closes #116258

       xor      eax, eax
       mov      dword ptr [rdi+0x08], eax
       vxorps   xmm0, xmm0, xmm0
-      vmovdqu  xmmword ptr [rdi+0x10], xmm0
-      vxorps   xmm0, xmm0, xmm0
+      vmovdqu  ymmword ptr [rdi+0x10], ymm0

STORE_IND<SIMD*> is more coalescing-friendly

Some minor diffs
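The effect in the diff above can be pictured with a toy model of store coalescing. This is only a sketch, not the JIT's lowering code: `ZeroStore` and `CoalesceZeroStores` are hypothetical names, and it assumes a second 16-byte zero store at `+0x20` that is not visible in the excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a "store zero" instruction: a byte offset from the
// base register and a width in bytes. Not a real JIT data structure.
struct ZeroStore
{
    std::uint32_t offset;
    std::uint32_t size;
};

// Merge a store into the previous one when the two are adjacent in memory,
// have the same width, and the combined width fits the widest available store.
std::vector<ZeroStore> CoalesceZeroStores(const std::vector<ZeroStore>& stores,
                                          std::uint32_t maxWidth)
{
    std::vector<ZeroStore> out;
    for (const ZeroStore& s : stores)
    {
        assert(s.size > 0);
        if (!out.empty())
        {
            ZeroStore& prev     = out.back();
            const bool adjacent = (prev.offset + prev.size == s.offset);
            const bool sameSize = (prev.size == s.size);
            const bool fits     = (prev.size + s.size <= maxWidth);
            if (adjacent && sameSize && fits)
            {
                prev.size += s.size; // one wider store replaces two narrow ones
                continue;
            }
        }
        out.push_back(s);
    }
    return out;
}
```

Feeding it `{0x08, 4}, {0x10, 16}, {0x20, 16}` with `maxWidth = 32` leaves the 4-byte store alone and folds the two 16-byte stores into a single 32-byte store at `0x10`, which corresponds to the `ymmword` store in the diff.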

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 3, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo EgorBo marked this pull request as ready for review June 3, 2025 15:51
@Copilot Copilot AI review requested due to automatic review settings June 3, 2025 15:51
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a hardware-intrinsics path that transforms zero-init and const-init SIMD-sized stores from GT_STORE_BLK to GT_STOREIND, using a vector constant node so that the stores coalesce better.

  • Introduces a FEATURE_HW_INTRINSICS guard with constant-value handling
  • Builds a GenTreeVecCon from a byte-filled simd_t and replaces the original init node
  • Changes the block operation to GT_STOREIND and invokes the common lowering
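The "byte-filled" constant in the list above can be sketched as follows. This is a minimal illustration, not the JIT's code: `MakeByteFilledVector` is a hypothetical helper, a `std::vector` stands in for `simd_t`, and it assumes the block-init value contributes only its low byte as the fill pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build the byte image of a SIMD constant by replicating
// the low byte of an integer init value across the whole vector.
// Assumption: only 16-, 32- and 64-byte vectors are modeled here.
std::vector<std::uint8_t> MakeByteFilledVector(std::size_t sizeInBytes,
                                               std::int64_t initValue)
{
    assert(sizeInBytes == 16 || sizeInBytes == 32 || sizeInBytes == 64);
    const std::uint8_t fill = static_cast<std::uint8_t>(initValue & 0xFF);
    return std::vector<std::uint8_t>(sizeInBytes, fill);
}
```

With an init value of 0 this yields the all-zero vector that the `vxorps` in the PR description materializes; any other fill byte yields the same byte repeated in every lane.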

@EgorBo
Member Author

EgorBo commented Jun 3, 2025

@jakobbotsch @dotnet/jit-contrib small change

If we have a GT_STORE_BLK<TYP_STRUCT> that is initialized with a CNS_INT, we can replace it with a GT_STOREIND<SIMD>. The latter is more coalescing-friendly.

@EgorBo EgorBo requested a review from jakobbotsch June 3, 2025 17:30
@EgorBo
Member Author

EgorBo commented Jun 3, 2025

Hm, oops, some unexpected diffs on arm64, let me take a look.

UPD: I've just disabled it on arm64. SIMD stores can't be merged on arm64 anyway, and arm64 has a zero register.
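Putting the conditions from this thread together, the decision could look roughly like the sketch below. Everything here is a simplified stand-in: `StoreNode`, `Oper`, `Arch` and `TryConvertToSimdStore` are hypothetical, and the accepted sizes (16, 32, 64 bytes, capped by `maxVectorBytes`) are an assumption rather than something stated in the PR.

```cpp
#include <cassert>
#include <cstdint>

enum class Oper { StoreBlk, StoreInd };
enum class Arch { X64, Arm64 };

// Hypothetical, simplified store node; the real JIT works on GenTree nodes.
struct StoreNode
{
    Oper          oper;
    std::uint32_t sizeInBytes;
    bool          initIsIntConstant; // source is an integer constant (CNS_INT)
    bool          isSimdTyped;       // set once the store carries a SIMD type
};

// Returns true and rewrites the node when a constant-initialized block store
// can be expressed as a single SIMD-typed indirect store.
bool TryConvertToSimdStore(StoreNode& node, Arch arch, std::uint32_t maxVectorBytes)
{
    assert(maxVectorBytes >= 16);

    // Disabled on arm64, as described in the comment above.
    if (arch == Arch::Arm64)
    {
        return false;
    }
    if (node.oper != Oper::StoreBlk || !node.initIsIntConstant)
    {
        return false;
    }

    // Assumption: only blocks that exactly match a vector width qualify.
    const std::uint32_t size         = node.sizeInBytes;
    const bool          isVectorSize = (size == 16 || size == 32 || size == 64);
    if (!isVectorSize || size > maxVectorBytes)
    {
        return false;
    }

    node.oper        = Oper::StoreInd;
    node.isSimdTyped = true;
    return true;
}
```

A 16-byte constant-initialized block on x64 is rewritten, while the same node on arm64, a 24-byte block, or a 64-byte block on a machine limited to 32-byte vectors is left untouched.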

Member

@jakobbotsch jakobbotsch left a comment

A couple of nits, and I think the TODO-CQ in the function header can be removed now. Otherwise LGTM.

EgorBo and others added 2 commits June 4, 2025 17:05
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@EgorBo EgorBo merged commit 05e5a44 into dotnet:main Jun 5, 2025
109 checks passed
@EgorBo EgorBo deleted the merge-struct-stores branch June 5, 2025 21:48
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

JIT: suboptimal store coalescing on x64
2 participants