Have SIMD Load/Store use GT_IND and GT_ASG where possible #80411

tannergooding · 2023-01-10T02:05:29Z

This updates how Vector64/128/256.Load and Vector64/128/256.Store are handled.

In particular, Load is imported as a GT_IND and Store is imported as a GT_ASG. This allows existing optimizations that are present for the more common IR to kick in.
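
As a rough illustration of why the node kind matters, here is a minimal sketch using a toy IR. Everything in it (`ToyOper`, `ToyNode`, `importLoad`, `canFoldToLocalRead`) is hypothetical and is not the JIT's actual data structures or APIs. It only shows the general idea that a pass written against the common indirection form does not fire on an opaque intrinsic node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy IR for illustration only; these are not the JIT's real types.
enum class ToyOper { LclAddr, Ind, HWIntrinsicLoad };

struct ToyNode
{
    ToyOper                  oper = ToyOper::LclAddr;
    std::string              local; // only meaningful for LclAddr
    std::unique_ptr<ToyNode> op1;   // address operand for the two load forms
};

std::unique_ptr<ToyNode> makeLclAddr(std::string local)
{
    auto node   = std::make_unique<ToyNode>();
    node->oper  = ToyOper::LclAddr;
    node->local = std::move(local);
    return node;
}

// Import a vector load either as a generic indirection or as an opaque intrinsic node.
std::unique_ptr<ToyNode> importLoad(std::unique_ptr<ToyNode> addr, bool asIndirection)
{
    assert(addr != nullptr);
    auto node  = std::make_unique<ToyNode>();
    node->oper = asIndirection ? ToyOper::Ind : ToyOper::HWIntrinsicLoad;
    node->op1  = std::move(addr);
    return node;
}

// A pre-existing "generic" optimization that only recognizes Ind:
// Ind(LclAddr(x)) could be rewritten into a direct read of x.
bool canFoldToLocalRead(const ToyNode& node)
{
    return node.oper == ToyOper::Ind && node.op1 != nullptr && node.op1->oper == ToyOper::LclAddr;
}
```

In this toy model, `canFoldToLocalRead(*importLoad(makeLclAddr("v"), true))` is true, while the same load imported with `asIndirection = false` is not recognized by the pass.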

The primary "negative" of this PR shows up in the disassembly. GT_HWINTRINSIC tracks a base type, whereas GT_IND and GT_ASG do not. This means that where a Load/Store for Vector128<int> might use movdqu, the corresponding code for GT_IND/GT_ASG always uses movups. This should have no impact, since loads/stores are handled by their own pipeline that counts as neither "integer" nor "floating-point", so there is no additional latency when the loaded value is subsequently used.
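
The claim that the choice of load instruction does not change the loaded value can be checked with a small C++ sketch using the SSE2 intrinsics `_mm_loadu_si128` and `_mm_loadu_ps`. Compilers typically lower these to movdqu and movups respectively, but the exact instruction chosen is up to the compiler, so treat that mapping as an assumption. The helper name `loadsProduceSameBits` is made up for this example, and on targets without SSE2 the function falls back to trivially returning true.

```cpp
#include <cassert>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define TOY_HAVE_SSE2 1
#else
#define TOY_HAVE_SSE2 0
#endif

// Loads the same 16 bytes through the "integer" and "floating-point" unaligned
// load intrinsics and reports whether the resulting register contents match.
bool loadsProduceSameBits(const void* src)
{
    assert(src != nullptr);
#if TOY_HAVE_SSE2
    __m128i asInt   = _mm_loadu_si128(static_cast<const __m128i*>(src)); // typically movdqu
    __m128  asFloat = _mm_loadu_ps(static_cast<const float*>(src));      // typically movups

    unsigned char intBytes[16];
    unsigned char floatBytes[16];
    std::memcpy(intBytes, &asInt, sizeof(intBytes));
    std::memcpy(floatBytes, &asFloat, sizeof(floatBytes));
    return std::memcmp(intBytes, floatBytes, sizeof(intBytes)) == 0;
#else
    // No SSE2 available on this target: nothing to compare.
    return true;
#endif
}
```

Both loads copy 16 bytes without interpreting them, so the comparison holds for any input, including byte patterns that would be NaN when viewed as floats.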

This same optimization will not be possible for more complex types of loads/stores, such as aligned, scalar, or non-temporal, without additional work to track the relevant information.

ghost · 2023-01-10T02:05:40Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

tannergooding · 2023-01-10T02:07:33Z

It's worth noting that @EgorBo and @SingleAccretion have both suggested this change previously (with Egor doing a minimal prototype as well). At the time, I was unsure if such a change would have been profitable given the planned work around AVX-512, VectorMask, and other scenarios where the simdBaseType became relevant again. After further consideration and investigation, I believe that the general transition here is going to be profitable, even if we ultimately have some other cases that do not get to participate in all the same optimizations as IND/ASG presently have.

tannergooding · 2023-01-10T17:41:25Z

CC. @dotnet/jit-contrib

This results in code simplification, 50k/30k smaller codegen for minopts on Arm64/x64, 90k/3k smaller codegen for fullopts on Arm64/x64, and a -0.02% to -0.05% TP (throughput) improvement.

tannergooding · 2023-01-10T21:54:01Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2023-01-10T21:54:32Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2023-01-11T03:55:38Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2023-01-11T03:56:03Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2023-01-12T16:36:11Z

No changes, just merged in main in hopes that it resolves the unrelated Arm64 CI failures.

All jitstress tests passed, so not going to rerun them.

EgorBo

Nice clean up, looking forward to seeing perf impact

tannergooding added 9 commits January 9, 2023 16:11

Moving the LoadVector intrinsics to use gtNewSimdLoad*Node helper APIs

10269bd

Switching SimdLoadNode to return GT_IND

2feba9e

Merge separate imp*Intrinsic paths on xarch into impSpecialIntrinsic

633fb70

Updating the LoadVector64/128/256 APIs of Sse/Sse2/Avx and AdvSimd to use gtNewSimdLoadNode

3312e66

Moving the StoreVector intrinsics to use gtNewSimdStore*Node helper APIs

b596f43

Switching SimdStoreNode to return GT_ASG

45a70e8

Updating the Store APIs of Sse/Sse2/Avx and AdvSimd to use gtNewSimdStoreNode

be3835e

Make the SIMD load/store instruction consistent between VEX and non-VEX

6e3c0cd

Use GTF_REVERSE_OPS instead of impSpillSideEffect

0c55782

ghost assigned tannergooding Jan 10, 2023

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 10, 2023

Applying formatting patch

d1d971f

tannergooding marked this pull request as ready for review January 10, 2023 17:38

SingleAccretion reviewed Jan 10, 2023

src/coreclr/jit/hwintrinsicarm64.cpp (outdated, resolved)

src/coreclr/jit/gentree.cpp (resolved)

src/coreclr/jit/gentree.cpp (outdated, resolved)

tannergooding added 2 commits January 10, 2023 12:44

Responding to PR feedback

12a09b3

Revert "Use GTF_REVERSE_OPS instead of impSpillSideEffect"

65575ca

This reverts commit 0c55782.

tannergooding force-pushed the better-simdld/st branch from 40aee31 to 65575ca Compare January 10, 2023 20:45

Remove an unnecessary assert for gtNewSimdLoad/StoreNode

43c4ce0

build-analysis bot mentioned this pull request Jan 11, 2023

Tracking issue for CI build timeouts #76454

Closed

Merge remote-tracking branch 'dotnet/main' into better-simdld/st

57ba3f4

EgorBo approved these changes Jan 13, 2023

tannergooding merged commit a2029fe into dotnet:main Jan 13, 2023

tannergooding deleted the better-simdld/st branch January 13, 2023 15:15

ghost locked as resolved and limited conversation to collaborators Feb 12, 2023
