-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Have SIMD Load/Store use GT_IND and GT_ASG where possible #80411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… use gtNewSimdLoadNode
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsThis updates how In particular, The primary "negative" of this PR is that in disassembly. In particular, This same optimization will not be possible for more complex types of loads/stores, such as aligned, scalar, or non-temporal, without additional work to track the relevant information.
|
|
It's worth noting that @EgorBo and @SingleAccretion have both suggested this change previously (with Egor doing a minimal prototype as well). At the time, I was unsure if such a change would have been profitable given the planned work around AVX-512, VectorMask, and other scenarios where the |
|
CC. @dotnet/jit-contrib Results in code simplification, 50k/30k smaller codegen for minopts on Arm64/x64, 90k/3k smaller codegen for fullopts on Arm64/x64, and a |
40aee31 to
65575ca
Compare
|
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
No changes, just merged in main in a hopes that it resolves the unrelated Arm64 CI failures. All jitstress tests passed, so not going to rerun them. |
EgorBo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up, looking forward to seeing perf impact
This updates how
Vector64/128/256.LoadandVector64/128/256.Storeare handled.In particular,
Loadis imported as aGT_INDandStoreis imported as aGT_ASG. This allows existing optimizations that are present for the more common IR to kick in.The primary "negative" of this PR is that in disassembly. In particular,
GT_HWINTRINSICtracks a base type where-asGT_INDandGT_ASGdo not. This means that where aLoad/StoreforVector128<int>might use amovdqu, the corresponding code forGT_IND/ASGalways usesmovups. This should have no impact since loads/stores are handled by their own pipeline that counts as neither "integer" nor "floating-point" based, so there is no additional latency for its subsequent use.This same optimization will not be possible for more complex types of loads/stores, such as aligned, scalar, or non-temporal, without additional work to track the relevant information.