Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the SIMD codegen for SIMD12 load/store #80083

Merged
merged 9 commits into from
Jan 6, 2023

Conversation

tannergooding
Copy link
Member

This adds support for containment on TYP_SIMD12 loads/stores and improves the codegen to require less temporary registers and use better instructions when available.

Improved Load:

- lea      rax, bword ptr [rcx+30H]
- vmovss   xmm0, dword ptr [rax+08H]
- vmovsd   xmm1, qword ptr [rax]
- vshufps  xmm1, xmm0, 68
+ vmovsd   xmm0, qword ptr [rcx+30H]
+ vinsertps xmm0, dword ptr [rcx+38H], 2

Improved Store:

- vmovsd   qword ptr [rdx], xmm1
- vpshufd  xmm0, xmm1, 2
- vmovss   dword ptr [rdx+08H], xmm0
+ vmovsd   qword ptr [rdx], xmm0
+ vextractps dword ptr [rdx+08H], xmm0, 2

Combined this saves 9 bytes of codegen and improves the PerScore by 1.5

Total diffs are all relatively similar. Emitting vmovsd + vinsertps or vmovsd + vextractps and removing now unnecessary lea in favor of containing them.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 31, 2022
@ghost ghost assigned tannergooding Dec 31, 2022
@ghost
Copy link

ghost commented Dec 31, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This adds support for containment on TYP_SIMD12 loads/stores and improves the codegen to require less temporary registers and use better instructions when available.

Improved Load:

- lea      rax, bword ptr [rcx+30H]
- vmovss   xmm0, dword ptr [rax+08H]
- vmovsd   xmm1, qword ptr [rax]
- vshufps  xmm1, xmm0, 68
+ vmovsd   xmm0, qword ptr [rcx+30H]
+ vinsertps xmm0, dword ptr [rcx+38H], 2

Improved Store:

- vmovsd   qword ptr [rdx], xmm1
- vpshufd  xmm0, xmm1, 2
- vmovss   dword ptr [rdx+08H], xmm0
+ vmovsd   qword ptr [rdx], xmm0
+ vextractps dword ptr [rdx+08H], xmm0, 2

Combined this saves 9 bytes of codegen and improves the PerScore by 1.5

Total diffs are all relatively similar. Emitting vmovsd + vinsertps or vmovsd + vextractps and removing now unnecessary lea in favor of containing them.

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, this should be ready for review.

Gives some small size savings for x64 (~2k bytes in fullopts and ~0.5k bytes in minopts) and a small TP win on x64

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

Fixed the jitstress failure. Results in ~3.3k savings on x86/x64 and a -0.01% TP improvement

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. There seems to be a failure for ARM though, but it looks like just one test case at the moment.

@tannergooding
Copy link
Member Author

There seems to be a failure for ARM though, but it looks like just one test case at the moment.

It's an unrelated/existing GC timeout. I've retriggered it and it should pass on rerun.

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@runfoapp runfoapp bot mentioned this pull request Jan 6, 2023
@tannergooding tannergooding merged commit 87312f7 into dotnet:main Jan 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants