Optimize VectorX<T>.ConditionalSelect for constant masks #104092

Merged
tannergooding merged 10 commits into dotnet:main on Jul 4, 2024

@ezhevita ezhevita commented Jun 27, 2024

Resolves #104001.
I also realized that this cannot be implemented for arrays, as I originally intended, because arrays carry no immutability guarantee. Instead, I made sure the optimization applies when the mask comes from a ReadOnlySpan<T> property or a static VectorX<T> field.
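For reference, the three mask shapes discussed above might look like the sketch below. The names (`MaskShapes`, `MaskSpan`, `s_maskVector`, `s_maskArray`) are made up for illustration, and the `readonly` modifier on the vector field is an assumption, since the description only says "static". All three methods return the same value. Per the description, only the span property and the vector field are expected to reach the JIT as a constant mask, because the array's elements can be overwritten at any time.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MaskShapes
{
    // Shape 1: span property over constant data; the bytes cannot be changed through it.
    private static ReadOnlySpan<byte> MaskSpan =>
        [255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0];

    // Shape 2: static vector field ("readonly" is an assumption made for this sketch).
    private static readonly Vector128<byte> s_maskVector = Vector128.Create(MaskSpan);

    // Shape 3: array. The reference is readonly, but the elements are still writable,
    // so the contents cannot be treated as a constant.
    private static readonly byte[] s_maskArray =
        [255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0];

    public static Vector128<byte> FromSpan(Vector128<byte> a, Vector128<byte> b)
        => Vector128.ConditionalSelect(Vector128.Create(MaskSpan), a, b);

    public static Vector128<byte> FromVectorField(Vector128<byte> a, Vector128<byte> b)
        => Vector128.ConditionalSelect(s_maskVector, a, b);

    public static Vector128<byte> FromArray(Vector128<byte> a, Vector128<byte> b)
        => Vector128.ConditionalSelect(Vector128.Create(s_maskArray), a, b);
}
```

Any difference between the three shows up only in the generated code, so a disassembly diagnoser is the way to compare them.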

Example:

using System;
using System.Runtime.Intrinsics;
using BenchmarkDotNet.Attributes;

[DisassemblyDiagnoser]
public class TestClass
{
    private ReadOnlySpan<byte> Mask => [255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0];

    [Benchmark]
    public byte Span()
    {
        return Vector128.ConditionalSelect(Vector128.Create(Mask), Vector128<byte>.Zero, Vector128<byte>.One)[0];
    }
}

Old codegen:

; Method TestNamespace.TestClass:Span():ubyte:this (FullOpts)
G_M000_IG01:

G_M000_IG02:
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm1, xmm1, xmm1
       vpand    xmm1, xmm1, xmm0
       vpandn   xmm0, xmm0, xmmword ptr [reloc @RWD16]
       vpor     xmm0, xmm0, xmm1
       vmovd    eax, xmm0
       movzx    rax, al

G_M000_IG03:
       ret      
RWD00  	dq	00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16  	dq	0101010101010101h, 0101010101010101h
; Total bytes of code: 36

New codegen:

; Method TestNamespace.TestClass:Span():ubyte:this (FullOpts)
G_M000_IG01:

G_M000_IG02:
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm1, xmm1, xmm1
       vmovups  xmm2, xmmword ptr [reloc @RWD16]
       vpblendvb xmm0, xmm2, xmm1, xmm0
       vmovd    eax, xmm0
       movzx    rax, al

G_M000_IG03:
       ret      
RWD00  	dq	00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16  	dq	0101010101010101h, 0101010101010101h
; Total bytes of code: 34
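A note on why the two sequences agree. `ConditionalSelect` is defined bit by bit, while `(v)pblendvb` picks whole bytes based on the high bit of each mask byte, so the swap is only valid when every mask byte is 0x00 or 0xFF. A constant mask lets that be checked up front. The sketch below is illustrative only: `SelectDemo` and its members are hypothetical names, and it models the semantics rather than the JIT change itself.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class SelectDemo
{
    // Bitwise definition: bits of `a` where the mask bit is 1, bits of `b` elsewhere.
    // This corresponds to the vpand / vpandn / vpor sequence in the old codegen.
    public static Vector128<byte> BitwiseSelect(Vector128<byte> mask, Vector128<byte> a, Vector128<byte> b)
        => (mask & a) | (~mask & b);

    // Byte-granular blend: whole bytes are chosen by the high bit of each mask byte.
    public static Vector128<byte> ByteBlend(Vector128<byte> mask, Vector128<byte> a, Vector128<byte> b)
    {
        if (Sse41.IsSupported)
        {
            // BlendVariable(left, right, mask) picks `right` where the mask byte's high bit is set.
            return Sse41.BlendVariable(b, a, mask);
        }

        // Portable fallback applying the same per-byte rule.
        Span<byte> result = stackalloc byte[Vector128<byte>.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = (mask[i] & 0x80) != 0 ? a[i] : b[i];
        }
        return Vector128.Create((ReadOnlySpan<byte>)result);
    }

    // True when both formulations give the same result for this mask.
    public static bool Agree(Vector128<byte> mask)
    {
        Vector128<byte> a = Vector128.Create((byte)0xAA);
        Vector128<byte> b = Vector128.Create((byte)0x55);
        return BitwiseSelect(mask, a, b) == ByteBlend(mask, a, b);
    }
}
```

With a per-byte mask such as `Vector128.Create((ushort)0x00FF).AsByte()`, `SelectDemo.Agree` returns true. With a partial mask such as `Vector128.Create((byte)0x0F)` it returns false, because the bitwise form mixes bits from both inputs (0x5A) while the blend takes `b` whole (0x55).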

ezhevita added 6 commits June 26, 2024 03:05
This adds a check in the JIT for constant masks (`GT_CNS_VEC`; everything else gets lowered to it) and enables the optimization to `BlendVariable` (the `(v)pblendvb` instruction).
This currently does not work for masks loaded from an array held in a field or variable.
This optimization is also not triggered on platforms supporting AVX512F(/VL?), since there the operation is optimized earlier into a `vpternlogd` instruction.
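As background on the `vpternlogd` remark: that instruction evaluates an arbitrary three-input bitwise function described by an 8-bit truth table, so a bitwise select fits in one instruction with no restriction on the mask. The sketch below models this in scalar code. `TernaryLogic`, `TruthTable`, and `Apply` are hypothetical names, and which operand the JIT assigns to which input, as well as the immediate it actually emits, are not shown in this thread.

```csharp
using System;

public static class TernaryLogic
{
    // Builds the 8-bit truth table for a three-input bitwise function.
    // Bit i of the table is f(a, b, c) with i = (a << 2) | (b << 1) | c.
    public static byte TruthTable(Func<bool, bool, bool, bool> f)
    {
        int table = 0;
        for (int i = 0; i < 8; i++)
        {
            bool a = (i & 4) != 0;
            bool b = (i & 2) != 0;
            bool c = (i & 1) != 0;
            if (f(a, b, c))
            {
                table |= 1 << i;
            }
        }
        return (byte)table;
    }

    // Applies a truth table independently to each of the 32 bit positions,
    // which is how a ternary-logic instruction treats its three inputs.
    public static uint Apply(byte table, uint a, uint b, uint c)
    {
        uint result = 0;
        for (int bit = 0; bit < 32; bit++)
        {
            int index = (int)((((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1));
            result |= (uint)((table >> index) & 1) << bit;
        }
        return result;
    }
}
```

With this input ordering, `TernaryLogic.TruthTable((a, b, c) => a ? b : c)` yields 0xCA, and `TernaryLogic.Apply(0xCA, mask, x, y)` equals `(mask & x) | (~mask & y)` for any inputs, including masks that are not all-ones or all-zeros per element.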
@ghost ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jun 27, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Jun 27, 2024
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@ezhevita
Contributor Author

@dotnet-policy-service agree

Comment on lines 29487 to 29488
// TODO-XARCH-AVX512 Use VPBLENDM* and take input directly from K registers if cond is from
// MoveMaskToVectorSpecial.
Member

This comment isn't applicable to the general query; it was specific to the CndSel lowering logic.

@ezhevita ezhevita changed the title from "Optimize VectorT.ConditionalSelect for constant masks" to "Optimize VectorX<T>.ConditionalSelect for constant masks" Jul 1, 2024
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
@tannergooding
Member

@EgorBo, this should be good for secondary sign-off and merging now

Member

@EgorBo EgorBo left a comment

Thanks!

@tannergooding tannergooding merged commit 284aeaf into dotnet:main Jul 4, 2024
106 of 107 checks passed
@ezhevita ezhevita deleted the optimize-cndsel branch July 7, 2024 16:41
@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2024
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member
Development

Successfully merging this pull request may close these issues.

VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms
3 participants