Vector<T> operations don't take advantage of memory operands #13798

Open
@GrabYourPitchforks

Description

Since dotnet/coreclr#22944, the raw hardware intrinsics can fold a memory load into the SIMD instruction itself, using a memory operand instead of a separate load into a register.

However, the same optimization was not applied to Vector and Vector<T>, even though they use nearly identical codegen under the covers. Compare the two methods below and the code generated for each:

public static Vector<byte> M(Vector<byte> a, ref Vector<byte> b)
{
    return Vector.Equals(a, b);
}

public static Vector256<byte> N(Vector256<byte> a, ref Vector256<byte> b)
{
    return Avx2.CompareEqual(a, b);
}

; C.M(System.Numerics.Vector`1<Byte>, System.Numerics.Vector`1<Byte> ByRef)
    L0000: vzeroupper
    L0003: vmovupd ymm0, [rdx]
    L0007: vmovupd ymm1, [r8]   ; note the allocation of register ymm1
    L000c: vpcmpeqb xmm0, xmm0, xmm1
    L0010: vmovupd [rcx], ymm0
    L0014: mov rax, rcx
    L0017: vzeroupper
    L001a: ret

; C.N(System.Runtime.Intrinsics.Vector256`1<Byte>, System.Runtime.Intrinsics.Vector256`1<Byte> ByRef)
    L0000: vzeroupper
    L0003: vmovupd ymm0, [rdx]
    L0007: vpcmpeqb xmm0, xmm0, [r8]   ; operation doesn't touch register ymm1
    L000c: vmovupd [rcx], ymm0
    L0010: mov rax, rcx
    L0013: vzeroupper
    L0016: ret
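
The difference above is purely one of code generation; both methods should compute the same lanes. A minimal sketch for checking that on a given machine is shown here. It assumes AVX2 hardware where Vector<byte> is 32 bytes wide. The class name FoldingCheck and the test data are hypothetical and not part of the original report.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class FoldingCheck
{
    // Same shape as M in the report: b arrives by reference, so its load
    // is the one the JIT could fold into the compare instruction.
    public static Vector<byte> M(Vector<byte> a, ref Vector<byte> b)
        => Vector.Equals(a, b);

    // Same shape as N in the report, using the raw AVX2 intrinsic.
    public static Vector256<byte> N(Vector256<byte> a, ref Vector256<byte> b)
        => Avx2.CompareEqual(a, b);

    public static void Main()
    {
        // Assumption: the comparison only makes sense when Vector<byte>
        // and Vector256<byte> have the same width and AVX2 is available.
        if (!Avx2.IsSupported || Vector<byte>.Count != Vector256<byte>.Count)
        {
            Console.WriteLine("AVX2 with a 32-byte Vector<T> is required for this comparison.");
            return;
        }

        // Hypothetical test data: lanes cycle through 0, 1, 2, 3.
        var bytes = new byte[Vector<byte>.Count];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = (byte)(i % 4);
        }

        Vector<byte> a = new Vector<byte>(bytes);
        Vector<byte> b = new Vector<byte>((byte)1);
        Vector<byte> viaVector = M(a, ref b);

        // Reinterpret the same bits as Vector256<byte> for the intrinsic path.
        Vector256<byte> a256 = a.AsVector256();
        Vector256<byte> b256 = b.AsVector256();
        Vector256<byte> viaAvx2 = N(a256, ref b256);

        // Prints whether the two code paths produced identical lanes.
        Console.WriteLine(viaVector.Equals(viaAvx2.AsVector()));
    }
}
```

This only confirms that the two paths agree on results. Whether the load is folded still has to be checked in the JIT disassembly, as in the listings above.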

category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization