Vector{128|256}<T> == Vector{128|256}<T> could emit PXOR + PTEST combo on x86 #85638

@gfoidl

Description

bool Foo(Vector128<byte> v1, Vector128<byte> v2) => v1 == v2;

emits on x86:

vzeroupper
vmovupd xmm0, [rdx]
vpcmpeqb xmm0, xmm0, [r8]
vpmovmskb eax, xmm0
cmp eax, 0xffff
sete al
movzx eax, al
ret

That is a vpcmpeqb + vpmovmskb + cmp sequence. Where SSE4.1 (and therefore ptest) is available, the JIT could instead emit:

vzeroupper
vmovupd xmm0, [rdx]
-vpcmpeqb xmm0, xmm0, [r8]
-vpmovmskb eax, xmm0
-cmp eax, 0xffff
+vpxor xmm0, xmm0, [r8]
+vptest xmm0, xmm0
sete al
movzx eax, al
ret

So a vpxor + vptest. This codegen can be achieved by writing the method as

bool Foo(Vector128<byte> v1, Vector128<byte> v2) => (v1 ^ v2) == Vector128<byte>.Zero;

but this is cumbersome, especially since there is already a natural operator for it.
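For reference, a minimal sketch showing that the two forms agree on their result. It assumes .NET 7 or later (for the `==` and `^` operators on `Vector128<T>`); the class and method names are made up for illustration, and it says nothing about which instructions the JIT emits for either form on a given runtime or CPU.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical demo type; not part of the runtime.
public static class EqualityDemo
{
    // Operator form from this issue.
    public static bool EqualsOp(Vector128<byte> v1, Vector128<byte> v2) => v1 == v2;

    // XOR form from this issue: (v1 ^ v2) is all-zero exactly when
    // v1 and v2 are bitwise equal.
    public static bool EqualsXor(Vector128<byte> v1, Vector128<byte> v2) =>
        (v1 ^ v2) == Vector128<byte>.Zero;

    public static void Main()
    {
        Vector128<byte> a = Vector128.Create((byte)1);
        Vector128<byte> b = Vector128.Create((byte)1);
        Vector128<byte> c = b.WithElement(15, (byte)2); // differs in the last lane only

        Console.WriteLine(EqualsOp(a, b) == EqualsXor(a, b)); // both forms report "equal"
        Console.WriteLine(EqualsOp(a, c) == EqualsXor(a, c)); // both forms report "not equal"
    }
}
```

Since both forms compute the same result for integer element types, the transformation proposed here would only change the instruction selection, not the observable behavior.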

According to "Testing equality between two __m128i variables" (especially the comments there), it's hard to benchmark which version is faster, but the vpxor + vptest combo may be faster on CPUs where vpxor can run on more ports.
Another point is that the code size shrinks slightly.

Is it worth changing the codegen here?

Metadata

    Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI