Closed
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Description
The method

bool Foo(Vector128<byte> v1, Vector128<byte> v2) => v1 == v2;

emits the following on x64:
vzeroupper
vmovupd xmm0, [rdx]
vpcmpeqb xmm0, xmm0, [r8]
vpmovmskb eax, xmm0
cmp eax, 0xffff
sete al
movzx eax, al
ret

That is a vpcmpeqb + vpmovmskb + cmp sequence. When SSE4.1 (which provides ptest) is available, the JIT could emit this instead:
vzeroupper
vmovupd xmm0, [rdx]
-vpcmpeqb xmm0, xmm0, [r8]
-vpmovmskb eax, xmm0
-cmp eax, 0xffff
+vpxor xmm0, xmm0, [r8]
+vptest xmm0, xmm0
sete al
movzx eax, al
ret

That is a vpxor + vptest sequence. The same codegen can already be obtained by writing the method as

bool Foo(Vector128<byte> v1, Vector128<byte> v2) => (v1 ^ v2) == Vector128<byte>.Zero;

but this is awkward, especially when a dedicated == operator exists.
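The two instruction shapes can also be written directly with SSE intrinsics on __m128i. The following is a minimal sketch, assuming an x86-64 target and a GCC/Clang-style compiler (for the `target` attribute and `__builtin_cpu_supports`). The function names are hypothetical and only illustrate that both shapes compute the same predicate: "all 16 bytes equal" holds exactly when the XOR of the inputs is all zero.

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

/* Shape emitted today: byte-wise compare, gather the 16 lane sign bits,
   and require all of them to be set (pcmpeqb + pmovmskb + cmp 0xFFFF). */
static bool eq_cmpeq_movemask(const uint8_t a[16], const uint8_t b[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);      /* 0xFF in every matching byte lane */
    return _mm_movemask_epi8(eq) == 0xFFFF;   /* all 16 lanes matched */
}

/* Proposed shape: XOR the inputs, then PTEST the result against itself
   (pxor + ptest). Compiled for SSE4.1 via the target attribute. */
__attribute__((target("sse4.1")))
static bool eq_xor_ptest(const uint8_t a[16], const uint8_t b[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i diff = _mm_xor_si128(va, vb);     /* all-zero iff va == vb bit for bit */
    return _mm_testz_si128(diff, diff) != 0;  /* ptest sets ZF when (diff & diff) == 0 */
}

/* Runtime check: only call eq_xor_ptest when the CPU supports SSE4.1. */
static bool has_sse41(void) {
    return __builtin_cpu_supports("sse4.1") != 0;
}
```

The ptest variant is guarded by `has_sse41()` because it is compiled for SSE4.1 while the caller may not be; the cmpeq/movemask variant needs only SSE2, which is baseline on x86-64.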
According to Testing equality between two __m128i variables (especially its comments), it is hard to benchmark which version is faster, but the vpxor + vptest combination may be faster on CPUs where vpxor can issue on more execution ports.
The code size also shrinks slightly, since two instructions replace three.
Is it worth changing the codegen here?