Description
While experimenting with Vector2 and Vector128, I noticed that the code generated for Vector128.AsVector128(Vector2) is not optimal.
C#
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
public static class C
{
public static Vector128<float> AsVector128(Vector2 value) => value.AsVector128();
}
Current Codegen by SharpLab
; Core CLR 6.0.322.12309 on amd64
C.AsVector128(System.Numerics.Vector2)
L0000: vzeroupper
L0003: vmovq xmm0, rdx
L0008: vxorps xmm1, xmm1, xmm1
L000c: vinsertps xmm0, xmm0, xmm1, 0x20
L0012: vxorps xmm1, xmm1, xmm1
L0016: vinsertps xmm0, xmm0, xmm1, 0x30
L001c: vmovupd [rcx], xmm0
L0020: mov rax, rcx
L0023: ret
Expected Codegen
For a standalone (non-inlined) method that receives the value in rdx:
vzeroupper
vmovq xmm0, rdx ;automatically clears all the upper bits in xmm0
vmovdqu [rcx], xmm0
mov rax, rcx
ret
For a standalone (non-inlined) method that receives a reference to the value in rdx:
vzeroupper
vmovsd xmm0, [rdx] ;automatically clears all the upper bits in xmm0
vmovupd [rcx], xmm0
mov rax, rcx
ret
If it's inlined and rdx has the value:
vmovq xmm0, rdx ;automatically clears all the upper bits in xmm0
If it's inlined and xmm1 has the value:
vmovddup xmm0, xmm1 ;if later calculations don't care about the upper 64 bits
or, if it's necessary to clear the upper 64 bits:
vxorps xmm0, xmm0, xmm0 ;clear xmm0 by hand
vmovsd xmm0, xmm0, xmm1 ;merge lower 64bits of xmm1
If it's inlined and rsi has the reference to the value:
vmovsd xmm0, [rsi] ;automatically clears all the upper bits in xmm0
Configuration
SharpLab (2022/05/13)
Regression?
No
Data
Analysis
This code in Vector128.cs is likely the cause:
public static Vector128<float> AsVector128(this Vector2 value)
=> new Vector4(value, 0.0f, 0.0f).AsVector128();
This implementation is inefficient because new Vector4(Vector2, float, float) emits two vinsertps instructions, both of which are unnecessary when the inserted values are zero.
SharpLab
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
public static class C
{
public static Vector4 Ctor(Vector2 value, float z, float w) => new(value, z, w);
}
C.Ctor(System.Numerics.Vector2, Single, Single)
L0000: vzeroupper
L0003: vmovq xmm0, rdx
L0008: vinsertps xmm0, xmm0, xmm2, 0x20
L000e: vinsertps xmm0, xmm0, xmm3, 0x30
L0014: vmovupd [rcx], xmm0
L0018: mov rax, rcx
L001b: ret
The same is true for Vector128.AsVector128(Vector3).
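For the Vector2 case, one possible direction is to avoid the Vector4 constructor and reinterpret the two floats as a single 64-bit scalar. The following is only a sketch under assumptions: `Vector2Sketch` and `AsVector128ZeroUpper` are hypothetical names, not the runtime's implementation, and it assumes Vector2 is laid out as two consecutive floats (X, then Y) on a little-endian target such as x64. Whether the JIT actually reduces this to a single vmovq or vmovsd has not been verified here.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class Vector2Sketch
{
    // Hypothetical helper for illustration; not the code in Vector128.cs.
    public static Vector128<float> AsVector128ZeroUpper(Vector2 value)
    {
        // Assumption: Vector2 is 8 bytes (X followed by Y), so it can be
        // reinterpreted as one 64-bit integer without changing any bits.
        long bits = Unsafe.As<Vector2, long>(ref value);

        // CreateScalar places the value in element 0 and zeroes the
        // remaining elements, so the upper 64 bits end up as 0.0f, 0.0f.
        return Vector128.CreateScalar(bits).AsSingle();
    }
}
```

The result should match `value.AsVector128()` element for element (X, Y, 0, 0). This approach does not carry over directly to Vector3, which is 12 bytes and cannot be read as a single 64-bit or 128-bit value without over-reading.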
category:cq
theme:vector-codegen
skill-level:intermediate
cost:small
impact:small