Cannot convert efficiently a System.Numerics.Vector4 to a Vector3

Hey JIT compiler friends, 😊 

Vector3 is convenient and popular for manipulating e.g. position/speed, and it is used for both storage and calculations. The problem is that loading/storing it requires several instructions to handle the elements separately (unlike a Vector4).

One commonly used technique is to load a Vector4 and downcast it to a Vector3; this operation should effectively zero out the `.w` component of the SIMD register. Usually, the load is made on safe data boundaries where you know that loading `w` from memory is OK (even if it contains garbage) and safe (e.g. no access that could page fault).

The problem is that while upcasting from a `Vector3` to a `Vector4` has a proper intrinsic, downcasting doesn't and always generates stack spilling. I haven't found a way to work around this, so I usually have to replace every Vector3 with a Vector4, which is not ideal.
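To illustrate the "replace Vector3 with Vector4" fallback mentioned above, here is a minimal sketch. The class and method names (`Vector4Accumulation`, `SumXyz`) are hypothetical and only for illustration: the loop stays entirely in `Vector4` and narrows to `Vector3` once, outside the hot path. I have not checked the generated code for this variant, so take it as a description of the pattern rather than a measured fix.

```c#
using System;
using System.Numerics;

public static class Vector4Accumulation
{
    // Hypothetical helper (not part of System.Numerics): sums the X/Y/Z lanes
    // of every element while staying in Vector4, then narrows a single time.
    public static Vector3 SumXyz(ReadOnlySpan<Vector4> span)
    {
        Vector4 acc = Vector4.Zero;
        for (int i = 0; i < span.Length; i++)
        {
            // The W lane is accumulated as well, but it is discarded below.
            acc += span[i];
        }

        // Single Vector4 -> Vector3 narrowing, outside the loop.
        return new Vector3(acc.X, acc.Y, acc.Z);
    }
}
```

The downside is the one described above: every intermediate value has to be typed as `Vector4`, even though only three lanes are meaningful.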

For example the following code (on [sharplab.io](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgDkBXfGKASzFwBuGvSasAShwB2GXtxYBJGXym5+QkY2YtJMuTBYBhCPgAOvADY8AyjwBu/GBtpaJ02fKUYeEU7agOYE7C1CIAzAzSuNgAZjBMpAyGDADeNAwZDKZ8dtjeYkhMKAwAKk4YAGowYBjQKCUQVTXQYXbkABTWpthSADxNtVAoAHwMuN1SaAwQHBgMAy3Ts6azAJSp6ZlbC1ARsLgcFnMAvPPVg2EsAFo+IVtbMdDtvDIMvAyndIJvDL1jEywADIwKQAcwwAAtvrwANQw9Zpaj3ZEMfaHOYw07jHoAbV4AF0WABBXA7cikAAc7VWxNJ5xa1LuKIAvjRNsiZhgVidUU50UyMqykZl2Vkcnl4sxCsRimVcJV6UMGjtWqROhN+oqRv8elNOWdmrslly1hthciVbyDkcPgaLtdbqKHk8XnN3p9ob8dVIgSDwVC3nCEU6UbkoAw7MUsRM8fiBSiMmGIxFTlIYAB3O0MyMsAAaUxzAE0Cyhrqt4wm0TbMcmK5khSGMpzubaqxh48ytqLRdleLl8lKiqVyirlVr2paXtypl0eprDdrsVJg+b7pa27aVQ6oBA64nsOG7Cm3lJuXuGI8oM9Xu6GF8fn8l76wZDoUGzQmtkvY7a05mdig7RHlMdAsHQMTlo2DBClsQpCkAA=))

```c#
    private static void TestVector4ToVector3v1(Span<Vector4> span, out Vector3 output) {
        Vector3 result = Vector3.Zero;
        for(int i = 0; i < span.Length; i++) {
            result += span[i].AsVector128().AsVector3();
        }

        output = result;
    }
```

will generate the following code:

```
C.TestVector4ToVector3v1(System.Span`1<System.Numerics.Vector4>, System.Numerics.Vector3 ByRef)
    L0000: sub rsp, 0x18
    L0004: vzeroupper
    L0007: mov rax, [rcx]
    L000a: mov ecx, [rcx+8]
    L000d: vxorps xmm0, xmm0, xmm0
    L0012: xor r8d, r8d
    L0015: test ecx, ecx
    L0017: jle short L004e
    L0019: nop [rax]
    L0020: mov r9d, r8d
    L0023: shl r9, 4
    L0027: vmovupd xmm1, [rax+r9]
    <<<<<<<<<<<<<< stack spilling and reload - begin
    L002d: vmovapd [rsp], xmm1
    L0032: vmovss xmm1, [rsp+8]
    L0038: vmovsd xmm2, [rsp]
    <<<<<<<<<<<<<< stack spilling and reload - end
    L003d: vshufps xmm2, xmm2, xmm1, 0x44
    L0042: vaddps xmm0, xmm0, xmm2
    L0046: inc r8d
    L0049: cmp r8d, ecx
    L004c: jl short L0020
    L004e: vmovsd [rdx], xmm0
    L0052: vpshufd xmm1, xmm0, 2
    L0057: vmovss [rdx+8], xmm1
    L005c: add rsp, 0x18
    L0060: ret
```

Trying to work around it with the following doesn't work either: it generates code similar to the code above (see the sharplab link above). This surprised me more, as the `Vector3(float, float, float)` constructor is marked as an intrinsic...

```C#
    private static void TestVector4ToVector3v2(Span<Vector4> span, out Vector3 output) {
        Vector3 result = Vector3.Zero;
        for(int i = 0; i < span.Length; i++) {
            var v4 = span[i];
            var v3 = new Vector3(v4.X, v4.Y, v4.Z);
            result += v3;
        }

        output = result;
    }    
```

Instead, a downcast should be able to generate code similar to what we get with an upcast, by directly setting the `.w` component to 0.0f like this:

```
    L0027: vxorps xmm0, xmm0, xmm0
    L002b: vinsertps xmm0, xmm1, xmm0, 0x30
```
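For reference, the effect of those two instructions can be written by hand with the hardware intrinsics API, as long as the value stays a `Vector128<float>`. This is only a sketch: `DowncastSketch` and `ZeroW` are hypothetical names, and the result is not a `Vector3`, so it does not replace the missing downcast. It only shows the lane operation I would expect the JIT to emit.

```c#
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class DowncastSketch
{
    // Hypothetical helper: returns the input with its W lane forced to 0.0f.
    public static Vector128<float> ZeroW(Vector4 value)
    {
        Vector128<float> v = value.AsVector128();

        if (Sse41.IsSupported)
        {
            // insertps with control 0x30: take element 0 of the second operand
            // (all zeros here) and write it into element 3 of the first operand.
            return Sse41.Insert(v, Vector128<float>.Zero, 0x30);
        }

        // Portable fallback for hardware without SSE4.1.
        return v.WithElement(3, 0f);
    }
}
```

For example, `DowncastSketch.ZeroW(new Vector4(1, 2, 3, 4))` keeps lanes 0 to 2 as they are and sets lane 3 to zero.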

Maybe I have missed something in the API that provides such a conversion, but I failed to find it... 🤔

Would it be possible to optimize this conversion as proposed here?

Thanks!

(Edit: consequently, this applies to any downcast, e.g. from Vector4 to Vector2 as well, or from Vector3 to Vector2)

Cannot convert efficiently a System.Numerics.Vector4 to a Vector3 #86220
