Suboptimal Codegen for Vector128.AsVector128(Vector2) #69298

Closed
@MineCake147E

Description

While experimenting with Vector2 and Vector128, I noticed that the code generated for Vector128.AsVector128(Vector2) is suboptimal.

C#

using System;
using System.Numerics;
using System.Runtime.Intrinsics;
public static class C
{
    public static Vector128<float> AsVector128(Vector2 value) => value.AsVector128();
}

Current Codegen by SharpLab

; Core CLR 6.0.322.12309 on amd64

C.AsVector128(System.Numerics.Vector2)
    L0000: vzeroupper
    L0003: vmovq xmm0, rdx
    L0008: vxorps xmm1, xmm1, xmm1
    L000c: vinsertps xmm0, xmm0, xmm1, 0x20
    L0012: vxorps xmm1, xmm1, xmm1
    L0016: vinsertps xmm0, xmm0, xmm1, 0x30
    L001c: vmovupd [rcx], xmm0
    L0020: mov rax, rcx
    L0023: ret

Expected Codegen

For a standalone (non-inlined) method that receives the value in rdx:

vzeroupper
vmovq xmm0, rdx  ;automatically clears all the upper bits in xmm0
vmovdqu [rcx], xmm0
mov rax, rcx
ret

For a standalone (non-inlined) method that receives a reference to the value in rdx:

vzeroupper
vmovsd xmm0, [rdx]  ;automatically clears all the upper bits in xmm0
vmovupd [rcx], xmm0
mov rax, rcx
ret

If it's inlined and rdx has the value:

vmovq xmm0, rdx  ;automatically clears all the upper bits in xmm0

If it's inlined and xmm1 has the value:

vmovddup xmm0, xmm1  ;if later calculations don't care about the upper 64 bits (this copies the lower 64 bits into them)

or, if the upper 64 bits must be cleared:

vxorps xmm0, xmm0, xmm0  ;clear xmm0 by hand
vmovsd xmm0, xmm0, xmm1  ;merge the lower 64 bits of xmm1 into the zeroed xmm0

If it's inlined and rsi has the reference to the value:

vmovsd xmm0, [rsi]  ;automatically clears all the upper bits in xmm0
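
At the C# level, the zero-extending 64-bit move described above can be approximated by reinterpreting the two floats as a single 64-bit scalar. The following is only a sketch: `Vector2Sketch` and `AsVector128ViaScalar` are hypothetical names, this is not the implementation used in dotnet/runtime, and whether the JIT actually emits a single vmovq for it has not been verified here.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class Vector2Sketch
{
    // Hypothetical helper, not part of the BCL.
    public static Vector128<float> AsVector128ViaScalar(Vector2 value)
    {
        // Reinterpret the 8 bytes holding (X, Y) as one 64-bit integer.
        long bits = Unsafe.As<Vector2, long>(ref value);

        // CreateScalar places the value in element 0 and zeroes the other
        // element, so the upper 64 bits of the result are zero.
        return Vector128.CreateScalar(bits).AsSingle();
    }
}
```

For `new Vector2(1f, 2f)` the result holds the elements 1, 2, 0, 0, which matches what `AsVector128(Vector2)` is documented to return.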

Configuration

SharpLab (2022/05/13)

Regression?

No

Data

Analysis

The following code in Vector128.cs appears to be the cause:

        public static Vector128<float> AsVector128(this Vector2 value)
            => new Vector4(value, 0.0f, 0.0f).AsVector128();

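This implementation is inefficient because new Vector4(Vector2, float, float) emits two vinsertps instructions. They are unnecessary when the inserted values are zero, since vmovq already clears the upper 64 bits of the register. The constructor's own codegen on SharpLab shows the two inserts: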

using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
public static class C
{
    public static Vector4 Ctor(Vector2 value, float z, float w) => new(value, z, w);
}

C.Ctor(System.Numerics.Vector2, Single, Single)
    L0000: vzeroupper
    L0003: vmovq xmm0, rdx
    L0008: vinsertps xmm0, xmm0, xmm2, 0x20
    L000e: vinsertps xmm0, xmm0, xmm3, 0x30
    L0014: vmovupd [rcx], xmm0
    L0018: mov rax, rcx
    L001b: ret

The same is true for Vector128.AsVector128(Vector3).
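
Any improved codegen still has to keep the unused upper lanes zeroed, which is the reason the current implementation inserts zeros explicitly. A small check of that contract is sketched below; `UpperLaneCheck` and its methods are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class UpperLaneCheck
{
    // Hypothetical helper: true when lanes 2 and 3 are zero and
    // lanes 0 and 1 carry X and Y unchanged.
    public static bool HoldsFor(Vector2 value)
    {
        Vector128<float> v = value.AsVector128();
        return v.GetElement(0) == value.X
            && v.GetElement(1) == value.Y
            && v.GetElement(2) == 0f
            && v.GetElement(3) == 0f;
    }

    // Hypothetical helper: same check for Vector3, where only lane 3 is unused.
    public static bool HoldsFor(Vector3 value)
    {
        Vector128<float> v = value.AsVector128();
        return v.GetElement(0) == value.X
            && v.GetElement(1) == value.Y
            && v.GetElement(2) == value.Z
            && v.GetElement(3) == 0f;
    }
}
```

Both checks compare lanes with `==`, so they are meant for ordinary finite inputs; a NaN component would make them return false even though the conversion itself is fine.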

category:cq
theme:vector-codegen
skill-level:intermediate
cost:small
impact:small

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
    help wanted: [up-for-grabs] Good issue for external contributors
    tenet-performance: Performance related issue
