
CoreCLR generates suboptimal codegen with structs passed via multiple registers #89374

Open
@MichalPetryka

Description

As noted in #55357 (comment), on systems using the SysV ABI, when a struct is passed in multiple registers rather than on the stack, CoreCLR cannot fully reason about that usage and in some cases spills the struct to the stack.

Code:

    public static UInt128 Shift(UInt128 i)
    {
        return Unsafe.BitCast<Vector128<byte>, UInt128>(Sse2.ShiftLeftLogical128BitLane(Unsafe.BitCast<UInt128, Vector128<byte>>(i), 1));
    }
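
For reference, `vpslldq` with an immediate of 1 shifts the whole 128-bit lane left by one byte, so on little-endian x64 the method above should compute the same value as `i << 8`. The following is a minimal portable sketch of that equivalence, not part of the reproduction itself; the names `ShiftReference`, `Shift`, and `ShiftHalves` are hypothetical. `ShiftHalves` works on the two 64-bit halves separately, assuming the lower half arrives in `rdi` and the upper half in `rsi`, with the result returned in `rax` and `rdx`.

```csharp
using System;

public static class ShiftReference
{
    // Hypothetical reference: shifting the 128-bit lane left by one byte
    // is a plain left shift by 8 bits on the UInt128 value.
    public static UInt128 Shift(UInt128 i) => i << 8;

    // The same computation expressed on the two 64-bit halves.
    // Assumption: 'lower' models rdi and 'upper' models rsi on entry,
    // and the returned pair models rax (Lower) and rdx (Upper).
    public static (ulong Lower, ulong Upper) ShiftHalves(ulong lower, ulong upper)
    {
        ulong newLower = lower << 8;
        // The top byte of the lower half carries into the upper half.
        ulong newUpper = (upper << 8) | (lower >> 56);
        return (newLower, newUpper);
    }
}
```

A result from the intrinsic-based `Shift` can be compared against `ShiftReference.Shift` to confirm that any improved codegen keeps the same behavior.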

Current codegen:

       sub      rsp, 40
       vzeroupper
       mov      qword ptr [rsp+18H], rdi
       mov      qword ptr [rsp+20H], rsi
       vpslldq  xmm0, xmmword ptr [rsp+18H], 1
       vmovaps  xmmword ptr [rsp], xmm0
       mov      rax, qword ptr [rsp]
       mov      rdx, qword ptr [rsp+08H]
       add      rsp, 40
       ret

Expected codegen:

        vmovq   xmm0, rdi
        vpinsrq xmm0, xmm0, rsi, 1
        vpslldq xmm0, xmm0, 1
        vmovq   rax, xmm0
        vpextrq rdx, xmm0, 1
        ret

Configuration

Any SysV ABI OS (Linux, macOS)
Current main branch.

Metadata

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance: Performance related issue
