JIT doesn't properly optimize assignment to stackalloced span elements when used as 'out' parameters

The JIT seems to emit a fair amount of unnecessary code when a span is created with the pattern `Span<T> span = stackalloc T[const];` and its elements are passed as _out_ arguments to a function. The codegen is noisier still when the function returns a `ValueTuple` and the values are deconstructed directly into the span's elements.

See the examples below, with some assembly-level annotations inline.

```cs
using System;
using System.Runtime.CompilerServices;

[module: System.Runtime.CompilerServices.SkipLocalsInit]

public class C {
    // Deconstruct a ValueTuple's values into the span
    public static void WithDecompose(uint i) {
        Span<Char> chars = stackalloc Char[2];
        (chars[0], chars[1]) = Foo(i);
    }
    
    // Assign 'out' values to the span
    public static void WithOuts(uint i) {
        Span<Char> chars = stackalloc Char[2];
        Bar(i, out chars[0], out chars[1]);
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static (char a, char b) Foo(uint i)
    {
        return ((char)i, (char)(i >> 16));
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static void Bar(uint i, out char a, out char b)
    {
        a = (char)i;
        b = (char)(i >> 16);
    }
}
```

```asm
; C.WithDecompose(UInt32)
    L0000: push esi
    L0001: sub esp, 0x10
    L0004: mov dword ptr [esp+0xc], 0xa3ec260e
    L000c: lea eax, [esp]
    L000f: mov edx, 2
    L0014: cmp edx, 0
    L0017: jbe short L0067  ; this branch is guaranteed never to be taken
    L0019: lea edx, [eax+2]
    L001c: mov word ptr [esp+4], 0  ; looks like zero-initing, even with [SkipLocalsInit]?
    L0023: mov word ptr [esp+6], 0
    L002a: movzx esi, cx
    L002d: mov [esp+4], si  ; L002a can be deleted, this can be mov word ptr [esp + 4], cx
    L0032: shr ecx, 0x10
    L0035: movzx ecx, cx
    L0038: mov [esp+6], cx  ; L0035 can be deleted, this can be mov word ptr [esp + 6], cx
    L003d: mov ecx, [esp+4]  ; looks like some register shuffling going on?
    L0041: mov [esp+8], ecx
    L0045: mov ecx, [esp+8]
    L0049: mov [eax], cx
    L004c: mov eax, [esp+0xa]  ; JIT generated an unaligned 32-bit read? (assuming esp is 4-byte aligned)
    L0050: mov [edx], ax
    L0053: cmp dword ptr [esp+0xc], 0xa3ec260e
    L005b: je short L0062
    L005d: call 0x61cb2530
    L0062: add esp, 0x10
    L0065: pop esi
    L0066: ret
    L0067: call 0x61cb16d0
    L006c: int3

; C.WithOuts(UInt32)
    L0000: sub esp, 8
    L0003: mov dword ptr [esp+4], 0xa3ec260e
    L000b: lea eax, [esp]
    L000e: mov edx, 2
    L0013: cmp edx, 0
    L0016: jbe short L0037  ; this branch is guaranteed never to be taken
    L0018: lea edx, [eax+2]  ; nit: this could be folded into the mov instruction at L0021
    L001b: mov [eax], cx
    L001e: shr ecx, 0x10
    L0021: mov [edx], cx
    L0024: cmp dword ptr [esp+4], 0xa3ec260e
    L002c: je short L0033
    L002e: call 0x61cb2530
    L0033: add esp, 8
    L0036: ret
    L0037: call 0x61cb16d0
    L003c: int3
```

The unaligned read at `L004c` is particularly suspicious, so I wonder whether SharpLab is misreporting a _movzx_ instruction as a _mov_ instruction.
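Before comparing disassembly, it may help to confirm that the two patterns store identical values. The following is a minimal sketch, not part of the original repro: the class name `SpanStoreCheck`, the helper `Pack`, and the third variant `WithLocals` are hypothetical names introduced only for illustration. Returning the packed value makes the stores observable, so the codegen for these methods will differ from the listings above, and the codegen of `WithLocals` has not been checked.

```cs
using System;
using System.Runtime.CompilerServices;

public static class SpanStoreCheck
{
    // Same helper shapes as Foo and Bar in the repro above.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static (char a, char b) Foo(uint i) => ((char)i, (char)(i >> 16));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static void Bar(uint i, out char a, out char b)
    {
        a = (char)i;
        b = (char)(i >> 16);
    }

    // Hypothetical helper: repack both elements so the stores can be observed.
    static uint Pack(ReadOnlySpan<char> chars) => chars[0] | ((uint)chars[1] << 16);

    // Deconstruct the tuple directly into the span elements.
    public static uint WithDecompose(uint i)
    {
        Span<char> chars = stackalloc char[2];
        (chars[0], chars[1]) = Foo(i);
        return Pack(chars);
    }

    // Pass the span elements as 'out' arguments.
    public static uint WithOuts(uint i)
    {
        Span<char> chars = stackalloc char[2];
        Bar(i, out chars[0], out chars[1]);
        return Pack(chars);
    }

    // Hypothetical third variant: deconstruct into locals, then store.
    public static uint WithLocals(uint i)
    {
        Span<char> chars = stackalloc char[2];
        (char a, char b) = Foo(i);
        chars[0] = a;
        chars[1] = b;
        return Pack(chars);
    }
}
```

Each method splits its argument into two 16-bit halves and repacks them, so all three should return their input unchanged; any difference between them would point to a correctness problem rather than a code-quality one.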

category:cq
theme:stack-allocation
skill-level:expert
cost:large

JIT doesn't properly optimize assignment to stackalloced span elements when used as 'out' parameters #38628
