JIT doesn't properly optimize assignment to stackalloced span elements when used as 'out' parameters #38628
Open
Description
The JIT seems to emit a bunch of unnecessary ceremony when a span is created with the pattern Span<T> span = stackalloc T[const];
and a function's out parameters are then assigned to its elements. The codegen is especially noisy when the function returns a ValueTuple
and the values are deconstructed directly into the span.
See the examples below, with some assembly-level annotations inline.
using System;
using System.Runtime.CompilerServices;
[module: System.Runtime.CompilerServices.SkipLocalsInit]
public class C {
    // Deconstruct a ValueTuple's values into the span
    public static void WithDecompose(uint i) {
        Span<Char> chars = stackalloc Char[2];
        (chars[0], chars[1]) = Foo(i);
    }

    // Assign 'out' values to the span
    public static void WithOuts(uint i) {
        Span<Char> chars = stackalloc Char[2];
        Bar(i, out chars[0], out chars[1]);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static (char a, char b) Foo(uint i)
    {
        return ((char)i, (char)(i >> 16));
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static void Bar(uint i, out char a, out char b)
    {
        a = (char)i;
        b = (char)(i >> 16);
    }
}
; C.WithDecompose(UInt32)
L0000: push esi
L0001: sub esp, 0x10
L0004: mov dword ptr [esp+0xc], 0xa3ec260e
L000c: lea eax, [esp]
L000f: mov edx, 2
L0014: cmp edx, 0
L0017: jbe short L0067 ; this branch is guaranteed never to be taken
L0019: lea edx, [eax+2]
L001c: mov word ptr [esp+4], 0 ; looks like zero-initing, even with [SkipLocalsInit]?
L0023: mov word ptr [esp+6], 0
L002a: movzx esi, cx
L002d: mov [esp+4], si ; L002a can be deleted, this can be mov word ptr [esp + 4], cx
L0032: shr ecx, 0x10
L0035: movzx ecx, cx
L0038: mov [esp+6], cx ; L0035 can be deleted, this can be mov word ptr [esp + 6], cx
L003d: mov ecx, [esp+4] ; looks like some register shuffling going on?
L0041: mov [esp+8], ecx
L0045: mov ecx, [esp+8]
L0049: mov [eax], cx
L004c: mov eax, [esp+0xa] ; JIT generated an unaligned 32-bit read? (assuming esp is 4-byte aligned)
L0050: mov [edx], ax
L0053: cmp dword ptr [esp+0xc], 0xa3ec260e
L005b: je short L0062
L005d: call 0x61cb2530
L0062: add esp, 0x10
L0065: pop esi
L0066: ret
L0067: call 0x61cb16d0
L006c: int3
; C.WithOuts(UInt32)
L0000: sub esp, 8
L0003: mov dword ptr [esp+4], 0xa3ec260e
L000b: lea eax, [esp]
L000e: mov edx, 2
L0013: cmp edx, 0
L0016: jbe short L0037 ; this branch is guaranteed never to be taken
L0018: lea edx, [eax+2] ; nit: this could be folded into the mov instruction at L0021
L001b: mov [eax], cx
L001e: shr ecx, 0x10
L0021: mov [edx], cx
L0024: cmp dword ptr [esp+4], 0xa3ec260e
L002c: je short L0033
L002e: call 0x61cb2530
L0033: add esp, 8
L0036: ret
L0037: call 0x61cb16d0
L003c: int3
The unaligned read at L004c in WithDecompose is particularly suspicious, so I wonder whether SharpLab is incorrectly reporting a movzx instruction as a mov instruction.
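For comparison, here is a minimal sketch of a hand-written baseline that performs the same two stores without a tuple or out parameters. The class name Baseline and the method name WithDirectStores are hypothetical and not part of the original repro. No claim is made here about the code the JIT actually produces for it; it only spells out the two 16-bit stores that both methods above reduce to once Foo and Bar are inlined.

```csharp
using System;

public static class Baseline
{
    // Hypothetical baseline (assumption, not from the original repro):
    // writes the two halves of 'i' straight into the stackalloc'ed span.
    public static void WithDirectStores(uint i)
    {
        Span<char> chars = stackalloc char[2];
        chars[0] = (char)i;          // low 16 bits of i
        chars[1] = (char)(i >> 16);  // high 16 bits of i
    }
}
```

Feeding this method through the same tool alongside WithDecompose and WithOuts would show whether the extra moves come from the tuple/out plumbing or from the span indexing itself.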
category:cq
theme:stack-allocation
skill-level:expert
cost:large