Description
From #93214 (comment)
Currently, we unroll GT_BLK (for copy or init) via a single SIMD reg even for large blocks, e.g.:
[StructLayout(LayoutKind.Sequential, Size = 64)]
private struct Block64 { }

static void CopyBlock64(ref byte dest, ref byte src)
{
    Unsafe.As<byte, Block64>(ref dest) = Unsafe.As<byte, Block64>(ref src);
}
vmovdqu ymm0, ymmword ptr [rdx]
vmovdqu ymmword ptr [rcx], ymm0
vmovdqu ymm0, ymmword ptr [rdx+0x20]
vmovdqu ymmword ptr [rcx+0x20], ymm0
It'd be nice if GT_BLK unrolling could reserve 2 regs instead of 1, since SIMD regs usually aren't under much RA pressure anyway. The codegen could then be:
vmovdqu ymm0, ymmword ptr [rdx]
vmovdqu ymm1, ymmword ptr [rdx+0x20]
vmovdqu ymmword ptr [rcx], ymm0
vmovdqu ymmword ptr [rcx+0x20], ymm1
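For illustration only, the load/load/store/store ordering can also be written by hand with the `Vector256` helpers. This is a rough sketch of the access pattern, not of the JIT change itself: the class and method names are hypothetical, it assumes .NET 7 or later (where `Vector256.LoadUnsafe` and `StoreUnsafe` are available), and the JIT is free to schedule and allocate registers differently, so the emitted code is not guaranteed to match the listing above.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper, not part of the issue or of the runtime.
static class BlockCopySketch
{
    // Copies 64 bytes through two independent 32-byte temporaries:
    // both loads are issued before either store.
    public static void CopyBlock64TwoRegs(ref byte dest, ref byte src)
    {
        Vector256<byte> lo = Vector256.LoadUnsafe(ref src);
        Vector256<byte> hi = Vector256.LoadUnsafe(ref src, (nuint)32);

        lo.StoreUnsafe(ref dest);
        hi.StoreUnsafe(ref dest, (nuint)32);
    }
}
```

Because the two temporaries do not depend on each other, the second load does not have to wait for the first store to release a register, which is the point of using two registers in the unrolled sequence.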