Description
Consider the following code (based on this Stack Overflow question):
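using System.Runtime.CompilerServices;

// 3-byte struct
public struct Cards3
{
    public byte C0, C1, C2;
}

// 8-byte struct (exactly one qword)
public struct Cards8
{
    public byte C0, C1, C2, C3, C4, C5, C6, C7;
}

class Program
{
    static void Main()
    {
        Run3();
        Run8();
    }

    private static Cards3[] cards3 = new Cards3[1];
    private static Cards8[] cards8 = new Cards8[1];

    // NoInlining keeps each method's generated code separate and easy to inspect.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run3()
    {
        var c = cards3[0]; // copy the array element into a local struct
        return c.C0 - c.C1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run8()
    {
        var c = cards8[0]; // same pattern, but with the 8-byte struct
        return c.C0 - c.C1;
    }
}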
Now let's look at the asm code (Windows 10, .NET Framework 4.6.1 (4.0.30319.42000), clrjit-v4.6.1080.0):
; Run3
var c = cards3[0];
00007FFEDF0A4752 in al,dx
00007FFEDF0A4753 sub byte ptr [rax-48h],cl
00007FFEDF0A4756 add byte ptr [rax],0B0h
00007FFEDF0A4759 out 72h,al
00007FFEDF0A475B add al,byte ptr [rax]
00007FFEDF0A475D add byte ptr [rax-75h],cl
00007FFEDF0A4760 add byte ptr [rbx+76000878h],al
00007FFEDF0A4766 adc al,48h
00007FFEDF0A4768 add eax,10h
00007FFEDF0A476B movzx edx,byte ptr [rax] ; !!!
00007FFEDF0A476E movzx eax,byte ptr [rax+1] ; !!!
return c.C0 - c.C1;
00007FFEDF0A4772 sub edx,eax ; !!!
00007FFEDF0A4774 mov eax,edx ; !!!
00007FFEDF0A4776 add rsp,28h
00007FFEDF0A477A ret
00007FFEDF0A477B call 00007FFF3EB57BE0
00007FFEDF0A4780 int 3
; Run8
var c = cards8[0];
00007FFEDF0B49A2 in al,dx
00007FFEDF0B49A3 sub byte ptr [rbx],dh
00007FFEDF0B49A5 ror byte ptr [rax-77h],44h
00007FFEDF0B49A9 and al,20h
00007FFEDF0B49AB mov rax,202902D0088h
00007FFEDF0B49B5 mov rax,qword ptr [rax]
00007FFEDF0B49B8 cmp dword ptr [rax+8],0
00007FFEDF0B49BC jbe 00007FFEDF0B49D8
00007FFEDF0B49BE mov rax,qword ptr [rax+10h] ; !!!
00007FFEDF0B49C2 mov qword ptr [rsp+20h],rax ; !!!
return c.C0 - c.C1;
00007FFEDF0B49C7 movzx eax,byte ptr [rsp+20h] ; !!!
00007FFEDF0B49CC movzx edx,byte ptr [rsp+21h] ; !!!
00007FFEDF0B49D1 sub eax,edx
00007FFEDF0B49D3 add rsp,28h
00007FFEDF0B49D7 ret
00007FFEDF0B49D8 call 00007FFF3EB57BE0
00007FFEDF0B49DD int 3
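As you can see, in the Run3 case, RyuJIT loads the target bytes (C0, C1) directly from the array element into the edx and eax registers. In the Run8 case, RyuJIT first copies the whole 8-byte struct to the stack (qword ptr [rsp+20h]) and then reads the two bytes back from there. Why? The extra store and reload may slightly degrade the performance of an application (see these benchmarks).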
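To check whether the extra stack round trip is measurable on a given machine, a rough timing harness along the following lines could be used. This is only a sketch and not the benchmark linked above: the class name StructCopyTiming and the helper Measure are hypothetical, it uses Stopwatch instead of a proper benchmarking library, and the delegate call adds the same overhead to both measurements. No result is implied; the numbers depend on the JIT version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public struct Cards3 { public byte C0, C1, C2; }
public struct Cards8 { public byte C0, C1, C2, C3, C4, C5, C6, C7; }

// Hypothetical harness class; the two Run methods mirror the repro above.
public static class StructCopyTiming
{
    private static Cards3[] cards3 = new Cards3[1];
    private static Cards8[] cards8 = new Cards8[1];

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run3()
    {
        var c = cards3[0];
        return c.C0 - c.C1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run8()
    {
        var c = cards8[0];
        return c.C0 - c.C1;
    }

    // Hypothetical helper: returns the elapsed milliseconds for `iterations` calls.
    public static double Measure(Func<int> run, int iterations)
    {
        int sink = 0;

        // Warm-up so the method under test is jitted before timing starts.
        for (int i = 0; i < 1000; i++) sink += run();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += run();
        sw.Stop();

        GC.KeepAlive(sink); // keep the accumulated result observable
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int N = 100000000; // arbitrary iteration count, chosen for illustration
        Console.WriteLine("Run3: {0:F1} ms", Measure(Run3, N));
        Console.WriteLine("Run8: {0:F1} ms", Measure(Run8, N));
    }
}
```

Build in Release mode and run without a debugger attached; otherwise JIT optimizations are disabled and the comparison says little about the code shown in the disassembly.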
category:cq
theme:structs
skill-level:expert
cost:large
impact:large