Remove HelperMethodFrames from Object methods #106497
Split GetHashCode into fast/slow managed functions. Split GetType into fast/slow managed functions.
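As a rough illustration of the fast/slow split described above, here is a minimal sketch. It is not the code from this PR: the `LazyHashed` type, its `_cachedHash` field, and the `SlowPathCalls` counter are hypothetical names chosen for the example, and the caching scheme is an assumption made only to show the shape of the pattern.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in type; not part of the runtime or of this PR.
public sealed class LazyHashed
{
    // 0 is used as the "not computed yet" sentinel (an assumption of this sketch).
    private int _cachedHash;

    // Counts slow-path entries so the behavior can be observed.
    public int SlowPathCalls { get; private set; }

    // Fast path: a field load and a branch, with no call in the common case.
    public int GetHash()
    {
        int hash = _cachedHash;
        if (hash != 0)
            return hash;

        return GetHashSlow();
    }

    // Slow path: kept out of line so the rarely executed work
    // does not add code to the fast path.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private int GetHashSlow()
    {
        SlowPathCalls++;
        int hash = RuntimeHelpers.GetHashCode(this);
        if (hash == 0)
            hash = 1; // keep 0 reserved for "not computed yet"

        _cachedHash = hash;
        return hash;
    }
}
```

Calling `GetHash()` twice on the same instance enters `GetHashSlow()` only once; the second call returns from the fast path.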
Thanks!
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@EgorBot -arm64 -x64 -perf

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.GetHashCode();
    }
}

internal class Foo
{
}

Benchmark results on Arm64
@EgorBot -arm64 -perf

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.GetHashCode();
    }
}

internal class Foo
{
}

Benchmark results on Arm64
Flame graphs: Main vs PR 🔥
@EgorBot -intel -arm64 --disasm

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Bench>(args: args);

public class Bench
{
    object o = new object();

    [Benchmark]
    public void ObjGetHashCode() => o.GetHashCode();

    [Benchmark]
    public Type ObjGetType() => o.GetType();

    [Benchmark]
    public void HoistGetType()
    {
        object _o = o;
        for (long i = 0; i < 100000; i++)
            _ = _o.GetType(); // should be hoisted
    }

    static Bench B { get; } = new Bench();

    [Benchmark]
    public Type ConstantFoldGetType() => B.GetType(); // should be optimized to typeof(Bench)
}

Benchmark results on Intel
Benchmark results on Arm64
// Returns a Type object which represent this object instance.
[Intrinsic]
[MethodImpl(MethodImplOptions.InternalCall)]
public extern Type GetType();
public unsafe Type GetType()
1) If it's not inlined, it becomes an indirect call just like any other managed call (previously it was a direct FCall).
2) If it's inlined (which shouldn't happen today, I guess, because the call site is small and the body contains a lot of unknown calls, which will scare the inliner), it might ruin some JIT optimizations where the JIT treats GetType as a special intrinsic and relies on that even in late phases.
We can optimize 1) on the VM side if it becomes an issue (tell the JIT that certain managed methods can be called directly).
@EgorBot -intel -arm64 -perf --disasm

using System;
using BenchmarkDotNet.Attributes;

public class Bench
{
    object o = new object();

    [Benchmark]
    public Type ObjGetType() => o.GetType();
}

Benchmark results on Intel
Flame graphs: Main vs PR 🔥
Benchmark results on Arm64
Flame graphs: Main vs PR 🔥
I see stable results on EgorBot for GetType.

x64: Regressed due to an indirect vs. direct call (plus a small CQ issue in the managed impl):

##### Main
; Bench.ObjGetType()
push rax
mov rdi,[rdi+8]
call System.Object.GetType() ;;;; direct call!
nop
add rsp,8
ret
; Total bytes of code 16
##### PR
; Bench.ObjGetType()
push rax
mov rdi,[rdi+8]
call qword ptr [708A18A0D7B8]; System.Object.GetType() ;;;; indirect
nop
add rsp,8
ret
; Total bytes of code 17
; System.Object.GetType()
push rbx
mov rbx,rdi
mov rdi,[rbx]
mov rax,[rdi+20]
add rax,10 ;;; should be part of the addressing mode?
mov rax,[rax]
test rax,rax
jne short M01_L00
call qword ptr [708A18A0D7D0]; System.Object.<GetType>g__GetTypeWorker|1_0(System.Runtime.CompilerServices.MethodTable*)
M01_L00:
nop
pop rbx
ret
; Total bytes of code 32

Arm64: Actually improved! (reproducible)

##### Main
; Bench.ObjGetType()
stp x29, x30, [sp, #-0x10]!
mov x29, sp
ldr x0, [x0, #8]
bl #0xf64041142658
ldp x29, x30, [sp], #0x10
ret
; Total bytes of code 24
##### PR
; Bench.ObjGetType()
stp x29, x30, [sp, #-0x10]!
mov x29, sp
ldr x0, [x0, #8]
movz x1, #0xcdc8
movk x1, #0xdd36, lsl #16
movk x1, #0xf947, lsl #32
ldr x1, [x1]
blr x1; System.Object.GetType()
ldp x29, x30, [sp], #0x10
ret
; Total bytes of code 40
; System.Object.GetType()
stp x29, x30, [sp, #-0x20]!
str x19, [sp, #0x18]
mov x29, sp
mov x19, x0
ldr x0, [x19]
ldr x1, [x0, #0x20]
add x1, x1, #0x10
ldr x1, [x1]
cbnz x1, M01_L00
movz x1, #0xcde0
movk x1, #0xdd36, lsl #16
movk x1, #0xf947, lsl #32
ldr x1, [x1]
blr x1; System.Object.<GetType>g__GetTypeWorker|1_0(System.Runtime.CompilerServices.MethodTable*)
mov x1, x0
M01_L00:
mov x0, x1
ldr x19, [sp, #0x18]
ldp x29, x30, [sp], #0x20
ret
; Total bytes of code 76

I don't have a good explanation why