Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Remove overhead from AsyncValueTaskMethodBuilder.Create #13390

Merged
merged 1 commit into from
Aug 16, 2017

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Aug 16, 2017

AsyncValueTaskMethodBuilder<TResult>.Create wraps an AsyncTaskMethodBuilder<TResult> and thus initializes its _methodBuilder field to AsyncTaskMethodBuilder<TResult>.Create(). But even though AsyncTaskMethodBuilder<TResult>.Create() just returns default(AsyncTaskMethodBuilder<TResult>), a non-trivial amount of code is being generated into AsyncValueTaskMethodBuilder<TResult>.Create, even though it's really just returning default(AsyncValueTaskMethodBuilder). So while the "right" thing to do is to call Create, this commit just skips it and adds a comment that if AsyncTaskMethodBuilder<TResult>.Create ever becomes more than a nop returning the default value, various callers would need to be updated as well (though I doubt we'd ever change it). This reduces the overhead of a nop ValueTask-returning async method by ~30%.

For reference, here's the JitDisasm output for AsyncValueTaskMethodBuilder<int>.Create:

; Assembly listing for method System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[Int32][System.Int32]:Create():struct
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T00] (  4,  4   )   byref  ->  rbx
;  V01 loc0         [V01,T01] (  3,  3   )  struct (32) [rsp+0x00]   do-not-enreg[SFB] must-init ld-addr-op
;# V02 loc1         [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 32

G_M59927_IG01:
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC20             sub      rsp, 32
       C5F877               vzeroupper
       488BF1               mov      rsi, rcx
       488D3C24             lea      rdi, [rsp]
       B908000000           mov      ecx, 8
       33C0                 xor      rax, rax
       F3AB                 rep stosd
       488BCE               mov      rcx, rsi
       488BD9               mov      rbx, rcx

G_M59927_IG02:
       488D0424             lea      rax, bword ptr [rsp]

G_M59927_IG03:
       C4E17957C0           vxorpd   xmm0, xmm0
       C4E17A7F00           vmovdqu  qword ptr [rax], xmm0
       C4E17A7F4010         vmovdqu  qword ptr [rax+16], xmm0

G_M59927_IG04:
       33C0                 xor      rax, rax
       488D542408           lea      rdx, bword ptr [rsp+08H]
       C4E17957C0           vxorpd   xmm0, xmm0
       C4E17A7F02           vmovdqu  qword ptr [rdx], xmm0
       48894210             mov      qword ptr [rdx+16], rax
       488BFB               mov      rdi, rbx
       488D3424             lea      rsi, bword ptr [rsp]
       48A5                 movsq
       E899D21A5F           call     CORINFO_HELP_ASSIGN_BYREF
       E894D21A5F           call     CORINFO_HELP_ASSIGN_BYREF
       E88FD21A5F           call     CORINFO_HELP_ASSIGN_BYREF
       488BC3               mov      rax, rbx

G_M59927_IG05:
       4883C420             add      rsp, 32
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret

; Total bytes of code 108, prolog size 29 for method System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[Int32][System.Int32]:Create():struct
; ============================================================

cc: @benaadams, @geoffkizer, @jkotas, @AndyAyersMS

And a little microbenchmark:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class Test
{
    static async Task Main()
    {
        var sw = new Stopwatch();
        while (true)
        {
            sw.Restart();
            for (int i = 0; i < 30000000; i++) await Nop();
            sw.Stop();
            Console.WriteLine(sw.Elapsed.TotalMilliseconds);
        }
    }

    static async ValueTask<int> Nop() => 42;
}

AsyncValueTaskMethodBuilder.Create wraps an AsyncTaskMethodBuilder and thus initializes its _methodBuilder field to AsyncTaskMethodBuilder.Create.  But even though AsyncTaskMethodBuilder.Create just returns default(AsyncTaskMethodBuilder), a non-trivial amount of code is being generated into AsyncValueTaskMethodBuilder.Create, even though it's really just returning default(AsyncValueTaskMethodBuilder).  So while the "right" thing to do is to call Create, this commit just skips it and adds a comment that if AsyncTaskMethodBuilder.Create ever becomes more than a nop, various callers would need to be updated as well (though I doubt we'd ever change it).

This reduces the overhead of a nop ValueTask-returning async method by ~30%.
@AndyAyersMS
Copy link
Member

cc @CarolEidt

There sure are lots of opportunities here (in the before version) for struct-based optimizations.

@benaadams
Copy link
Member

I assume the 3 CORINFO_HELP_ASSIGN_BYREFs from

default(AsyncValueTaskMethodBuilder)
       E899D21A5F           call     CORINFO_HELP_ASSIGN_BYREF
       E894D21A5F           call     CORINFO_HELP_ASSIGN_BYREF
       E88FD21A5F           call     CORINFO_HELP_ASSIGN_BYREF

Are setting the 3 nulls?

struct AsyncValueTaskMethodBuilder<int>
{
    struct AsyncTaskMethodBuilder<int> m_coreState
    {
        struct AsyncMethodBuilderCore
        {
            IAsyncStateMachine m_stateMachine = null;
            Action m_defaultContextAction = null;
        }
        Task<int> m_task = null;
    }
}

Do they need to be helpers for null initialization?

@jkotas
Copy link
Member

jkotas commented Aug 16, 2017

There are no helpers needed to set the field to null. It can be set directly. The problem is that the JIT is only optimizing small structs today: https://github.com/dotnet/coreclr/blob/master/src/jit/lclvars.cpp#L1797. AsyncValueTaskMethodBuilder is a big struct. It is treated as block of memory. The details about content of the block are not tracked.

@stephentoub
Copy link
Member Author

@dotnet-bot test Tizen armel Cross Debug Build please

@stephentoub stephentoub merged commit 0c96b7b into dotnet:master Aug 16, 2017
@stephentoub stephentoub deleted the avtmbcreate branch August 16, 2017 14:25
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Aug 16, 2017
Remove overhead from AsyncValueTaskMethodBuilder.Create

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Aug 16, 2017
Remove overhead from AsyncValueTaskMethodBuilder.Create

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@stephentoub
Copy link
Member Author

stephentoub commented Aug 18, 2017

AsyncValueTaskMethodBuilder is a big struct

Yeah, although with the changes in #13105, two of those three reference fields will go away, and I believe it should be left as a struct of one reference field, one TResult field, and two bool fields, with AsyncTaskMethodBuilder being a struct of a single reference field.

There sure are lots of opportunities here (in the before version) for struct-based optimizations.

Should I open an issue for anything, or any such opportunities are already tracked?

@CarolEidt
Copy link

Should I open an issue for anything, or any such opportunities are already tracked?

Please open an issue, and/or if you are motivated you could look at these known issues: #1133, #1161, #1636, #3144, #3539, #5556 and if it looks like any of those you could either just add a note (or a reference to the new issue).

I hope to have a look at this, but I'm not sure when I'll get to it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants