Tweak async code gen for reduced exceptional stack consumption #26567

@stephentoub

Version Used: Microsoft (R) Visual C# Compiler version 2.8.0.62827 (362ec0e)

Consider the following code:

using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        try { await Recur(40); } catch { }
    }

    // Address of the marker local seen by the finally block that ran most recently on this thread.
    [ThreadStatic]
    private static unsafe byte* t_last;

    static async Task Recur(int level)
    {
        if (level > 0)
        {
            try
            {
                await Recur(level - 1);
            }
            finally
            {
                byte loc = 42;
                unsafe
                {
                    byte* cur = &loc;
                    Console.WriteLine($"Frame size: {t_last - cur}");
                    t_last = cur;
                }
            }
        }
        else if (level == 0)
        {
            await Task.Yield();
            throw new Exception();
        }
    }
} 

This recursive async method outputs an approximation for its stack frame size, or more specifically how much stack space is taken up between the recursive invocations (which will include helper frames). When the throw new Exception(); is commented out, on my machine in a release x64 build I get values like:

Frame size: 880

When I then uncomment the throw new Exception();, I instead get values like:

Frame size: 10016

That’s roughly an 11x increase in frame size when exceptions are thrown. While exception performance generally isn’t considered a primary goal, this case is problematic because it can quickly consume all available stack space and result in a stack overflow. Consider Windows servers, where the default stack size may be 512K. If you have a chain of async methods where the leaf throws an exception, by default you’ll end up with such frame sizes for each of the continuations, so a chain of about 50 methods would blow the stack.
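
The "about 50 methods" figure follows from dividing the stack size by the per-continuation cost. A minimal sketch of that arithmetic, assuming the whole 512K is available to the chain (in practice some is already in use); the `StackBudget` class and `MaxDepth` method are hypothetical names used only for illustration:

```csharp
using System;

static class StackBudget
{
    // Number of whole continuations of the given size that fit in the stack,
    // ignoring whatever stack is already in use when the chain starts.
    public static long MaxDepth(long stackBytes, long bytesPerContinuation)
    {
        if (bytesPerContinuation <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerContinuation));
        return stackBytes / bytesPerContinuation;
    }
}
```

With the numbers measured above, `StackBudget.MaxDepth(512 * 1024, 10016)` is 52 for the exceptional path, versus `StackBudget.MaxDepth(512 * 1024, 880)`, which is 595, for the non-exceptional path.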

The extra frame size is primarily due to SEH interop on Windows. By attaching WinDbg and using kf, we can see where the stack space is taken up (partial stack frames shown here):

…
92        a0 00000085`dc5f7ad0 00007ffd`d6c63064 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AC2+0x163
93        d0 00000085`dc5f7ba0 00007ffd`d6cae7dc mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AC1+0x14
94        30 00000085`dc5f7bd0 00007ffd`d6cb55d2 mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()$##6006F5B+0x6c
95        50 00000085`dc5f7c20 00007ffd`d6c31637 mscorlib_ni!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)$##6004032+0x62
96        50 00000085`dc5f7c70 00007ffd`d6cabdb0 mscorlib_ni!System.Threading.Tasks.Task.FinishContinuations()$##6003F9D+0x3d7
97        90 00000085`dc5f7d00 00007ffd`d6cb8391 mscorlib_ni!System.Threading.Tasks.Task.Finish(Boolean)$##6003F72+0x50
98        60 00000085`dc5f7d60 00007ffd`d6cad413 mscorlib_ni!System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult].TrySetException(System.Object)$##6003E54+0x71
99        50 00000085`dc5f7db0 00007ffd`7a950fef mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Threading.Tasks.VoidTaskResult].SetException(System.Exception)$##6005CBA+0x43
9a        40 00000085`dc5f7df0 00007ffd`da074651 0x00007ffd`7a950fef
9b        60 00000085`dc5f7e50 00007ffd`da074518 clr!ExceptionTracker::CallHandler+0xfd
9c        f0 00000085`dc5f7f40 00007ffd`da07442c clr!ExceptionTracker::CallCatchHandler+0x90
9d        a0 00000085`dc5f7fe0 00007ffd`e790ee4d clr!ProcessCLRException+0x31c
9e        e0 00000085`dc5f80c0 00007ffd`e7877670 ntdll!RtlpExecuteHandlerForUnwind+0xd
9f        30 00000085`dc5f80f0 00007ffd`da0751c0 ntdll!RtlUnwindEx+0x3a0
a0       6e0 00000085`dc5f87d0 00007ffd`da075173 clr!ClrUnwindEx+0x40
a1       520 00000085`dc5f8cf0 00007ffd`e790edcd clr!ProcessCLRException+0x2e9
a2        e0 00000085`dc5f8dd0 00007ffd`e7876c86 ntdll!RtlpExecuteHandlerForException+0xd
a3        30 00000085`dc5f8e00 00007ffd`e790dcfe ntdll!RtlDispatchException+0x3c6
a4       700 00000085`dc5f9500 00007ffd`e3ecf218 ntdll!KiUserExceptionDispatch+0x2e
a5       780 00000085`dc5f9c80 00007ffd`da07562a KERNELBASE!RaiseException+0x68
a6        e0 00000085`dc5f9d60 00007ffd`da07588a clr!RaiseTheExceptionInternalOnly+0x2aa
a7       100 00000085`dc5f9e60 00007ffd`d7634bcf clr!IL_Throw+0x10b
a8       1b0 00000085`dc5fa010 00007ffd`d6cab44c mscorlib_ni!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()$##600539B+0x1f
a9        30 00000085`dc5fa040 00007ffd`7a950e4c mscorlib_ni!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)$##6005CD4+0x3c
aa        30 00000085`dc5fa070 00007ffd`d6c631d3 0x00007ffd`7a950e4c
ab        a0 00000085`dc5fa110 00007ffd`d6c63064 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AC2+0x163
ac        d0 00000085`dc5fa1e0 00007ffd`d6cae7dc mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AC1+0x14
ad        30 00000085`dc5fa210 00007ffd`d6cb55d2 mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()$##6006F5B+0x6c
ae        50 00000085`dc5fa260 00007ffd`d6c31637 mscorlib_ni!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)$##6004032+0x62
af        50 00000085`dc5fa2b0 00007ffd`d6cabdb0 mscorlib_ni!System.Threading.Tasks.Task.FinishContinuations()$##6003F9D+0x3d7
b0        90 00000085`dc5fa340 00007ffd`d6cb8391 mscorlib_ni!System.Threading.Tasks.Task.Finish(Boolean)$##6003F72+0x50
b1        60 00000085`dc5fa3a0 00007ffd`d6cad413 mscorlib_ni!System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult].TrySetException(System.Object)$##6003E54+0x71
b2        50 00000085`dc5fa3f0 00007ffd`7a950fef mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Threading.Tasks.VoidTaskResult].SetException(System.Exception)$##6005CBA+0x43
…
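
As a rough cross-check of where the space goes, the Child-SP column (the second column of the kf output) can be subtracted across one repetition of the pattern. The sketch below copies four addresses from the trace above. Treating frames 9b through a8 as the exception-handling portion is an assumption about where to draw the boundary, and the `TraceTally` class and its members are hypothetical names:

```csharp
using System;

static class TraceTally
{
    // Child-SP values copied from the kf output above.
    const long Frame92 = 0x00000085dc5f7ad0; // ExecutionContext.RunInternal (start of one repetition)
    const long Frame9b = 0x00000085dc5f7e50; // clr!ExceptionTracker::CallHandler
    const long FrameA9 = 0x00000085dc5fa040; // TaskAwaiter.HandleNonSuccessAndDebuggerNotification
    const long FrameAb = 0x00000085dc5fa110; // ExecutionContext.RunInternal (start of next repetition)

    // Stack bytes spanned by one full repetition (frames 92 through aa).
    public static long RepetitionBytes => FrameAb - Frame92;

    // Stack bytes spanned by frames 9b through a8 (assumed exception-handling portion).
    public static long ExceptionBytes => FrameA9 - Frame9b;

    // Fraction of one repetition spent in the assumed exception-handling frames.
    public static double ExceptionShare => (double)ExceptionBytes / RepetitionBytes;
}
```

With these values, one repetition spans 9792 bytes, of which 8688 bytes (about 89%) lie in frames 9b through a8.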

Those exception handling-related frames account for ~90% of the stack space consumed by each continuation. Why are these on the stack at all as part of the continuation chain? If we look at the decompiled code generated for the async method, we see this:

private unsafe void MoveNext()
{
    int num = this.<>1__state;
    try
    {
        if (num != 0)
        {
            if (num == 1)
            {
                goto Label_00FA;
            }
            if (this.level <= 0)
            {
                goto Label_00B8;
            }
        }
        try
        {
            TaskAwaiter awaiter;
            if (num != 0)
            {
                awaiter = Program.Recur(this.level - 1).GetAwaiter();
                if (!awaiter.IsCompleted)
                {
                    this.<>1__state = num = 0;
                    this.<>u__1 = awaiter;
                    this.<>t__builder.AwaitUnsafeOnCompleted<TaskAwaiter, Program.<Recur>d__2>(ref awaiter, ref this);
                    return;
                }
            }
            else
            {
                awaiter = this.<>u__1;
                this.<>u__1 = new TaskAwaiter();
                this.<>1__state = num = -1;
            }
            awaiter.GetResult();
            goto Label_013F;
        }
        finally
        {
            if (num < 0)
            {
                byte loc = 0x2a;
                byte* cur = &loc;
                Console.WriteLine($"Frame size: {(long) ((Program.t_last - cur) / 1)}");
                Program.t_last = cur;
            }
        }
    Label_00B8:
        if (this.level != 0)
        {
            goto Label_013F;
        }
        YieldAwaitable.YieldAwaiter awaiter2 = Task.Yield().GetAwaiter();
        if (awaiter2.IsCompleted)
        {
            goto Label_0117;
        }
        this.<>1__state = num = 1;
        this.<>u__2 = awaiter2;
        this.<>t__builder.AwaitUnsafeOnCompleted<YieldAwaitable.YieldAwaiter, Program.<Recur>d__2>(ref awaiter2, ref this);
        return;
    Label_00FA:
        awaiter2 = this.<>u__2;
        this.<>u__2 = new YieldAwaitable.YieldAwaiter();
        this.<>1__state = num = -1;
    Label_0117:
        awaiter2.GetResult();
        throw new Exception();
    }
    catch (Exception exception)
    {
        this.<>1__state = -2;
        this.<>t__builder.SetException(exception);
        return;
    }
Label_013F:
    this.<>1__state = -2;
    this.<>t__builder.SetResult();
}

The important thing to note is that the call to SetException is inside of the catch block. That call completes the builder, and thus the task, and thus executes any synchronous continuations. Those continuations are therefore invoked as part of the unwind, while all of those exception handling-related frames are still on the stack.

But we could tweak the code gen to avoid this. The relevant portion of the IL for the above looks like this:

    }  // end .try
    catch [mscorlib]System.Exception 
    {
    IL_0126:  stloc.s    V_6
    IL_0128:  ldarg.0
    IL_0129:  ldc.i4.s   -2
    IL_012b:  stfld      int32 Program/'<Recur>d__2'::'<>1__state'
    IL_0130:  ldarg.0
    IL_0131:  ldflda     valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder Program/'<Recur>d__2'::'<>t__builder'
    IL_0136:  ldloc.s    V_6
    IL_0138:  call       instance void [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::SetException(class [mscorlib]System.Exception)
    IL_013d:  leave.s    IL_0152

    }  // end handler
    IL_013f:  ldarg.0
    IL_0140:  ldc.i4.s   -2
    IL_0142:  stfld      int32 Program/'<Recur>d__2'::'<>1__state'
    IL_0147:  ldarg.0
    IL_0148:  ldflda     valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder Program/'<Recur>d__2'::'<>t__builder'
    IL_014d:  call       instance void [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::SetResult()
    IL_0152:  ret
} // end of method '<Recur>d__2'::MoveNext

If we instead tweak it to be more like this, moving the call to SetException to be outside of the catch block:

    }  // end .try
    catch [mscorlib]System.Exception 
    {
    IL_0126:  stloc.s    V_6
    IL_0128:  ldarg.0
    IL_0129:  ldc.i4.s   -2
    IL_012b:  stfld      int32 Program/'<Recur>d__2'::'<>1__state'
    IL_013d:  leave.s    IL_0130
    }  // end handler

    IL_0130:  ldarg.0
    IL_0131:  ldflda     valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder Program/'<Recur>d__2'::'<>t__builder'
    IL_0136:  ldloc.s    V_6
    IL_0138:  call       instance void [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::SetException(class [mscorlib]System.Exception)
              br.s       IL_0152

    IL_013f:  ldarg.0
    IL_0140:  ldc.i4.s   -2
    IL_0142:  stfld      int32 Program/'<Recur>d__2'::'<>1__state'
    IL_0147:  ldarg.0
    IL_0148:  ldflda     valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder Program/'<Recur>d__2'::'<>t__builder'
    IL_014d:  call       instance void [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::SetResult()
    IL_0152:  ret
} // end of method '<Recur>d__2'::MoveNext

then when running it I get the following significantly reduced output:

Frame size: 976

This would allow for significantly deeper exceptional call stacks, while not impacting the performance of success cases and with only a minor change to the generated IL.
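
The same idea can be approximated at the C# level. This is a hand-written sketch, not compiler output: it uses `TaskCompletionSource<bool>` in place of the async method builder, and the `CompletionPlacement` class and its method names are hypothetical. Unlike the IL above, which jumps from the handler straight to the SetException call via `leave`, the C# version needs a null check after the try/catch to tell the two paths apart.

```csharp
using System;
using System.Threading.Tasks;

static class CompletionPlacement
{
    // Mirrors the current code gen: the task is completed while still inside
    // the catch handler, so synchronous continuations run on top of the
    // exception-dispatch frames.
    public static void CompleteInsideCatch(TaskCompletionSource<bool> tcs, Action body)
    {
        try
        {
            body();
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
            return;
        }
        tcs.SetResult(true);
    }

    // Mirrors the proposed code gen: the handler only records the exception;
    // the task is completed after the handler has exited.
    public static void CompleteOutsideCatch(TaskCompletionSource<bool> tcs, Action body)
    {
        Exception captured = null;
        try
        {
            body();
        }
        catch (Exception ex)
        {
            captured = ex;
        }

        if (captured != null)
        {
            tcs.SetException(captured);
            return;
        }
        tcs.SetResult(true);
    }
}
```

Both methods leave the task in the same final state (faulted with the thrown exception, or completed successfully). They differ only in what is on the stack when continuations of `tcs.Task` run.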

There is a potential downside here, though, which may need additional consideration. On runtimes that support thread aborts, a thread abort could end up in the catch block and then be automatically rethrown; in the current implementation, that'll end up completing the task, whereas in this new approach it wouldn't. That said, if thread aborts are occurring, they could also occur inside the catch block and prevent even the current code from completing the task.

(As an aside, Task.ContinueWith has logic to force continuations to be invoked asynchronously when it detects it’s too deep on the call stack. We’ll look at adding similar logic for await, but that would only help with new runtimes that incorporate the change, whereas this minor code gen change can benefit all implementations, regardless of runtime.)
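
To illustrate the general shape of such a depth check, here is a minimal sketch. It is not the actual Task.ContinueWith implementation. It assumes a runtime that exposes `RuntimeHelpers.TryEnsureSufficientExecutionStack()` (on .NET Framework, `EnsureSufficientExecutionStack()` throws `InsufficientExecutionStackException` instead), and the `ContinuationRunner` class and `RunOrQueue` method are hypothetical names:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

static class ContinuationRunner
{
    // Runs the continuation inline when the runtime reports enough stack
    // headroom; otherwise queues it to the thread pool so that it starts on
    // a fresh stack. Returns true if the continuation ran inline.
    public static bool RunOrQueue(Action continuation)
    {
        if (RuntimeHelpers.TryEnsureSufficientExecutionStack())
        {
            continuation();
            return true;
        }

        ThreadPool.QueueUserWorkItem(_ => continuation());
        return false;
    }
}
```

A check like this bounds stack growth regardless of how large each continuation's frames are, but it only takes effect where the runtime performing the check is deployed.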
