JIT: Add runtime async transformation #114861
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
// true if this is an Await that we can optimize
//
bool Compiler::impMatchAwaitPattern(const BYTE* codeAddr, const BYTE* codeEndp, int* configVal)
{
Here we detect the pattern where one Async method awaits another and turn it into an async call.
T result = AsyncHelpers.Await<T>(AsyncMethod());
becomes
// call Async variant, skip wrapping/unwrapping the result through `Task<T>`
T result = async_call RtAsyncMethod();
(added a comment to reference from another PR)
cc @dotnet/jit-contrib No diffs beyond some minor TP diffs. All JIT-EE changes are stubbed out in this PR to validate no diffs. The implementations will come with #114675, but if this PR merges before that one I will submit a follow-up that updates the JIT-EE interface and moves the stubs to the EE side. There is a high-level overview of the transformation written above and I have tried hard to break the transformation itself into smaller pieces that are hopefully easier to understand.
Overall this looks good, I didn't see anything that looked like it needed addressing before merging.
For OSR methods you might make it clearer that the IL offset is the offset that inspires the OSR method, not the IL offset of any suspension point; I was confused by this for a while.
You mentioned somewhere that you needed to disable some or all of escape analysis to prevent creation of byrefs...? I didn't see that here.
@@ -1358,6 +1363,11 @@ int LinearScan::BuildCall(GenTreeCall* call)
    buildInternalRegisterUses();

    // Now generate defs and kills.
    if (call->IsAsync() && compiler->compIsAsync() && !call->IsFastTailCall())
Is there something written up about how async tail calls work (or don't)?
Nothing explicit, but the support mostly just falls out with the non-standard calling convention.
I assume only Async->Async kind of call can be tail-called?
Yes, there is an added check for that somewhere in morph.cpp.
        return;
    }

    for (BasicBlock* block : Blocks())
If we are killing off all CSEs at each async call why do we need this per-block scanning? Shouldn't the right availability just fall out from the async call kills?
I think that the two computations are both needed:
This part is computing the correct CseGen/CseOut set for each basic block. A block with an async call may generate a byref CSE, but only if that byref CSE comes after the async call in execution order. With those per-block sets a data flow afterwards computes the correct in/out CSE sets for every basic block.
Later we also iterate the IR and want to track all available CSEs at all points in the IR. For that we start with the CseIn sets computed by this data flow at the beginning of each basic block, and then track kills/defs manually while going through the IR. That part once again needs the correct kill/generation logic (at a higher level of detail).
src/coreclr/jit/lower.cpp (Outdated)

GenTree* next = asyncCont->gtNext;

// When the ASYNC_CONTINUATION was created as a result of the
Can you explain what this is doing a bit more?
Async resumption stubs are generated in the same way as instantiating stubs by the VM: they are normal IL stubs that use a calli instruction to call the target, and the target has a non-standard calling convention. For runtime async functions the target not only has an extra parameter, it also has an extra GC return value.
The extra GC return can be accessed by the IL code with the AsyncHelpers.AsyncCallContinuation() intrinsic mentioned here. It's used by the async infrastructure to know whether the resumed function suspended again, or whether it finished running and its result needs to be propagated.
The problem without the logic here is that the JIT does not know that the function has the additional GC return value, so it does not set up GC tracking properly. The backend knows about the extra return value based on GTF_CALL_M_ASYNC set on the call node. So the logic here uses the presence of AsyncHelpers.AsyncCallContinuation to make the determination.
This is quite hacky, once again boiling down to our representational difficulties around nodes defining multiple values. My long term ambition in this area is to unify it with the representation that physical promotion eventually will use to define structs returned by calls in multiple registers, and then replace the intrinsic by an intrinsic type instead. So that for instance the IL code would be represented as something like:
ValueWithReturnedContinuation<T> result = calli();
Continuation continuation = result.Continuation;
T value = result.Value;
instead of the current
T result = calli();
Continuation continuation = AsyncHelpers.AsyncCallContinuation();
The former should be less fragile than the latter (which we can accidentally split up or handle incorrectly), but today will spill to stack in a number of cases.
I will expand on the comment.
src/coreclr/jit/block.h (Outdated)
@@ -480,15 +481,15 @@ enum BasicBlockFlags : uint64_t
    // For example, the top block might or might not have BBF_GC_SAFE_POINT,
    // but we assume it does not have BBF_GC_SAFE_POINT any more.

-   BBF_SPLIT_LOST = BBF_GC_SAFE_POINT | BBF_NEEDS_GCPOLL | BBF_HAS_JMP | BBF_KEEP_BBJ_ALWAYS | BBF_CLONED_FINALLY_END | BBF_RECURSIVE_TAILCALL,
+   BBF_SPLIT_LOST = BBF_GC_SAFE_POINT | BBF_NEEDS_GCPOLL | BBF_HAS_JMP | BBF_KEEP_BBJ_ALWAYS | BBF_CLONED_FINALLY_END | BBF_RECURSIVE_TAILCALL | BBF_ASYNC_RESUMPTION,
Seems a bit odd to have BBF_ASYNC_RESUMPTION in split/lost like this? Is there some convention you're following where a block must have just one async call and it's at the end?
Yes, this seems wrong... This should only be in the split gained flags (the flag is an overapproximation, just to silence the "jump into try region" assert).
    }
    else
    {
        inf.DataSize = layout->GetSize();
So for mixed gc/non-gc structs we'll still have space for the gc parts (presumably zeroed) in the nongc-data?
Correct. That's probably something we could try optimizing, but likely in the long term we'll generate a data type that stores structs in their natural format anyway.
    }
    else
    {
        inf.Alignment = m_comp->info.compCompHnd->getClassAlignmentRequirement(layout->GetClassHandle());
Does alignment actually matter here?
Only for arm32 doubles/floats, probably. We could mark the loads in the restore path as unaligned instead.
    JITDUMP(" Creating resumption " FMT_BB " for state %u\n", resumeBB->bbNum, stateNum);

    unsigned resumeByteArrLclNum = BAD_VAR_NUM;
    if (layout.DataSize > 0)
Maybe leave a note that we need to restore non-GC data first and GC data afterwards, so the GC data doesn't get overwritten?
Will add
It's disabled here:
@@ -686,7 +686,7 @@ bool LinearScan::isContainableMemoryOp(GenTree* node)
    // mask - the mask (set) of registers.
    // currentLoc - the location at which they should be added
    //
Nit: add a comment about returning the RefTypeKill refposition.
Will add that as part of a follow-up.
@@ -594,6 +594,8 @@ OPT_CONFIG_INTEGER(JitDoIfConversion, "JitDoIfConversion", 1)
OPT_CONFIG_INTEGER(JitDoOptimizeMaskConversions, "JitDoOptimizeMaskConversions", 1) // Perform optimization of mask
                                                                                    // conversions

RELEASE_CONFIG_INTEGER(JitOptimizeAwait, "JitOptimizeAwait", 1) // Perform optimization of Await intrinsics
I am assuming this will be OFF by default?
This is ON by default. It is what allows bypassing the materialization of a Task when one Async method awaits another. In production it is supposed to be always on.
The optimization is in theory optional though, and disabling it has revealed bugs in the past. It can also be useful for measuring the performance impact while developing, or narrowing issues specific to this optimization.
The plan is to eventually remove the knob or make it Checked.
Tracked in dotnet/runtimelab#3012
(need to port the issue to the main repo once code moves, applies to other active/tracking Async issues in runtimelab as well)
If potentially regressing existing scenarios is a concern: this knob/optimization only has an effect inside Async methods. If there are no Async methods, there is no effect, so it is safe to leave it ON.
if (compIsAsync() && JitConfig.JitOptimizeAwait())
{
isAwait = impMatchAwaitPattern(codeAddr, codeEndp, &configVal);
}
. . .
I went through lsra, codegen and emitter changes and they look good to me.
Add the JIT parts of runtime async. This introduces a new transformation that runs before lowering and that transforms the function into a state machine, with states set up to allow suspension and resumption at async calls (the suspension points).
Suspension is indicated by the callee returning a continuation object using a new calling convention. When suspension happens, the caller similarly suspends by capturing all live state and saving it in a continuation object. These continuation objects are then linked together and continue to be propagated back to the callers until we finally get to a caller that is not async anymore (i.e. one that expects to see a `Task`/`Task<T>`). The VM synthesizes a `Task`/`Task<T>` wrapper for the async functions that hides away the management of the continuation objects. See #114675 for more details around this and around the calling convention.
The continuation objects can later be used to resume the function at the point where suspension happened. This is accomplished by async functions also taking a continuation object as a parameter. When such a parameter is passed (i.e. it is non-null), the JIT will restore all live state from the continuation and resume from the correct location. The continuations store a state number so that resumption knows where to resume.
The full resumption process also involves an IL stub called the async resumption stub. This stub is responsible for calling the async function with the right stack frame setup for arguments and simultaneously passing a non-null continuation. The stack frame setup done by the IL async resumption stub is important as the JIT uses this space to restore live parameters from the continuation.
Continuations similarly support propagation of the return values from the callee and of potential exceptions thrown by the callee. Return values are stored in a known location in the continuation object, and the async resumption stubs are responsible for propagating these values into the next continuation when suspension/resumption has happened. The JIT's resumption code will fetch the return value from the known location and copy it to the right place in the caller. Similarly, exceptions are kept in a known place and are handled by being rethrown from the right location when present.
OSR functions come with complications. Since they rely on frame setup done by the corresponding tier-0 method, the resumption of these methods needs to happen through the tier-0 method. When OSR methods suspend they store their IL offset in the continuation object, while tier-0 methods with patchpoints store -1. The VM then always resumes these methods in the tier-0 method, which uses the IL offset to determine whether resumption should happen in the tier-0 method or whether control needs to continue into an OSR method.