[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653

zmodem · 2025-02-18T15:29:49Z

Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata to make it so.

Fixes #127499

Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses. The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend). Since Clang knows these should never be part of the frame, use metadata to make it so. Fixes llvm#127499

llvmbot · 2025-02-18T15:30:32Z

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-coroutines

Author: Hans Wennborg (zmodem)

Changes

Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata to make it so.

Fixes #127499

Full diff: https://github.com/llvm/llvm-project/pull/127653.diff

2 Files Affected:

(modified) clang/lib/CodeGen/CGCoroutine.cpp (+10)
(modified) clang/test/CodeGenCoroutines/coro-params.cpp (+30-9)

diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 9abf2e8c9190d..cdc61676524b4 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
     // Create parameter copies. We do it before creating a promise, since an
     // evolution of coroutine TS may allow promise constructor to observe
     // parameter copies.
+    for (const ParmVarDecl *Parm : FnArgs) {
+      // If the original param is in an alloca, exclude it from the coroutine
+      // frame. The parameter copy will be part of the frame.
+      Address ParmAddr = GetAddrOfLocalVar(Parm);
+      if (auto *ParmAlloca =
+              dyn_cast<llvm::AllocaInst>(ParmAddr.getBasePointer())) {
+        ParmAlloca->setMetadata(llvm::LLVMContext::MD_coro_outside_frame,
+                                llvm::MDNode::get(CGM.getLLVMContext(), {}));
+      }
+    }
     for (auto *PM : S.getParamMoves()) {
       EmitStmt(PM);
       ParamReplacer.addCopy(cast<DeclStmt>(PM));
diff --git a/clang/test/CodeGenCoroutines/coro-params.cpp b/clang/test/CodeGenCoroutines/coro-params.cpp
index b318f2f52ac09..9e640fb2c5c62 100644
--- a/clang/test/CodeGenCoroutines/coro-params.cpp
+++ b/clang/test/CodeGenCoroutines/coro-params.cpp
@@ -59,13 +59,22 @@ struct MoveAndCopy {
   ~MoveAndCopy();
 };
 
-void consume(int,int,int) noexcept;
+struct [[clang::trivial_abi]] TrivialABI {
+  int val;
+  TrivialABI(MoveAndCopy&&) noexcept;
+  ~TrivialABI();
+};
+
+void consume(int,int,int,int) noexcept;
 
 // TODO: Add support for CopyOnly params
-// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]]) #0 personality ptr @__gxx_personality_v0
-void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
+// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy10TrivialABI(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]], i32 %[[TrivialParam:.+]]) #0 personality ptr @__gxx_personality_v0
+void f(int val, MoveOnly moParam, MoveAndCopy mcParam, TrivialABI trivialParam) {
+  // CHECK: %[[TrivialAlloca:.+]] = alloca %struct.TrivialABI,
+  // CHECK-SAME: !coro.outside.frame
   // CHECK: %[[MoCopy:.+]] = alloca %struct.MoveOnly,
   // CHECK: %[[McCopy:.+]] = alloca %struct.MoveAndCopy,
+  // CHECK: %[[TrivialCopy:.+]] = alloca %struct.TrivialABI,
   // CHECK: store i32 %val, ptr %[[ValAddr:.+]]
 
   // CHECK: call ptr @llvm.coro.begin(
@@ -73,25 +82,33 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
   // CHECK-NEXT: call void @llvm.lifetime.start.p0(
   // CHECK-NEXT: call void @_ZN11MoveAndCopyC1EOS_(ptr {{[^,]*}} %[[McCopy]], ptr noundef nonnull align 4 dereferenceable(4) %[[McParam]]) #
   // CHECK-NEXT: call void @llvm.lifetime.start.p0(
-  // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeC1Ev(
+  // CHECK-NEXT: call void @llvm.memcpy
+  // CHECK-SAME: %[[TrivialCopy]]
+  // CHECK-SAME: %[[TrivialAlloca]]
+  // CHECK-NEXT: call void @llvm.lifetime.start.p0(
+  // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeC1Ev(
 
   // CHECK: call void @_ZN14suspend_always12await_resumeEv(
   // CHECK: %[[IntParam:.+]] = load i32, ptr %{{.*}}
   // CHECK: %[[MoGep:.+]] = getelementptr inbounds nuw %struct.MoveOnly, ptr %[[MoCopy]], i32 0, i32 0
   // CHECK: %[[MoVal:.+]] = load i32, ptr %[[MoGep]]
-  // CHECK: %[[McGep:.+]] =  getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
+  // CHECK: %[[McGep:.+]] = getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
   // CHECK: %[[McVal:.+]] = load i32, ptr %[[McGep]]
-  // CHECK: call void @_Z7consumeiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]])
+  // CHECK: %[[TrivialGep:.+]] = getelementptr inbounds nuw %struct.TrivialABI, ptr %[[TrivialCopy]], i32 0, i32 0
+  // CHECK: %[[TrivialVal:.+]] = load i32, ptr %[[TrivialGep]]
+  // CHECK: call void @_Z7consumeiiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]], i32 noundef %[[TrivialVal]])
 
-  consume(val, moParam.val, mcParam.val);
+  consume(val, moParam.val, mcParam.val, trivialParam.val);
   co_return;
 
   // Skip to final suspend:
-  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_type13final_suspendEv(
+  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_type13final_suspendEv(
   // CHECK: call void @_ZN14suspend_always12await_resumeEv(
 
   // Destroy promise, then parameter copies:
-  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+  // CHECK-NEXT: call void @llvm.lifetime.end.p0(
+  // CHECK-NEXT: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialCopy]])
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call void @_ZN11MoveAndCopyD1Ev(ptr {{[^,]*}} %[[McCopy]])
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
@@ -99,6 +116,10 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call ptr @llvm.coro.free(
+
+  // The original trivial_abi parameter is destroyed when returning from the ramp.
+  // CHECK: call i1 @llvm.coro.end
+  // CHECK: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialAlloca]])
 }
 
 // CHECK-LABEL: void @_Z16dependent_paramsI1A1BEvT_T0_S3_(ptr noundef %x, ptr noundef %0, ptr noundef %y)

zmodem · 2025-02-18T15:31:04Z

I think this is a better fix than #127524. Please take a look.

efriedma-quic · 2025-02-18T21:42:09Z

Adding metadata to an instruction should never be required for correctness: optimizations are not required to preserve metadata on instructions, so this "fix" will just break again later.

If we need two different kinds of alloca in coroutine functions, we need a different way to identify them.

rnk · 2025-02-19T00:38:13Z

That may be true, but grepping suggests that coro_outside_frame is pre-existing load-bearing metadata. I think it would be reasonable to use it to fix correctness today and work towards the longer term goal of representing important semantic information as first class alloca fields. We already have a significant semantics changing property, inalloca, that we could model this after.

efriedma-quic · 2025-02-19T01:10:42Z

I'm not really happy with that... but if the semantics were never properly defined in the first place, I guess this isn't making things worse.

Not sure if marking the allocas themselves is actually the right approach long-term. It seems like there's a distinction between code that's "inside" the coroutine (between the begin/end) and "outside" the coroutine, and there are a bunch of restrictions on code that's "outside" the coroutine. I'm not sure I really understand the mechanics of that, though.

ChuanqiXu9 · 2025-02-19T01:40:11Z

I don't feel the previous approach is too problematic. And for this one, I agree with Eli that the metadate may get erased somehow during the transformation.

zmodem · 2025-02-19T08:27:21Z

Adding metadata to an instruction should never be required for correctness

I figured the existing use and the fact that it only needs to survive until CoroSplit made it good enough. But you're right, we should do better.

I do think we need an explicit way to tell CoroSplit whether an alloca should go in the frame or not. I was thinking of adding an intrinsic when I found this metadata.

How about turning coro.outside.frame into an intrinsic instead?

I don't feel the previous approach is too problematic.

I was concerned that it relied on semantics which aren't really defined. There is nothing in the langref which says that the coro frame cannot be accessed after coro.end. As far as I can tell, doing so might be fine as long as the frame hasn't been deallocated, which we can't infer statically.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

On the other hand, the frontend knows exactly what's going on with these parameter allocas: they should be outside the frame, so having a way to communicate that seems like a solid fix.

zmodem · 2025-02-20T15:53:44Z

I pushed a sketch of what using an intrinsic instead would look like. What do you think?

github-actions · 2025-02-20T15:56:46Z

✅ With the latest revision this PR passed the C/C++ code formatter.

rnk

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment? I tried to do this locally, but can't for unrelated build configuration reasons.

ChuanqiXu9 · 2025-02-21T02:10:11Z

Adding metadata to an instruction should never be required for correctness

I figured the existing use and the fact that it only needs to survive until CoroSplit made it good enough. But you're right, we should do better.

I do think we need an explicit way to tell CoroSplit whether an alloca should go in the frame or not. I was thinking of adding an intrinsic when I found this metadata.

How about turning coro.outside.frame into an intrinsic instead?

My instinct reaction is, it may block some optimizations to optimize the alloca out. I don't feel very good about it. Is it possible to introduce the intrinsics without blocking optimizations?

I don't feel the previous approach is too problematic.

I was concerned that it relied on semantics which aren't really defined. There is nothing in the langref which says that the coro frame cannot be accessed after coro.end. As far as I can tell, doing so might be fine as long as the frame hasn't been deallocated, which we can't infer statically.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

It is true that this may be fine with an optimization, but we don't need to care about that.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

On the other hand, the frontend knows exactly what's going on with these parameter allocas: they should be outside the frame, so having a way to communicate that seems like a solid fix.

zmodem · 2025-02-21T12:11:36Z

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment?

Yes, in fact the reproducer on that issue will trigger use-after-free on x86-64 Windows also without the trivial_abi attribute.

I've updated coro-params.cpp to cover this as well.

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

How would we motivate that change though?

Also, would that solve the GRO issue (#49843) which led to adding the existing metadata?

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)

Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

ChuanqiXu9 · 2025-02-24T04:26:42Z

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment?

Yes, in fact the reproducer on that issue will trigger use-after-free on x86-64 Windows also without the trivial_abi attribute.

I've updated coro-params.cpp to cover this as well.

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

As far as I know, a lot of allocas related optimizations/transformation happens before CoroSplit. e.g., mem2reg. And this is the reason why we put CoroSplit in the middle of passes.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

How would we motivate that change though?

Also, would that solve the GRO issue (#49843) which led to adding the existing metadata?

I think so.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)

Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

On the one hand, I still suspect to the validness of the old test. On the other hand, I think we should improve our algorithm to identify the area after coro.end as special area when calculating whether or not to put an alloca to the frame.

The short term solution may consider the case the coroutine won't suspend specially.

zmodem · 2025-02-24T13:58:48Z

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

As far as I know, a lot of allocas related optimizations/transformation happens before CoroSplit. e.g., mem2reg. And this is the reason why we put CoroSplit in the middle of passes.

You're right. I hadn't realized that we run mem2reg before CoroSplit and then spill values back to the frame as needed.

I suppose mem2reg is especially relevant since it operates on allocas. If the transform could remove an alloca, we should not let the intrinsic block it. I've updated the PR to handle that.

For other optimizations that analyze uses of the alloca, I've marked the intrinsic memory(none) and the argument nocapture.

I think this way the intrinsic will not block important optimizations.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)
Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

On the one hand, I still suspect to the validness of the old test. On the other hand, I think we should improve our algorithm to identify the area after coro.end as special area when calculating whether or not to put an alloca to the frame.

I think until someone points out exactly what's wrong about the old test, we have to assume that it's valid.

The short term solution may consider the case the coroutine won't suspend specially.

Can you explain this more?

efriedma-quic · 2025-02-24T18:56:18Z

I think if we're trying to automatically identify whether an alloca has one of two lifetimes, and one of those lifetimes isn't a subset of the other, we're guaranteed to run into trouble again, eventually. We have to pick the right lifetime 100% of the time... but without assistance from the frontend, it's impossible to write an algorithm that doesn't give up in some cases, given a pointer that escapes. So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

rnk · 2025-02-24T19:21:17Z

Re: sroa/mem2reg, that's a valid concern with Hans's intrinsic approach.

So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

To check my understanding, by explicit markers, you mean first class fields of AllocaInst, like the inalloca bit, right?

Going back to the beginning, why do we end up in a UAF situation? IIUC, the ramp function should do the following:

store the bytes of the trivial abi argument into an alloca
call the move ctor into the coro frame
co_return
destroy the coro frame objects, matching the move ctor call
destroy the alloca, matching the construction from earlier in the caller

What is coro split doing that breaks this? It sounds like it's rewriting the second destructor to destroy the variable in the frame, but that's incorrect.

I guess the issue is that the coroutine frame is not modelled in IR during frontend lowering. It's implied that all allocas get moved into the frame, except in two cases, apparently.

What's different between the case where we suspend and the case where there are no suspend points?

zmodem · 2025-02-25T10:54:28Z

Re: sroa/mem2reg, that's a valid concern with Hans's intrinsic approach.

Is it still a concern in the latest version, which makes mem2reg aware of the intrinsic?

I don't have a good feel for whether putting a flag on AllocaInst or using an intrinsic is less disruptive. Happy to take input on that one.

Going back to the beginning, why do we end up in a UAF situation? IIUC, the ramp function should do the following:

store the bytes of the trivial abi argument into an alloca

call the move ctor into the coro frame

co_return

destroy the coro frame objects, matching the move ctor call

destroy the alloca, matching the construction from earlier in the caller

What is coro split doing that breaks this? It sounds like it's rewriting the second destructor to destroy the variable in the frame, but that's incorrect.

The problem is that the "parameter alloca" (first bullet point) incorrectly gets "promoted" to the coroutine frame. CoroSplit assumes it may be accessed across suspension points (I think it has to, as the address escapes via the "src" argument in the move ctor call).

What's different between the case where we suspend and the case where there are no suspend points?

The parameter's destructor runs at the end of the ramp function (last bullet point). If we're suspending, the coro frame is still alive at this point (for use when resuming). If the coro finished without suspending, the coro frame has been destroyed at this point, and we get UAF.

zmodem · 2025-02-27T13:24:00Z

The discussion seems to be stalling. I'll try to summarize.

There seem to be two major questions:

Whether the front-end should explicitly mark some allocas as not belonging in the coroutine frame, or whether we should enhance CoroSplit to infer it automatically.

I think explicitly marking the allocas is the best solution for the current bug ([coro] Use-after-free of callee-destructed parameter in coroutine which doesn't suspend #127499), as well as the previous bug (Coroutine use after free on Clang 12. #49843) which already adopted this approach.

Based on @efriedma-quic's most recent comment, I think he's also in favor of this approach(?):

So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

I don't think @rnk expressed an opinion on this question yet.

@ChuanqiXu9 has argued in favor of the approach from [Coroutines] Allocas used after @coro.end should not go in the coro frame #127524 instead: making CoroSplit treat allocas referenced after the coro.end intrinsic as belonging outside the frame.
If we do use explicit markers for the allocas, how should that be done?

Currently, coro.outside.frame metadata on the allocas is used for this.

@efriedma-quic pointed out metadata must never be required for correctness, and we all agree this current state is bad.

I've proposed using an intrinsic.

@rnk has suggested that we could put an explicit field on the alloca instruction, similar to inalloca.

@ChuanqiXu9 has raised concerns about an intrinsic blocking optimization.

I'm keen to unbreak our users here, as well as implementing the proper long-term solution. Since there is some disagreement, maybe we shouldn't do it all at once.

I've rolled back my PR to just use the existing metadata. While that's not a complete solution, it should be a strict improvement: it fixes the immediate issue using the existing mechanism. I'd like to go ahead and land this.

I will follow up with a PR to remove the metadata, and we can continue discussing there.

rnk

I think correctness comes first, so this prioritization makes sense.

rnk · 2025-02-28T00:13:33Z

clang/lib/CodeGen/CGCoroutine.cpp

@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
    // Create parameter copies. We do it before creating a promise, since an
    // evolution of coroutine TS may allow promise constructor to observe
    // parameter copies.
+    for (const ParmVarDecl *Parm : FnArgs) {
+      // If the original param is in an alloca, exclude it from the coroutine
+      // frame. The parameter copy will be part of the frame.


Please elaborate in this comment:

// frame. The parameter copy will be part of the frame. The parameter copy // will live on the heap, but the original parameter memory should remain on // the stack. This is necessary to ensure that parameters destroyed in callees, as // with `trivial_abi` or in the MSVC C++ ABI, are appropriately destroyed after // setting up the coroutine.

rnk · 2025-02-28T00:14:51Z

clang/test/CodeGenCoroutines/coro-params.cpp

+
+void consume(int) noexcept;
+
+// Similarly to the [[clang::trivial_abi]] parameters, with the MSVC ABI


Right, this is a good edge case to motivate generalizing this for all parameters.

ChuanqiXu9

Abstractly the current one looks good to stop the bleeding first.

But for the issue itself, I still feel we can fix it in the middle end. Since the case a coroutine without a suspend point is relatively rare case to me.

rkjnsn · 2025-02-28T03:42:09Z

Since the case a coroutine without a suspend point is relatively rare case to me.

For what it's worth, this issue isn't limited to coroutines without any suspend point. It can also occur:

If the coroutine conditionally doesn't await (e.g., if (condition) { co_await foo(); } and condition is false)
If the coroutine awaits, but the awaitor's await_ready() returns true so the coroutine doesn't suspend
If the coroutine suspends, but the awaitor's await_suspend() returns false so the coroutine is immediately resumed
If the coroutine suspends, but the awaitor's await_suspend() causes the coroutine to resume before the ramp function finishes (either by resuming on the same thread or losing a race)
If the coroutine suspends, but the awaitor's await_suspend() returns a coroutine handle whose resumption ultimately results in the current coroutine resuming before the ramp function finishes

ChuanqiXu9 · 2025-02-28T03:47:06Z

Since the case a coroutine without a suspend point is relatively rare case to me.

For what it's worth, this issue isn't limited to coroutines without any suspend point. It can also occur:

If the coroutine conditionally doesn't await (e.g., if (condition) { co_await foo(); } and condition is false)

If the coroutine awaits, but the awaitor's await_ready() returns true so the coroutine doesn't suspend

If the coroutine suspends, but the awaitor's await_suspend() returns false so the coroutine is immediately resumed

If the coroutine suspends, but the awaitor's await_suspend() causes the coroutine to resume before the ramp function finishes (either by resuming on the same thread or losing a race)

If the coroutine suspends, but the awaitor's await_suspend() returns a coroutine handle whose resumption ultimately results in the current coroutine resuming before the ramp function finishes

Or we can model it as, if it is possible to reach the section after coro.end from entry without reaching a real suspend point.

rkjnsn · 2025-02-28T06:21:33Z

Even for a real suspend, though, isn't it still possible for the coroutine to be resumed on a different thread, completed, and the frame deleted, all before the coroutine ramp finishes executing on the current thread?

ChuanqiXu9 · 2025-02-28T06:25:59Z

Even for a real suspend, though, isn't it still possible for the coroutine to be resumed on a different thread, completed, and the frame deleted, all before the coroutine ramp finishes executing on the current thread?

Technically possible, you got me.

llvm-ci · 2025-02-28T08:59:45Z

LLVM Buildbot has detected a new failure on builder lldb-x86_64-debian running on lldb-x86_64-debian while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/17051

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/no_threadState.test (2780 of 2789)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/interactive_crashlog_legacy.test (2781 of 2789)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/interactive_crashlog_invalid_target.test (2782 of 2789)
PASS: lldb-api :: api/multithreaded/TestMultithreaded.py (2783 of 2789)
PASS: lldb-api :: terminal/TestEditlineCompletions.py (2784 of 2789)
PASS: lldb-api :: tools/lldb-server/TestGdbRemoteFork.py (2785 of 2789)
PASS: lldb-api :: tools/lldb-server/TestLldbGdbServer.py (2786 of 2789)
PASS: lldb-api :: commands/process/attach/TestProcessAttach.py (2787 of 2789)
PASS: lldb-api :: repl/clang/TestClangREPL.py (2788 of 2789)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (2789 of 2789)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3 /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./lib --env LLVM_INCLUDE_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/include --env LLVM_TOOLS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./bin --arch x86_64 --build-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex --lldb-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/lldb --compiler /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/clang --dsymutil /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./bin --lldb-obj-root /home/worker/2.0.1/lldb-x86_64-debian/build/tools/lldb --lldb-libs-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./lib -t /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision d0edd931bcc328b9502289d346f2b2219341f853)
  clang revision d0edd931bcc328b9502289d346f2b2219341f853
  llvm revision d0edd931bcc328b9502289d346f2b2219341f853
(lldb) settings clear -all
(lldb) settings set symbols.enable-external-lookup false
(lldb) settings set target.inherit-tcc true
(lldb) settings set target.disable-aslr false
(lldb) settings set target.detach-on-error false
(lldb) settings set target.auto-apply-fixits false
(lldb) settings set plugin.process.gdb-remote.packet-timeout 60
(lldb) settings set symbols.clang-modules-cache-path "/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"
(lldb) settings set use-color false
(lldb) target create "/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out"

Parsing symbol table: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
[0/10] Manually indexing DWARF: a.out...�[K

�[2KCurrent executable set to '/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out' (x86_64).
(lldb) breakpoint set -f main.cpp -p "break here"

Metadata should not be "load bearing", i.e. required for correctness, since optimizations are not required to preserve it. Therefore, turn this into an intrinsic instead. This is follow-up to llvm#127653.

…lvm#127653) Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses. The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend). Since Clang knows these should never be part of the frame, use metadata to make it so. Fixes llvm#127499

zmodem added clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines labels Feb 18, 2025

llvmbot added the clang Clang issues not falling into any other category label Feb 18, 2025

zmodem requested review from rnk and ChuanqiXu9 February 18, 2025 15:30

This was referenced Feb 18, 2025

[Coroutines] Allocas used after @coro.end should not go in the coro frame #127524

Closed

[coro] Use-after-free of callee-destructed parameter in coroutine which doesn't suspend #127499

Closed

fix the TrivialABI move ctor

4605efb

introduce and use an @llvm.coro.outside.frame intrinsic instead

26e2610

llvmbot added llvm:ir llvm:transforms labels Feb 20, 2025

rnk reviewed Feb 20, 2025

View reviewed changes

add x86_64-pc-win32 test case

b88b064

zmodem added 2 commits February 24, 2025 14:23

handle coro.outside.frame in mem2reg

225c37c

add nocapture and memory(none) to the intrinsic

acb4146

revert back to using the metadata

52a918e

rnk approved these changes Feb 28, 2025

View reviewed changes

ChuanqiXu9 approved these changes Feb 28, 2025

View reviewed changes

expand comment

ce9cefc

zmodem merged commit d0edd93 into llvm:main Feb 28, 2025
6 of 11 checks passed

zmodem mentioned this pull request Feb 28, 2025

[Coroutines] Replace coro.outside.frame metadata with an intrinsic #129255

Open


		void consume(int) noexcept;

		// Similarly to the [[clang::trivial_abi]] parameters, with the MSVC ABI

[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653

[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653

Uh oh!

Conversation

zmodem commented Feb 18, 2025

Uh oh!

llvmbot commented Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zmodem commented Feb 18, 2025

Uh oh!

efriedma-quic commented Feb 18, 2025

Uh oh!

rnk commented Feb 19, 2025

Uh oh!

efriedma-quic commented Feb 19, 2025

Uh oh!

ChuanqiXu9 commented Feb 19, 2025

Uh oh!

zmodem commented Feb 19, 2025

Uh oh!

zmodem commented Feb 20, 2025

Uh oh!

github-actions bot commented Feb 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rnk left a comment

Choose a reason for hiding this comment

Uh oh!

ChuanqiXu9 commented Feb 21, 2025

Uh oh!

zmodem commented Feb 21, 2025

Uh oh!

ChuanqiXu9 commented Feb 24, 2025

Uh oh!

zmodem commented Feb 24, 2025

Uh oh!

efriedma-quic commented Feb 24, 2025

Uh oh!

rnk commented Feb 24, 2025

Uh oh!

zmodem commented Feb 25, 2025

Uh oh!

zmodem commented Feb 27, 2025

Uh oh!

rnk left a comment

Choose a reason for hiding this comment

Uh oh!

rnk Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

zmodem Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

rnk Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

ChuanqiXu9 left a comment

Choose a reason for hiding this comment

Uh oh!

rkjnsn commented Feb 28, 2025

Uh oh!

ChuanqiXu9 commented Feb 28, 2025

Uh oh!

rkjnsn commented Feb 28, 2025

Uh oh!

ChuanqiXu9 commented Feb 28, 2025

Uh oh!

Uh oh!

llvm-ci commented Feb 28, 2025

Uh oh!

Uh oh!

llvmbot commented Feb 18, 2025 •

edited

Loading

github-actions bot commented Feb 20, 2025 •

edited

Loading