Skip to content

[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

zmodem
Copy link
Collaborator

@zmodem zmodem commented Feb 18, 2025

Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata to make it so.

Fixes #127499

Parameters to a coroutine get copied (moved) to coroutine-local
instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally
CoroSplit figures that out by itself, but for [[clang::trivial_abi]]
parameters which, get destructed at the end of the ramp function,
it does not (see bug), causing use-after-free's if the frame is
destroyed before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata
to make it so.

Fixes llvm#127499
@zmodem zmodem added clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines labels Feb 18, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 18, 2025
@zmodem zmodem requested review from rnk and ChuanqiXu9 February 18, 2025 15:30
@llvmbot
Copy link
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-coroutines

Author: Hans Wennborg (zmodem)

Changes

Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata to make it so.

Fixes #127499


Full diff: https://github.com/llvm/llvm-project/pull/127653.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+10)
  • (modified) clang/test/CodeGenCoroutines/coro-params.cpp (+30-9)
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 9abf2e8c9190d..cdc61676524b4 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
     // Create parameter copies. We do it before creating a promise, since an
     // evolution of coroutine TS may allow promise constructor to observe
     // parameter copies.
+    for (const ParmVarDecl *Parm : FnArgs) {
+      // If the original param is in an alloca, exclude it from the coroutine
+      // frame. The parameter copy will be part of the frame.
+      Address ParmAddr = GetAddrOfLocalVar(Parm);
+      if (auto *ParmAlloca =
+              dyn_cast<llvm::AllocaInst>(ParmAddr.getBasePointer())) {
+        ParmAlloca->setMetadata(llvm::LLVMContext::MD_coro_outside_frame,
+                                llvm::MDNode::get(CGM.getLLVMContext(), {}));
+      }
+    }
     for (auto *PM : S.getParamMoves()) {
       EmitStmt(PM);
       ParamReplacer.addCopy(cast<DeclStmt>(PM));
diff --git a/clang/test/CodeGenCoroutines/coro-params.cpp b/clang/test/CodeGenCoroutines/coro-params.cpp
index b318f2f52ac09..9e640fb2c5c62 100644
--- a/clang/test/CodeGenCoroutines/coro-params.cpp
+++ b/clang/test/CodeGenCoroutines/coro-params.cpp
@@ -59,13 +59,22 @@ struct MoveAndCopy {
   ~MoveAndCopy();
 };
 
-void consume(int,int,int) noexcept;
+struct [[clang::trivial_abi]] TrivialABI {
+  int val;
+  TrivialABI(MoveAndCopy&&) noexcept;
+  ~TrivialABI();
+};
+
+void consume(int,int,int,int) noexcept;
 
 // TODO: Add support for CopyOnly params
-// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]]) #0 personality ptr @__gxx_personality_v0
-void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
+// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy10TrivialABI(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]], i32 %[[TrivialParam:.+]]) #0 personality ptr @__gxx_personality_v0
+void f(int val, MoveOnly moParam, MoveAndCopy mcParam, TrivialABI trivialParam) {
+  // CHECK: %[[TrivialAlloca:.+]] = alloca %struct.TrivialABI,
+  // CHECK-SAME: !coro.outside.frame
   // CHECK: %[[MoCopy:.+]] = alloca %struct.MoveOnly,
   // CHECK: %[[McCopy:.+]] = alloca %struct.MoveAndCopy,
+  // CHECK: %[[TrivialCopy:.+]] = alloca %struct.TrivialABI,
   // CHECK: store i32 %val, ptr %[[ValAddr:.+]]
 
   // CHECK: call ptr @llvm.coro.begin(
@@ -73,25 +82,33 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
   // CHECK-NEXT: call void @llvm.lifetime.start.p0(
   // CHECK-NEXT: call void @_ZN11MoveAndCopyC1EOS_(ptr {{[^,]*}} %[[McCopy]], ptr noundef nonnull align 4 dereferenceable(4) %[[McParam]]) #
   // CHECK-NEXT: call void @llvm.lifetime.start.p0(
-  // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeC1Ev(
+  // CHECK-NEXT: call void @llvm.memcpy
+  // CHECK-SAME: %[[TrivialCopy]]
+  // CHECK-SAME: %[[TrivialAlloca]]
+  // CHECK-NEXT: call void @llvm.lifetime.start.p0(
+  // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeC1Ev(
 
   // CHECK: call void @_ZN14suspend_always12await_resumeEv(
   // CHECK: %[[IntParam:.+]] = load i32, ptr %{{.*}}
   // CHECK: %[[MoGep:.+]] = getelementptr inbounds nuw %struct.MoveOnly, ptr %[[MoCopy]], i32 0, i32 0
   // CHECK: %[[MoVal:.+]] = load i32, ptr %[[MoGep]]
-  // CHECK: %[[McGep:.+]] =  getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
+  // CHECK: %[[McGep:.+]] = getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
   // CHECK: %[[McVal:.+]] = load i32, ptr %[[McGep]]
-  // CHECK: call void @_Z7consumeiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]])
+  // CHECK: %[[TrivialGep:.+]] = getelementptr inbounds nuw %struct.TrivialABI, ptr %[[TrivialCopy]], i32 0, i32 0
+  // CHECK: %[[TrivialVal:.+]] = load i32, ptr %[[TrivialGep]]
+  // CHECK: call void @_Z7consumeiiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]], i32 noundef %[[TrivialVal]])
 
-  consume(val, moParam.val, mcParam.val);
+  consume(val, moParam.val, mcParam.val, trivialParam.val);
   co_return;
 
   // Skip to final suspend:
-  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_type13final_suspendEv(
+  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_type13final_suspendEv(
   // CHECK: call void @_ZN14suspend_always12await_resumeEv(
 
   // Destroy promise, then parameter copies:
-  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+  // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+  // CHECK-NEXT: call void @llvm.lifetime.end.p0(
+  // CHECK-NEXT: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialCopy]])
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call void @_ZN11MoveAndCopyD1Ev(ptr {{[^,]*}} %[[McCopy]])
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
@@ -99,6 +116,10 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call void @llvm.lifetime.end.p0(
   // CHECK-NEXT: call ptr @llvm.coro.free(
+
+  // The original trivial_abi parameter is destroyed when returning from the ramp.
+  // CHECK: call i1 @llvm.coro.end
+  // CHECK: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialAlloca]])
 }
 
 // CHECK-LABEL: void @_Z16dependent_paramsI1A1BEvT_T0_S3_(ptr noundef %x, ptr noundef %0, ptr noundef %y)

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 18, 2025

I think this is a better fix than #127524. Please take a look.

@efriedma-quic
Copy link
Collaborator

Adding metadata to an instruction should never be required for correctness: optimizations are not required to preserve metadata on instructions, so this "fix" will just break again later.

If we need two different kinds of alloca in coroutine functions, we need a different way to identify them.

@rnk
Copy link
Collaborator

rnk commented Feb 19, 2025

That may be true, but grepping suggests that coro_outside_frame is pre-existing load-bearing metadata. I think it would be reasonable to use it to fix correctness today and work towards the longer term goal of representing important semantic information as first class alloca fields. We already have a significant semantics changing property, inalloca, that we could model this after.

@efriedma-quic
Copy link
Collaborator

I'm not really happy with that... but if the semantics were never properly defined in the first place, I guess this isn't making things worse.

Not sure if marking the allocas themselves is actually the right approach long-term. It seems like there's a distinction between code that's "inside" the coroutine (between the begin/end) and "outside" the coroutine, and there are a bunch of restrictions on code that's "outside" the coroutine. I'm not sure I really understand the mechanics of that, though.

@ChuanqiXu9
Copy link
Member

I don't feel the previous approach is too problematic. And for this one, I agree with Eli that the metadate may get erased somehow during the transformation.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 19, 2025

Adding metadata to an instruction should never be required for correctness

I figured the existing use and the fact that it only needs to survive until CoroSplit made it good enough. But you're right, we should do better.

I do think we need an explicit way to tell CoroSplit whether an alloca should go in the frame or not. I was thinking of adding an intrinsic when I found this metadata.

How about turning coro.outside.frame into an intrinsic instead?

I don't feel the previous approach is too problematic.

I was concerned that it relied on semantics which aren't really defined. There is nothing in the langref which says that the coro frame cannot be accessed after coro.end. As far as I can tell, doing so might be fine as long as the frame hasn't been deallocated, which we can't infer statically.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

On the other hand, the frontend knows exactly what's going on with these parameter allocas: they should be outside the frame, so having a way to communicate that seems like a solid fix.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 20, 2025

I pushed a sketch of what using an intrinsic instead would look like. What do you think?

Copy link

github-actions bot commented Feb 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment? I tried to do this locally, but can't for unrelated build configuration reasons.

@ChuanqiXu9
Copy link
Member

Adding metadata to an instruction should never be required for correctness

I figured the existing use and the fact that it only needs to survive until CoroSplit made it good enough. But you're right, we should do better.

I do think we need an explicit way to tell CoroSplit whether an alloca should go in the frame or not. I was thinking of adding an intrinsic when I found this metadata.

How about turning coro.outside.frame into an intrinsic instead?

My instinct reaction is, it may block some optimizations to optimize the alloca out. I don't feel very good about it. Is it possible to introduce the intrinsics without blocking optimizations?

I don't feel the previous approach is too problematic.

I was concerned that it relied on semantics which aren't really defined. There is nothing in the langref which says that the coro frame cannot be accessed after coro.end. As far as I can tell, doing so might be fine as long as the frame hasn't been deallocated, which we can't infer statically.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

It is true that this may be fine with an optimization, but we don't need to care about that.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

On the other hand, the frontend knows exactly what's going on with these parameter allocas: they should be outside the frame, so having a way to communicate that seems like a solid fix.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 21, 2025

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment?

Yes, in fact the reproducer on that issue will trigger use-after-free on x86-64 Windows also without the trivial_abi attribute.

I've updated coro-params.cpp to cover this as well.

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

How would we motivate that change though?

Also, would that solve the GRO issue (#49843) which led to adding the existing metadata?

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)

Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

@ChuanqiXu9
Copy link
Member

In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment?

Yes, in fact the reproducer on that issue will trigger use-after-free on x86-64 Windows also without the trivial_abi attribute.

I've updated coro-params.cpp to cover this as well.

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

As far as I know, a lot of allocas related optimizations/transformation happens before CoroSplit. e.g., mem2reg. And this is the reason why we put CoroSplit in the middle of passes.

I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it.

How would we motivate that change though?

Also, would that solve the GRO issue (#49843) which led to adding the existing metadata?

I think so.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)

Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

On the one hand, I still suspect to the validness of the old test. On the other hand, I think we should improve our algorithm to identify the area after coro.end as special area when calculating whether or not to put an alloca to the frame.

The short term solution may consider the case the coroutine won't suspend specially.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 24, 2025

My instinct reaction is, it may block some optimizations to optimize the alloca out.

The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.

As far as I know, a lot of allocas related optimizations/transformation happens before CoroSplit. e.g., mem2reg. And this is the reason why we put CoroSplit in the middle of passes.

You're right. I hadn't realized that we run mem2reg before CoroSplit and then spill values back to the frame as needed.

I suppose mem2reg is especially relevant since it operates on allocas. If the transform could remove an alloca, we should not let the intrinsic block it. I've updated the PR to handle that.

For other optimizations that analyze uses of the alloca, I've marked the intrinsic memory(none) and the argument nocapture.

I think this way the intrinsic will not block important optimizations.

In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid.

I do feel it is wrong since it accesses the frame after it ends. It makes no sense.

(To make it easier to follow, we're talking about this access.)
Looking closer, I think that code might be fine. The code after coro.end will go at the end of the ramp function. At that point the coro frame is still alive. And since that alloca is also accessed in the resume function (after a suspend), it needs to be in the frame.

On the one hand, I still suspect to the validness of the old test. On the other hand, I think we should improve our algorithm to identify the area after coro.end as special area when calculating whether or not to put an alloca to the frame.

I think until someone points out exactly what's wrong about the old test, we have to assume that it's valid.

The short term solution may consider the case the coroutine won't suspend specially.

Can you explain this more?

@efriedma-quic
Copy link
Collaborator

I think if we're trying to automatically identify whether an alloca has one of two lifetimes, and one of those lifetimes isn't a subset of the other, we're guaranteed to run into trouble again, eventually. We have to pick the right lifetime 100% of the time... but without assistance from the frontend, it's impossible to write an algorithm that doesn't give up in some cases, given a pointer that escapes. So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

@rnk
Copy link
Collaborator

rnk commented Feb 24, 2025

Re: sroa/mem2reg, that's a valid concern with Hans's intrinsic approach.

So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

To check my understanding, by explicit markers, you mean first class fields of AllocaInst, like the inalloca bit, right?


Going back to the beginning, why do we end up in a UAF situation? IIUC, the ramp function should do the following:

  • store the bytes of the trivial abi argument into an alloca
  • call the move ctor into the coro frame
  • co_return
  • destroy the coro frame objects, matching the move ctor call
  • destroy the alloca, matching the construction from earlier in the caller

What is coro split doing that breaks this? It sounds like it's rewriting the second destructor to destroy the variable in the frame, but that's incorrect.

I guess the issue is that the coroutine frame is not modelled in IR during frontend lowering. It's implied that all allocas get moved into the frame, except in two cases, apparently.

What's different between the case where we suspend and the case where there are no suspend points?

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 25, 2025

Re: sroa/mem2reg, that's a valid concern with Hans's intrinsic approach.

Is it still a concern in the latest version, which makes mem2reg aware of the intrinsic?

I don't have a good feel for whether putting a flag on AllocaInst or using an intrinsic is less disruptive. Happy to take input on that one.

Going back to the beginning, why do we end up in a UAF situation? IIUC, the ramp function should do the following:

  • store the bytes of the trivial abi argument into an alloca
  • call the move ctor into the coro frame
  • co_return
  • destroy the coro frame objects, matching the move ctor call
  • destroy the alloca, matching the construction from earlier in the caller

What is coro split doing that breaks this? It sounds like it's rewriting the second destructor to destroy the variable in the frame, but that's incorrect.

The problem is that the "parameter alloca" (first bullet point) incorrectly gets "promoted" to the coroutine frame. CoroSplit assumes it may be accessed across suspension points (I think it has to, as the address escapes via the "src" argument in the move ctor call).

What's different between the case where we suspend and the case where there are no suspend points?

The parameter's destructor runs at the end of the ramp function (last bullet point). If we're suspending, the coro frame is still alive at this point (for use when resuming). If the coro finished without suspending, the coro frame has been destroyed at this point, and we get UAF.

@zmodem
Copy link
Collaborator Author

zmodem commented Feb 27, 2025

The discussion seems to be stalling. I'll try to summarize.

There seem to be two major questions:

  1. Whether the front-end should explicitly mark some allocas as not belonging in the coroutine frame, or whether we should enhance CoroSplit to infer it automatically.

    I think explicitly marking the allocas is the best solution for the current bug ([coro] Use-after-free of callee-destructed parameter in coroutine which doesn't suspend #127499), as well as the previous bug (Coroutine use after free on Clang 12. #49843) which already adopted this approach.

    Based on @efriedma-quic's most recent comment, I think he's also in favor of this approach(?):

    So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that.

    I don't think @rnk expressed an opinion on this question yet.

    @ChuanqiXu9 has argued in favor of the approach from [Coroutines] Allocas used after @coro.end should not go in the coro frame #127524 instead: making CoroSplit treat allocas referenced after the coro.end intrinsic as belonging outside the frame.

  2. If we do use explicit markers for the allocas, how should that be done?

    Currently, coro.outside.frame metadata on the allocas is used for this.

    @efriedma-quic pointed out metadata must never be required for correctness, and we all agree this current state is bad.

    I've proposed using an intrinsic.

    @rnk has suggested that we could put an explicit field on the alloca instruction, similar to inalloca.

    @ChuanqiXu9 has raised concerns about an intrinsic blocking optimization.


I'm keen to unbreak our users here, as well as implementing the proper long-term solution. Since there is some disagreement, maybe we shouldn't do it all at once.

I've rolled back my PR to just use the existing metadata. While that's not a complete solution, it should be a strict improvement: it fixes the immediate issue using the existing mechanism. I'd like to go ahead and land this.

I will follow up with a PR to remove the metadata, and we can continue discussing there.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think correctness comes first, so this prioritization makes sense.

@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
// Create parameter copies. We do it before creating a promise, since an
// evolution of coroutine TS may allow promise constructor to observe
// parameter copies.
for (const ParmVarDecl *Parm : FnArgs) {
// If the original param is in an alloca, exclude it from the coroutine
// frame. The parameter copy will be part of the frame.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate in this comment:

// frame. The parameter copy will be part of the frame. The parameter copy 
// will live on the heap, but the original parameter memory should remain on
// the stack. This is necessary to ensure that parameters destroyed in callees, as
// with `trivial_abi` or in the MSVC C++ ABI, are appropriately destroyed after
// setting up the coroutine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


void consume(int) noexcept;

// Similarly to the [[clang::trivial_abi]] parameters, with the MSVC ABI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is a good edge case to motivate generalizing this for all parameters.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstractly the current one looks good to stop the bleeding first.

But for the issue itself, I still feel we can fix it in the middle end. Since the case a coroutine without a suspend point is relatively rare case to me.

@rkjnsn
Copy link

rkjnsn commented Feb 28, 2025

Since the case a coroutine without a suspend point is relatively rare case to me.

For what it's worth, this issue isn't limited to coroutines without any suspend point. It can also occur:

  • If the coroutine conditionally doesn't await (e.g., if (condition) { co_await foo(); } and condition is false)
  • If the coroutine awaits, but the awaitor's await_ready() returns true so the coroutine doesn't suspend
  • If the coroutine suspends, but the awaitor's await_suspend() returns false so the coroutine is immediately resumed
  • If the coroutine suspends, but the awaitor's await_suspend() causes the coroutine to resume before the ramp function finishes (either by resuming on the same thread or losing a race)
  • If the coroutine suspends, but the awaitor's await_suspend() returns a coroutine handle whose resumption ultimately results in the current coroutine resuming before the ramp function finishes

@ChuanqiXu9
Copy link
Member

Since the case a coroutine without a suspend point is relatively rare case to me.

For what it's worth, this issue isn't limited to coroutines without any suspend point. It can also occur:

  • If the coroutine conditionally doesn't await (e.g., if (condition) { co_await foo(); } and condition is false)
  • If the coroutine awaits, but the awaitor's await_ready() returns true so the coroutine doesn't suspend
  • If the coroutine suspends, but the awaitor's await_suspend() returns false so the coroutine is immediately resumed
  • If the coroutine suspends, but the awaitor's await_suspend() causes the coroutine to resume before the ramp function finishes (either by resuming on the same thread or losing a race)
  • If the coroutine suspends, but the awaitor's await_suspend() returns a coroutine handle whose resumption ultimately results in the current coroutine resuming before the ramp function finishes

Or we can model it as, if it is possible to reach the section after coro.end from entry without reaching a real suspend point.

@rkjnsn
Copy link

rkjnsn commented Feb 28, 2025

Even for a real suspend, though, isn't it still possible for the coroutine to be resumed on a different thread, completed, and the frame deleted, all before the coroutine ramp finishes executing on the current thread?

@ChuanqiXu9
Copy link
Member

Even for a real suspend, though, isn't it still possible for the coroutine to be resumed on a different thread, completed, and the frame deleted, all before the coroutine ramp finishes executing on the current thread?

Technically possible, you got me.

@zmodem zmodem merged commit d0edd93 into llvm:main Feb 28, 2025
6 of 11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 28, 2025

LLVM Buildbot has detected a new failure on builder lldb-x86_64-debian running on lldb-x86_64-debian while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/17051

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/no_threadState.test (2780 of 2789)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/interactive_crashlog_legacy.test (2781 of 2789)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/interactive_crashlog_invalid_target.test (2782 of 2789)
PASS: lldb-api :: api/multithreaded/TestMultithreaded.py (2783 of 2789)
PASS: lldb-api :: terminal/TestEditlineCompletions.py (2784 of 2789)
PASS: lldb-api :: tools/lldb-server/TestGdbRemoteFork.py (2785 of 2789)
PASS: lldb-api :: tools/lldb-server/TestLldbGdbServer.py (2786 of 2789)
PASS: lldb-api :: commands/process/attach/TestProcessAttach.py (2787 of 2789)
PASS: lldb-api :: repl/clang/TestClangREPL.py (2788 of 2789)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (2789 of 2789)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3 /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./lib --env LLVM_INCLUDE_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/include --env LLVM_TOOLS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./bin --arch x86_64 --build-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex --lldb-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/lldb --compiler /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/clang --dsymutil /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./bin --lldb-obj-root /home/worker/2.0.1/lldb-x86_64-debian/build/tools/lldb --lldb-libs-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./lib -t /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision d0edd931bcc328b9502289d346f2b2219341f853)
  clang revision d0edd931bcc328b9502289d346f2b2219341f853
  llvm revision d0edd931bcc328b9502289d346f2b2219341f853
(lldb) settings clear -all
(lldb) settings set symbols.enable-external-lookup false
(lldb) settings set target.inherit-tcc true
(lldb) settings set target.disable-aslr false
(lldb) settings set target.detach-on-error false
(lldb) settings set target.auto-apply-fixits false
(lldb) settings set plugin.process.gdb-remote.packet-timeout 60
(lldb) settings set symbols.clang-modules-cache-path "/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"
(lldb) settings set use-color false
(lldb) target create "/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out"

Parsing symbol table: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
Locating external symbol file: a.out...�[K

�[2K
[0/10] Manually indexing DWARF: a.out...�[K

�[2KCurrent executable set to '/home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out' (x86_64).
(lldb) breakpoint set -f main.cpp -p "break here"

zmodem added a commit to zmodem/llvm-project that referenced this pull request Feb 28, 2025
Metadata should not be "load bearing", i.e. required for correctness,
since optimizations are not required to preserve it. Therefore, turn
this into an intrinsic instead.

This is follow-up to llvm#127653.
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…lvm#127653)

Parameters to a coroutine get copied (moved) to coroutine-local
instances which code inside the coroutine then uses.

The original parameters should not be part of the frame. Normally
CoroSplit figures that out by itself, but for [[clang::trivial_abi]]
parameters which, get destructed at the end of the ramp function, it
does not (see bug), causing use-after-free's if the frame is destroyed
before the end of the ramp (as happens if it doesn't suspend).

Since Clang knows these should never be part of the frame, use metadata
to make it so.

Fixes llvm#127499
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[coro] Use-after-free of callee-destructed parameter in coroutine which doesn't suspend
7 participants