-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coroutines] Mark parameter allocas with coro.outside.frame metadata #127653
Conversation
Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses. The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend). Since Clang knows these should never be part of the frame, use metadata to make it so. Fixes llvm#127499
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-coroutines Author: Hans Wennborg (zmodem) ChangesParameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses. The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend). Since Clang knows these should never be part of the frame, use metadata to make it so. Fixes #127499 Full diff: https://github.com/llvm/llvm-project/pull/127653.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 9abf2e8c9190d..cdc61676524b4 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
// Create parameter copies. We do it before creating a promise, since an
// evolution of coroutine TS may allow promise constructor to observe
// parameter copies.
+ for (const ParmVarDecl *Parm : FnArgs) {
+ // If the original param is in an alloca, exclude it from the coroutine
+ // frame. The parameter copy will be part of the frame.
+ Address ParmAddr = GetAddrOfLocalVar(Parm);
+ if (auto *ParmAlloca =
+ dyn_cast<llvm::AllocaInst>(ParmAddr.getBasePointer())) {
+ ParmAlloca->setMetadata(llvm::LLVMContext::MD_coro_outside_frame,
+ llvm::MDNode::get(CGM.getLLVMContext(), {}));
+ }
+ }
for (auto *PM : S.getParamMoves()) {
EmitStmt(PM);
ParamReplacer.addCopy(cast<DeclStmt>(PM));
diff --git a/clang/test/CodeGenCoroutines/coro-params.cpp b/clang/test/CodeGenCoroutines/coro-params.cpp
index b318f2f52ac09..9e640fb2c5c62 100644
--- a/clang/test/CodeGenCoroutines/coro-params.cpp
+++ b/clang/test/CodeGenCoroutines/coro-params.cpp
@@ -59,13 +59,22 @@ struct MoveAndCopy {
~MoveAndCopy();
};
-void consume(int,int,int) noexcept;
+struct [[clang::trivial_abi]] TrivialABI {
+ int val;
+ TrivialABI(MoveAndCopy&&) noexcept;
+ ~TrivialABI();
+};
+
+void consume(int,int,int,int) noexcept;
// TODO: Add support for CopyOnly params
-// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]]) #0 personality ptr @__gxx_personality_v0
-void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
+// CHECK: define{{.*}} void @_Z1fi8MoveOnly11MoveAndCopy10TrivialABI(i32 noundef %val, ptr noundef %[[MoParam:.+]], ptr noundef %[[McParam:.+]], i32 %[[TrivialParam:.+]]) #0 personality ptr @__gxx_personality_v0
+void f(int val, MoveOnly moParam, MoveAndCopy mcParam, TrivialABI trivialParam) {
+ // CHECK: %[[TrivialAlloca:.+]] = alloca %struct.TrivialABI,
+ // CHECK-SAME: !coro.outside.frame
// CHECK: %[[MoCopy:.+]] = alloca %struct.MoveOnly,
// CHECK: %[[McCopy:.+]] = alloca %struct.MoveAndCopy,
+ // CHECK: %[[TrivialCopy:.+]] = alloca %struct.TrivialABI,
// CHECK: store i32 %val, ptr %[[ValAddr:.+]]
// CHECK: call ptr @llvm.coro.begin(
@@ -73,25 +82,33 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
// CHECK-NEXT: call void @llvm.lifetime.start.p0(
// CHECK-NEXT: call void @_ZN11MoveAndCopyC1EOS_(ptr {{[^,]*}} %[[McCopy]], ptr noundef nonnull align 4 dereferenceable(4) %[[McParam]]) #
// CHECK-NEXT: call void @llvm.lifetime.start.p0(
- // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeC1Ev(
+ // CHECK-NEXT: call void @llvm.memcpy
+ // CHECK-SAME: %[[TrivialCopy]]
+ // CHECK-SAME: %[[TrivialAlloca]]
+ // CHECK-NEXT: call void @llvm.lifetime.start.p0(
+ // CHECK-NEXT: invoke void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeC1Ev(
// CHECK: call void @_ZN14suspend_always12await_resumeEv(
// CHECK: %[[IntParam:.+]] = load i32, ptr %{{.*}}
// CHECK: %[[MoGep:.+]] = getelementptr inbounds nuw %struct.MoveOnly, ptr %[[MoCopy]], i32 0, i32 0
// CHECK: %[[MoVal:.+]] = load i32, ptr %[[MoGep]]
- // CHECK: %[[McGep:.+]] = getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
+ // CHECK: %[[McGep:.+]] = getelementptr inbounds nuw %struct.MoveAndCopy, ptr %[[McCopy]], i32 0, i32 0
// CHECK: %[[McVal:.+]] = load i32, ptr %[[McGep]]
- // CHECK: call void @_Z7consumeiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]])
+ // CHECK: %[[TrivialGep:.+]] = getelementptr inbounds nuw %struct.TrivialABI, ptr %[[TrivialCopy]], i32 0, i32 0
+ // CHECK: %[[TrivialVal:.+]] = load i32, ptr %[[TrivialGep]]
+ // CHECK: call void @_Z7consumeiiii(i32 noundef %[[IntParam]], i32 noundef %[[MoVal]], i32 noundef %[[McVal]], i32 noundef %[[TrivialVal]])
- consume(val, moParam.val, mcParam.val);
+ consume(val, moParam.val, mcParam.val, trivialParam.val);
co_return;
// Skip to final suspend:
- // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_type13final_suspendEv(
+ // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_type13final_suspendEv(
// CHECK: call void @_ZN14suspend_always12await_resumeEv(
// Destroy promise, then parameter copies:
- // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopyEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+ // CHECK: call void @_ZNSt16coroutine_traitsIJvi8MoveOnly11MoveAndCopy10TrivialABIEE12promise_typeD1Ev(ptr {{[^,]*}} %__promise)
+ // CHECK-NEXT: call void @llvm.lifetime.end.p0(
+ // CHECK-NEXT: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialCopy]])
// CHECK-NEXT: call void @llvm.lifetime.end.p0(
// CHECK-NEXT: call void @_ZN11MoveAndCopyD1Ev(ptr {{[^,]*}} %[[McCopy]])
// CHECK-NEXT: call void @llvm.lifetime.end.p0(
@@ -99,6 +116,10 @@ void f(int val, MoveOnly moParam, MoveAndCopy mcParam) {
// CHECK-NEXT: call void @llvm.lifetime.end.p0(
// CHECK-NEXT: call void @llvm.lifetime.end.p0(
// CHECK-NEXT: call ptr @llvm.coro.free(
+
+ // The original trivial_abi parameter is destroyed when returning from the ramp.
+ // CHECK: call i1 @llvm.coro.end
+ // CHECK: call void @_ZN10TrivialABID1Ev(ptr {{[^,]*}} %[[TrivialAlloca]])
}
// CHECK-LABEL: void @_Z16dependent_paramsI1A1BEvT_T0_S3_(ptr noundef %x, ptr noundef %0, ptr noundef %y)
|
I think this is a better fix than #127524. Please take a look. |
Adding metadata to an instruction should never be required for correctness: optimizations are not required to preserve metadata on instructions, so this "fix" will just break again later. If we need two different kinds of alloca in coroutine functions, we need a different way to identify them. |
That may be true, but grepping suggests that |
I'm not really happy with that... but if the semantics were never properly defined in the first place, I guess this isn't making things worse. Not sure if marking the allocas themselves is actually the right approach long-term. It seems like there's a distinction between code that's "inside" the coroutine (between the begin/end) and "outside" the coroutine, and there are a bunch of restrictions on code that's "outside" the coroutine. I'm not sure I really understand the mechanics of that, though. |
I don't feel the previous approach is too problematic. And for this one, I agree with Eli that the metadate may get erased somehow during the transformation. |
I figured the existing use and the fact that it only needs to survive until CoroSplit made it good enough. But you're right, we should do better. I do think we need an explicit way to tell CoroSplit whether an alloca should go in the frame or not. I was thinking of adding an intrinsic when I found this metadata. How about turning
I was concerned that it relied on semantics which aren't really defined. There is nothing in the langref which says that the coro frame cannot be accessed after coro.end. As far as I can tell, doing so might be fine as long as the frame hasn't been deallocated, which we can't infer statically. In fact, my change broke a test which seems to be doing exactly that. Just because it's old doesn't mean it was wrong. So the patch didn't feel solid. On the other hand, the frontend knows exactly what's going on with these parameter allocas: they should be outside the frame, so having a way to communicate that seems like a solid fix. |
I pushed a sketch of what using an intrinsic instead would look like. What do you think? |
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the issue (#127499) you pointed out that this issue applies to the MSVC ABI where all parameters are destroyed in the callee. Is it reasonable to construct the analogous test case in that environment? I tried to do this locally, but can't for unrelated build configuration reasons.
My instinct reaction is, it may block some optimizations to optimize the alloca out. I don't feel very good about it. Is it possible to introduce the intrinsics without blocking optimizations?
I believe we shouldn't access the coroutine frame after coro.end. If the doc doesn't make it clear, let's try to make it. It is true that this may be fine with an optimization, but we don't need to care about that.
I do feel it is wrong since it accesses the frame after it ends. It makes no sense.
|
Yes, in fact the reproducer on that issue will trigger use-after-free on x86-64 Windows also without the I've updated coro-params.cpp to cover this as well.
The intrinsic gets dropped by CoroSplit, so after that it's not a problem. Before that, I'm not sure much interesting optimization can happen to allocas anyway? Especially since it's not known if they will be part of the coro frame or not at that point.
How would we motivate that change though? Also, would that solve the GRO issue (#49843) which led to adding the existing metadata?
(To make it easier to follow, we're talking about this access.) Looking closer, I think that code might be fine. The code after |
As far as I know, a lot of allocas related optimizations/transformation happens before CoroSplit. e.g., mem2reg. And this is the reason why we put CoroSplit in the middle of passes.
I think so.
On the one hand, I still suspect to the validness of the old test. On the other hand, I think we should improve our algorithm to identify the area after coro.end as special area when calculating whether or not to put an alloca to the frame. The short term solution may consider the case the coroutine won't suspend specially. |
You're right. I hadn't realized that we run mem2reg before CoroSplit and then spill values back to the frame as needed. I suppose mem2reg is especially relevant since it operates on allocas. If the transform could remove an alloca, we should not let the intrinsic block it. I've updated the PR to handle that. For other optimizations that analyze uses of the alloca, I've marked the intrinsic I think this way the intrinsic will not block important optimizations.
I think until someone points out exactly what's wrong about the old test, we have to assume that it's valid.
Can you explain this more? |
I think if we're trying to automatically identify whether an alloca has one of two lifetimes, and one of those lifetimes isn't a subset of the other, we're guaranteed to run into trouble again, eventually. We have to pick the right lifetime 100% of the time... but without assistance from the frontend, it's impossible to write an algorithm that doesn't give up in some cases, given a pointer that escapes. So it's not really a question of "improving" the existing algorithm; we need markers in the IR, like coro_outside_frame, and the algorithm should be based on that. |
Re: sroa/mem2reg, that's a valid concern with Hans's intrinsic approach.
To check my understanding, by explicit markers, you mean first class fields of Going back to the beginning, why do we end up in a UAF situation? IIUC, the ramp function should do the following:
What is coro split doing that breaks this? It sounds like it's rewriting the second destructor to destroy the variable in the frame, but that's incorrect. I guess the issue is that the coroutine frame is not modelled in IR during frontend lowering. It's implied that all allocas get moved into the frame, except in two cases, apparently. What's different between the case where we suspend and the case where there are no suspend points? |
Is it still a concern in the latest version, which makes mem2reg aware of the intrinsic? I don't have a good feel for whether putting a flag on
The problem is that the "parameter alloca" (first bullet point) incorrectly gets "promoted" to the coroutine frame. CoroSplit assumes it may be accessed across suspension points (I think it has to, as the address escapes via the "src" argument in the move ctor call).
The parameter's destructor runs at the end of the ramp function (last bullet point). If we're suspending, the coro frame is still alive at this point (for use when resuming). If the coro finished without suspending, the coro frame has been destroyed at this point, and we get UAF. |
The discussion seems to be stalling. I'll try to summarize. There seem to be two major questions:
I'm keen to unbreak our users here, as well as implementing the proper long-term solution. Since there is some disagreement, maybe we shouldn't do it all at once. I've rolled back my PR to just use the existing metadata. While that's not a complete solution, it should be a strict improvement: it fixes the immediate issue using the existing mechanism. I'd like to go ahead and land this. I will follow up with a PR to remove the metadata, and we can continue discussing there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think correctness comes first, so this prioritization makes sense.
clang/lib/CodeGen/CGCoroutine.cpp
Outdated
@@ -855,6 +855,16 @@ void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) { | |||
// Create parameter copies. We do it before creating a promise, since an | |||
// evolution of coroutine TS may allow promise constructor to observe | |||
// parameter copies. | |||
for (const ParmVarDecl *Parm : FnArgs) { | |||
// If the original param is in an alloca, exclude it from the coroutine | |||
// frame. The parameter copy will be part of the frame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate in this comment:
// frame. The parameter copy will be part of the frame. The parameter copy
// will live on the heap, but the original parameter memory should remain on
// the stack. This is necessary to ensure that parameters destroyed in callees, as
// with `trivial_abi` or in the MSVC C++ ABI, are appropriately destroyed after
// setting up the coroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
void consume(int) noexcept; | ||
|
||
// Similarly to the [[clang::trivial_abi]] parameters, with the MSVC ABI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is a good edge case to motivate generalizing this for all parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstractly the current one looks good to stop the bleeding first.
But for the issue itself, I still feel we can fix it in the middle end. Since the case a coroutine without a suspend point is relatively rare case to me.
For what it's worth, this issue isn't limited to coroutines without any suspend point. It can also occur:
|
Or we can model it as, if it is possible to reach the section after coro.end from entry without reaching a real suspend point. |
Even for a real suspend, though, isn't it still possible for the coroutine to be resumed on a different thread, completed, and the frame deleted, all before the coroutine ramp finishes executing on the current thread? |
Technically possible, you got me. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/17051 Here is the relevant piece of the build log for the reference
|
Metadata should not be "load bearing", i.e. required for correctness, since optimizations are not required to preserve it. Therefore, turn this into an intrinsic instead. This is follow-up to llvm#127653.
…lvm#127653) Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses. The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend). Since Clang knows these should never be part of the frame, use metadata to make it so. Fixes llvm#127499
Parameters to a coroutine get copied (moved) to coroutine-local instances which code inside the coroutine then uses.
The original parameters should not be part of the frame. Normally CoroSplit figures that out by itself, but for [[clang::trivial_abi]] parameters which, get destructed at the end of the ramp function, it does not (see bug), causing use-after-free's if the frame is destroyed before the end of the ramp (as happens if it doesn't suspend).
Since Clang knows these should never be part of the frame, use metadata to make it so.
Fixes #127499