Skip to content

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 2) #65054

Closed
@jacobsa

Description

@jacobsa

Starting a new thread for this, because the previous threads have become long and confused by rollbacks and cherrypicks. Previous history here:


I've found another case where clang incorrectly writes to the coroutine frame after suspending, introducing a data race. To recap, clang must not write to the coroutine frame after suspending, because another thread may have already resumed/destroyed the coroutine by the time the write happens.

Here is the program, a slight variant of the one in #65018 (with a fix for the "caller always leaks" issue discussed in #65030):

#include <coroutine>

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int& x;

  bool await_ready() { return false; }
  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() {
      return MyTask{
          std::coroutine_handle<promise_type>::from_promise(*this),
      };
    }

    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int& x) { return Awaiter{x}; }
  };

  std::coroutine_handle<> h;
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int& x : g_array) {
    co_await x;
  }
}

(Compiler Explorer)

Compiler Explorer isn't yet up to date with recent commits, but I built clang manually at b32aa72 and then ran ./bin/clang -std=c++20 -O0 -S -masm=intel foo.cc. This gives me the following output:

  ; (In the preamble before modifying rdi)
  ;
  ; Stash the address 64 bytes into the coroutine frame in the local thread's
  ; stack.
  mov	rax, rdi
  add	rax, 64
  mov	qword ptr [rbp - 96], rax       # 8-byte Spill
[...]

  ; Call await_suspend, then store its result on the local thread's stack.
  call	_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE@PLT
.Ltmp16:
  mov	qword ptr [rbp - 200], rax      # 8-byte Spill
  jmp	.LBB19_17

.LBB19_17:
  ; Reload `&coroutine_frame + 64` into rdi.
  mov	rdi, qword ptr [rbp - 96]       # 8-byte Reload
  
  ; Reload the result of await_suspend into rax.
  mov	rax, qword ptr [rbp - 200]      # 8-byte Reload
  
  ; Write the result of await_suspend to `&coroutine_frame + 64`.
  ; ------> This is the bug!
  mov	qword ptr [rdi], rax
  
  ; Call std::coroutine_handle::address, handing it a `this` pointer that points
  ; at the coroutine frame. Use the result as an indirect jump target.
  call	_ZNKSt7__n486116coroutine_handleIvE7addressEv
  mov	rdi, rax
  mov	rax, qword ptr [rax]
  add	rsp, 256
  pop	rbp
  .cfi_def_cfa rsp, 8
  jmp	rax                             # TAILCALL

@ChuanqiXu9

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions