Skip to content

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

Closed
@jacobsa

Description

@jacobsa

This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.

The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline std::coroutine_handle::address. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.


With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.

I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers:

#include <coroutine>

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int& x;

  bool await_ready() { return false; }
  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int& x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int& x : g_array) {
    co_await x;
  }
}

If you build this on armv8-a clang trunk with -std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm (Compiler Explorer), you get this assembly as part of the resume function for FooBar (full dump):

        bl      Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)
        ldr     x9, [x19, #536]                 // 8-byte Folded Reload
        str     x0, [x19, #240]                 // 8-byte Folded Spill
        str     x9, [x19, #248]                 // 8-byte Folded Spill
        mov     x8, #68719476736                // =0x1000000000
        add     x8, x8, x9, lsr #3
        ldrb    w8, [x8]
        cbz     w8, .LBB16_52
        b       .LBB16_51
.LBB16_51:
        ldr     x0, [x19, #248]                 // 8-byte Folded Reload
        bl      __asan_report_store8

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

  %call29 = call i64 @Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
  %coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
  %coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
  %144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
  %145 = lshr i64 %144, 3, !dbg !329
  %146 = add i64 %145, 68719476736, !dbg !329
  %147 = inttoptr i64 %146 to ptr, !dbg !329
  %148 = load i8, ptr %147, align 1, !dbg !329
  %149 = icmp ne i8 %148, 0, !dbg !329
  br i1 %149, label %150, label %151, !dbg !329

150:
  call void @__asan_report_store8(i64 %144) #10, !dbg !329
  unreachable, !dbg !329

151:
  store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

%FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
  %ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320

In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions