Reland: [llvm][clang] Allocate a new stack instead of spawning a new thread to get more stack space #136046

Open
wants to merge 2 commits into
base: main
[llvm][clang] Allocate a new stack instead of spawning a new thread to get more stack space

Clang spawns a new thread to avoid running out of stack space. This
can make debugging and performance analysis more difficult because the
relationship between the threads is difficult to recover.

This patch introduces `runOnNewStack` and applies it in Clang. On
platforms that have good support for it, this allocates a new stack and
moves to it using assembly. Using split stacks like this actually works
on most platforms, but many debuggers and unwinders reject the large
or backwards stack offsets that occur. Apple platforms and tools are
known to support this, so this patch only enables it there for now.
Bigcheese committed Apr 16, 2025
commit 2c44bbbf944ec9e2d15c431342a835ef1c7ed18c
4 changes: 4 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -195,6 +195,10 @@ Non-comprehensive list of changes in this release
- Added `__builtin_elementwise_exp10`.
- For AMDGPU targets, added `__builtin_v_cvt_off_f32_i4` that maps to the `v_cvt_off_f32_i4` instruction.
- Added `__builtin_elementwise_minnum` and `__builtin_elementwise_maxnum`.
- Clang itself now uses split stacks instead of threads for allocating more
stack space when running on Apple AArch64 based platforms. This means that
stack traces of Clang from debuggers, crashes, and profilers may look
different than before.

New Compiler Flags
------------------
5 changes: 4 additions & 1 deletion clang/include/clang/Basic/Stack.h
@@ -27,7 +27,10 @@ namespace clang {
 
   /// Call this once on each thread, as soon after starting the thread as
   /// feasible, to note the approximate address of the bottom of the stack.
-  void noteBottomOfStack();
+  ///
+  /// \param ForceSet set to true if you know the call is near the bottom of a
+  /// new stack. Used for split stacks.
+  void noteBottomOfStack(bool ForceSet = false);
 
   /// Determine whether the stack is nearly exhausted.
   bool isStackNearlyExhausted();
40 changes: 12 additions & 28 deletions clang/lib/Basic/Stack.cpp
@@ -13,33 +13,13 @@
 
 #include "clang/Basic/Stack.h"
 #include "llvm/Support/CrashRecoveryContext.h"
+#include "llvm/Support/ProgramStack.h"
 
-#ifdef _MSC_VER
-#include <intrin.h>  // for _AddressOfReturnAddress
-#endif
+static LLVM_THREAD_LOCAL uintptr_t BottomOfStack = 0;
 
-static LLVM_THREAD_LOCAL void *BottomOfStack = nullptr;
-
-static void *getStackPointer() {
-#if __GNUC__ || __has_builtin(__builtin_frame_address)
-  return __builtin_frame_address(0);
-#elif defined(_MSC_VER)
-  return _AddressOfReturnAddress();
-#else
-  char CharOnStack = 0;
-  // The volatile store here is intended to escape the local variable, to
-  // prevent the compiler from optimizing CharOnStack into anything other
-  // than a char on the stack.
-  //
-  // Tested on: MSVC 2015 - 2019, GCC 4.9 - 9, Clang 3.2 - 9, ICC 13 - 19.
-  char *volatile Ptr = &CharOnStack;
-  return Ptr;
-#endif
-}
-
-void clang::noteBottomOfStack() {
-  if (!BottomOfStack)
-    BottomOfStack = getStackPointer();
+void clang::noteBottomOfStack(bool ForceSet) {
+  if (!BottomOfStack || ForceSet)
+    BottomOfStack = llvm::getStackPointer();
 }
 
 bool clang::isStackNearlyExhausted() {
@@ -51,7 +31,8 @@ bool clang::isStackNearlyExhausted() {
   if (!BottomOfStack)
     return false;
 
-  intptr_t StackDiff = (intptr_t)getStackPointer() - (intptr_t)BottomOfStack;
+  intptr_t StackDiff =
+      (intptr_t)llvm::getStackPointer() - (intptr_t)BottomOfStack;
   size_t StackUsage = (size_t)std::abs(StackDiff);
 
   // If the stack pointer has a surprising value, we do not understand this
@@ -66,9 +47,12 @@ bool clang::isStackNearlyExhausted() {
 void clang::runWithSufficientStackSpaceSlow(llvm::function_ref<void()> Diag,
                                             llvm::function_ref<void()> Fn) {
   llvm::CrashRecoveryContext CRC;
-  CRC.RunSafelyOnThread([&] {
-    noteBottomOfStack();
+  // Preserve the BottomOfStack in case RunSafelyOnNewStack uses split stacks.
+  uintptr_t PrevBottom = BottomOfStack;
+  CRC.RunSafelyOnNewStack([&] {
+    noteBottomOfStack(true);
     Diag();
     Fn();
   }, DesiredStackSize);
+  BottomOfStack = PrevBottom;
 }
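
To make the bookkeeping in this file easier to follow, here is a minimal standalone sketch of the same idea: record a "bottom" address, estimate usage as the distance from it, and save and restore the bottom around a call that may run on a different stack. Everything in the `sketch` namespace is hypothetical and not part of the patch. The 8 MiB and 256 KiB thresholds are assumptions chosen for illustration, and the stack pointer is approximated with the address of a local variable rather than `llvm::getStackPointer()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Assumed thresholds, for illustration only.
constexpr std::size_t DesiredStackSize = std::size_t(8) << 20; // 8 MiB
constexpr std::size_t SufficientStack = std::size_t(256) << 10; // 256 KiB

thread_local std::uintptr_t BottomOfStack = 0;

// Approximates the stack pointer with the address of a local variable.
inline std::uintptr_t approxStackPointer() {
  volatile char Local = 0;
  return reinterpret_cast<std::uintptr_t>(&Local);
}

// Records the bottom once per thread, or unconditionally when ForceSet is
// true (the case of a callee that knows it just moved to a new stack).
inline void noteBottomOfStack(bool ForceSet = false) {
  if (!BottomOfStack || ForceSet)
    BottomOfStack = approxStackPointer();
}

// Pure form of the exhaustion check so the arithmetic is easy to test.
inline bool nearlyExhausted(std::uintptr_t Bottom, std::uintptr_t Current) {
  if (!Bottom)
    return false; // The bottom was never recorded.
  std::size_t Usage = Bottom > Current ? Bottom - Current : Current - Bottom;
  if (Usage > DesiredStackSize)
    return false; // Surprising distance: do not try to guess.
  return Usage >= DesiredStackSize - SufficientStack;
}

// Runs Body with a freshly recorded bottom, then restores the previous one so
// that later checks on the original stack stay meaningful.
template <typename Callable> void runWithFreshBottom(Callable &&Body) {
  std::uintptr_t PrevBottom = BottomOfStack;
  noteBottomOfStack(/*ForceSet=*/true);
  assert(BottomOfStack != 0 && "bottom must be recorded before running Body");
  Body();
  BottomOfStack = PrevBottom;
}
} // namespace sketch
```

The restore step matters only when the callee really runs on a different stack in the same thread. With a thread-based fallback the thread-local variable of the caller is never touched, so the save and restore is harmless there.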
2 changes: 1 addition & 1 deletion clang/lib/Frontend/CompilerInstance.cpp
@@ -1265,7 +1265,7 @@ bool CompilerInstance::compileModule(SourceLocation ImportLoc,
 
   // Execute the action to actually build the module in-place. Use a separate
   // thread so that we get a stack large enough.
-  bool Crashed = !llvm::CrashRecoveryContext().RunSafelyOnThread(
+  bool Crashed = !llvm::CrashRecoveryContext().RunSafelyOnNewStack(
       [&]() {
         GenerateModuleFromModuleMapAction Action;
         Instance.ExecuteAction(Action);
3 changes: 3 additions & 0 deletions llvm/include/llvm/Support/CrashRecoveryContext.h
@@ -97,6 +97,9 @@ class CrashRecoveryContext {
return RunSafelyOnThread([&]() { Fn(UserData); }, RequestedStackSize);
}

bool RunSafelyOnNewStack(function_ref<void()>,
unsigned RequestedStackSize = 0);

/// Explicitly trigger a crash recovery in the current process, and
/// return failure from RunSafely(). This function does not return.
[[noreturn]] void HandleExit(int RetCode);
63 changes: 63 additions & 0 deletions llvm/include/llvm/Support/ProgramStack.h
@@ -0,0 +1,63 @@
//===--- ProgramStack.h -----------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_SUPPORT_PROGRAMSTACK_H
#define LLVM_SUPPORT_PROGRAMSTACK_H

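#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/Support/Compiler.h" // fallback definition of __has_extension
#include <cstdint>     // uintptr_t
#include <optional>    // std::optional
#include <type_traits> // std::enable_if_t, std::is_same_v
#include <utility>     // std::forward, std::move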

// LLVM_HAS_SPLIT_STACKS is exposed in the header because CrashRecoveryContext
// needs to know if it's running on another thread or not.
//
// Currently only Apple AArch64 is known to support split stacks in the debugger
// and other tooling.
#if defined(__APPLE__) && defined(__MACH__) && defined(__aarch64__) && \
__has_extension(gnu_asm)
# define LLVM_HAS_SPLIT_STACKS
# define LLVM_HAS_SPLIT_STACKS_AARCH64
#endif

namespace llvm {

/// \returns an address close to the current value of the stack pointer.
///
/// The value is not guaranteed to point to anything specific. It can be used to
/// estimate how much stack space has been used since the previous call.
uintptr_t getStackPointer();

/// \returns the default stack size for this platform.
///
/// Based on \p RLIMIT_STACK or the equivalent.
unsigned getDefaultStackSize();

/// Runs Fn on a new stack of at least the given size.
///
/// \param StackSize requested stack size. A size of 0 uses the default stack
/// size of the platform.
///
/// The preferred implementation is split stacks on platforms that have a good
/// debugging experience for them. On other platforms a new thread is used.
void runOnNewStack(unsigned StackSize, function_ref<void()> Fn);

template <typename R, typename... Ts>
std::enable_if_t<!std::is_same_v<R, void>, R>
runOnNewStack(unsigned StackSize, function_ref<R(Ts...)> Fn, Ts &&...Args) {
  std::optional<R> Ret;
  runOnNewStack(StackSize, [&]() { Ret = Fn(std::forward<Ts>(Args)...); });
  return std::move(*Ret);
}

template <typename... Ts>
void runOnNewStack(unsigned StackSize, function_ref<void(Ts...)> Fn,
                   Ts &&...Args) {
  runOnNewStack(StackSize, [&]() { Fn(std::forward<Ts>(Args)...); });
}

} // namespace llvm

#endif // LLVM_SUPPORT_PROGRAMSTACK_H
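
The value-returning overload in this header tunnels a result out of a `void()` callback through a `std::optional`. The following standalone sketch shows that pattern in isolation. It is not the LLVM API: the `sketch` namespace, `runOnNewStackWithResult`, and the stand-in `runOnNewStack` (which simply calls the callback on the current stack) are all hypothetical, and `std::function` is used in place of `llvm::function_ref` so the example compiles without LLVM.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>
#include <utility>

namespace sketch {
// Stand-in for the real primitive: it ignores StackSize and runs Fn directly
// on the current stack. Only the void() shape of the callback matters here.
inline void runOnNewStack(unsigned /*StackSize*/,
                          const std::function<void()> &Fn) {
  Fn();
}

// Wraps a value-returning callable in a void() lambda and carries the result
// back through an optional that lives in the caller's frame.
template <typename Callable>
auto runOnNewStackWithResult(unsigned StackSize, Callable &&Fn) {
  using R = std::decay_t<std::invoke_result_t<Callable &>>;
  static_assert(!std::is_void_v<R>, "use runOnNewStack for void callables");
  std::optional<R> Ret;
  runOnNewStack(StackSize, [&] { Ret.emplace(Fn()); });
  assert(Ret.has_value() && "callback must have run before returning");
  return std::move(*Ret);
}
} // namespace sketch
```

Because the optional lives in the caller's frame, the result outlives the temporary stack, which in the patch is freed as soon as the callback returns. Using `emplace` instead of assignment is a small variation on the header's code that also works for result types without an assignment operator.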
1 change: 1 addition & 0 deletions llvm/include/llvm/Support/thread.h
@@ -213,6 +213,7 @@ inline thread::id get_id() { return std::this_thread::get_id(); }

#else // !LLVM_ENABLE_THREADS

#include "llvm/Support/ErrorHandling.h"
#include <utility>

namespace llvm {
1 change: 1 addition & 0 deletions llvm/lib/Support/CMakeLists.txt
@@ -295,6 +295,7 @@ add_llvm_component_library(LLVMSupport
Path.cpp
Process.cpp
Program.cpp
ProgramStack.cpp
RWMutex.cpp
Signals.cpp
Threading.cpp
11 changes: 11 additions & 0 deletions llvm/lib/Support/CrashRecoveryContext.cpp
@@ -10,6 +10,7 @@
#include "llvm/Config/llvm-config.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ExitCodes.h"
#include "llvm/Support/ProgramStack.h"
#include "llvm/Support/Signals.h"
#include "llvm/Support/thread.h"
#include <cassert>
@@ -523,3 +524,13 @@ bool CrashRecoveryContext::RunSafelyOnThread(function_ref<void()> Fn,
    CRC->setSwitchedThread();
  return Info.Result;
}

bool CrashRecoveryContext::RunSafelyOnNewStack(function_ref<void()> Fn,
                                               unsigned RequestedStackSize) {
#ifdef LLVM_HAS_SPLIT_STACKS
  return runOnNewStack(RequestedStackSize,
                       function_ref<bool()>([&]() { return RunSafely(Fn); }));
#else
  return RunSafelyOnThread(Fn, RequestedStackSize);
#endif
}
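
The dispatch above is decided entirely at compile time by `LLVM_HAS_SPLIT_STACKS`. As a rough illustration of how such a feature test can be written so that it also preprocesses on compilers without `__has_extension`, here is a standalone sketch. The `SKETCH_HAS_SPLIT_STACKS` macro and the `sketch` namespace are hypothetical names, and the fallback definition of `__has_extension` mirrors the kind of shim that a compatibility header usually provides.

```cpp
#include <cassert>

// Without this shim, a compiler that lacks __has_extension would fail to
// parse the #if below, even on platforms where the first operands are false.
#ifndef __has_extension
#define __has_extension(x) 0
#endif

#if defined(__APPLE__) && defined(__MACH__) && defined(__aarch64__) &&         \
    __has_extension(gnu_asm)
#define SKETCH_HAS_SPLIT_STACKS 1
#else
#define SKETCH_HAS_SPLIT_STACKS 0
#endif

namespace sketch {
enum class StackStrategy { SplitStack, NewThread };

// Reports which implementation this translation unit was compiled to use.
constexpr StackStrategy selectedStrategy() {
  return SKETCH_HAS_SPLIT_STACKS ? StackStrategy::SplitStack
                                 : StackStrategy::NewThread;
}

inline const char *strategyName(StackStrategy S) {
  switch (S) {
  case StackStrategy::SplitStack:
    return "split stack";
  case StackStrategy::NewThread:
    return "new thread";
  }
  assert(false && "unknown strategy");
  return "";
}
} // namespace sketch
```

Defining the macro to 0 or 1 rather than leaving it undefined is a design choice of this sketch only. The patch uses the defined or undefined form, which is why its call sites use `#ifdef`.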
127 changes: 127 additions & 0 deletions llvm/lib/Support/ProgramStack.cpp
@@ -0,0 +1,127 @@
//===--- ProgramStack.cpp -------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/Support/ProgramStack.h"
#include "llvm/Config/config.h"
#include "llvm/Support/Compiler.h"

#ifdef LLVM_ON_UNIX
# include <sys/resource.h> // for getrlimit
#endif

#ifdef _MSC_VER
# include <intrin.h> // for _AddressOfReturnAddress
#endif

#ifndef LLVM_HAS_SPLIT_STACKS
# include "llvm/Support/thread.h"
#endif

using namespace llvm;

uintptr_t llvm::getStackPointer() {
#if __GNUC__ || __has_builtin(__builtin_frame_address)
  return (uintptr_t)__builtin_frame_address(0);
#elif defined(_MSC_VER)
  return (uintptr_t)_AddressOfReturnAddress();
#else
  // CharOnStack is a plain char: taking the address of a volatile char would
  // not convert to the char * stored in Ptr below.
  char CharOnStack = 0;
  // The volatile store here is intended to escape the local variable, to
  // prevent the compiler from optimizing CharOnStack into anything other
  // than a char on the stack.
  //
  // Tested on: MSVC 2015 - 2019, GCC 4.9 - 9, Clang 3.2 - 9, ICC 13 - 19.
  char *volatile Ptr = &CharOnStack;
  return (uintptr_t)Ptr;
#endif
}

unsigned llvm::getDefaultStackSize() {
#ifdef LLVM_ON_UNIX
  rlimit RL;
  getrlimit(RLIMIT_STACK, &RL);
  return RL.rlim_cur;
#else
  // Clang recursively parses, instantiates templates, and evaluates constant
  // expressions. We've found 8MiB to be a reasonable stack size given the way
  // Clang works and the way C++ is commonly written.
  return 8 << 20;
#endif
}

// Not an anonymous namespace to avoid warning about undefined local function.
namespace llvm {
#ifdef LLVM_HAS_SPLIT_STACKS_AARCH64
void runOnNewStackImpl(void *Stack, void (*Fn)(void *), void *Ctx) __asm__(
"_ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_");

// This can't use naked functions because there is no way to know if cfi
// directives are being emitted or not.
//
// When adding new platforms it may be better to move to a .S file with macros
// for dealing with platform differences.
__asm__ (
".globl _ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_\n\t"
".p2align 2\n\t"
"_ZN4llvm17runOnNewStackImplEPvPFvS0_ES0_:\n\t"
".cfi_startproc\n\t"
"mov x16, sp\n\t"
"sub x0, x0, #0x20\n\t" // subtract space from stack
"stp xzr, x16, [x0, #0x00]\n\t" // save old sp
"stp x29, x30, [x0, #0x10]\n\t" // save fp, lr
"mov sp, x0\n\t" // switch to new stack
"add x29, x0, #0x10\n\t" // switch to new frame
".cfi_def_cfa w29, 16\n\t"
Review comment (Collaborator): Shouldn't these appear immediately after the stack switch, since as soon as you update SP, the values from the parent frame can no longer be recovered? This will fix stack unwinding when single-stepping through this code.

".cfi_offset w30, -8\n\t" // lr
".cfi_offset w29, -16\n\t" // fp

"mov x0, x2\n\t" // Ctx is the only argument
"blr x1\n\t" // call Fn

"ldp x29, x30, [sp, #0x10]\n\t" // restore fp, lr
"ldp xzr, x16, [sp, #0x00]\n\t" // load old sp
"mov sp, x16\n\t"
"ret\n\t"
".cfi_endproc"
);
#endif
} // namespace llvm

namespace {
#ifdef LLVM_HAS_SPLIT_STACKS
void callback(void *Ctx) {
(*reinterpret_cast<function_ref<void()> *>(Ctx))();
}
#endif
} // namespace

#ifdef LLVM_HAS_SPLIT_STACKS
void llvm::runOnNewStack(unsigned StackSize, function_ref<void()> Fn) {
  if (StackSize == 0)
    StackSize = getDefaultStackSize();

  // We use malloc here instead of mmap because:
  //   - it's simpler,
  //   - many malloc implementations will reuse the allocation in cases where
  //     we're bouncing across the edge of a stack boundary, and
  //   - many malloc implementations will already provide guard pages for
  //     allocations this large.
  void *Stack = malloc(StackSize);
  void *BottomOfStack = (char *)Stack + StackSize;

  runOnNewStackImpl(BottomOfStack, callback, &Fn);

  free(Stack);
}
#else
void llvm::runOnNewStack(unsigned StackSize, function_ref<void()> Fn) {
  llvm::thread Thread(
      StackSize == 0 ? std::nullopt : std::optional<unsigned>(StackSize), Fn);
  Thread.join();
}
#endif
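
Two details of the sizing logic in this file may deserve extra care: `getrlimit` can fail or report `RLIM_INFINITY`, which does not fit meaningfully in an `unsigned`, and the AArch64 stack pointer must stay 16-byte aligned, which an arbitrary caller-supplied size added to a `malloc` result does not guarantee. The sketch below shows one defensive way to handle both. It is not what the patch does. `defaultStackSize`, `alignedStackTop`, the 8 MiB fallback on failure, and the 1 GiB cap are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__unix__) || defined(__APPLE__)
#include <sys/resource.h> // getrlimit, RLIMIT_STACK, RLIM_INFINITY
#endif

namespace sketch {
constexpr std::size_t FallbackStackSize = std::size_t(8) << 20; // 8 MiB
constexpr std::size_t MaxStackSize = std::size_t(1) << 30;      // assumed cap

// Returns a usable default stack size, falling back to 8 MiB when the limit
// cannot be queried, is zero, or is unlimited, and clamping very large limits.
inline std::size_t defaultStackSize() {
#if defined(__unix__) || defined(__APPLE__)
  struct rlimit RL;
  if (getrlimit(RLIMIT_STACK, &RL) != 0 || RL.rlim_cur == RLIM_INFINITY ||
      RL.rlim_cur == 0)
    return FallbackStackSize;
  if (RL.rlim_cur > MaxStackSize)
    return MaxStackSize;
  return static_cast<std::size_t>(RL.rlim_cur);
#else
  return FallbackStackSize;
#endif
}

// Returns the highest 16-byte aligned address that is not above Base + Size,
// suitable as an initial stack pointer for a downward-growing stack.
inline std::uintptr_t alignedStackTop(std::uintptr_t Base, std::size_t Size) {
  assert(Size >= 16 && "stack must hold at least one aligned slot");
  return (Base + Size) & ~std::uintptr_t(15);
}
} // namespace sketch
```

A production version would presumably also check the allocation result before switching to the new stack, since a failed `malloc` would otherwise yield an invalid stack pointer.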
1 change: 1 addition & 0 deletions llvm/unittests/Support/CMakeLists.txt
@@ -71,6 +71,7 @@ add_llvm_unittest(SupportTests
PerThreadBumpPtrAllocatorTest.cpp
ProcessTest.cpp
ProgramTest.cpp
ProgramStackTest.cpp
RecyclerTest.cpp
RegexTest.cpp
ReverseIterationTest.cpp
35 changes: 35 additions & 0 deletions llvm/unittests/Support/ProgramStackTest.cpp
@@ -0,0 +1,35 @@
//===- unittest/Support/ProgramStackTest.cpp ------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/Support/ProgramStack.h"
#include "llvm/Support/Process.h"
#include "gtest/gtest.h"

using namespace llvm;

static uintptr_t func(int &A) {
  A = 7;
  return getStackPointer();
}

static void func2(int &A) {
  A = 5;
}

TEST(ProgramStackTest, runOnNewStack) {
  int A = 0;
  uintptr_t Stack = runOnNewStack(0, function_ref<uintptr_t(int &)>(func), A);
  EXPECT_EQ(A, 7);
  intptr_t StackDiff = (intptr_t)llvm::getStackPointer() - (intptr_t)Stack;
  size_t StackDistance = (size_t)std::abs(StackDiff);
  // Page size is used as it's large enough to guarantee we're not on the same
  // stack but not so large as to cause spurious failures.
  EXPECT_GT(StackDistance, llvm::sys::Process::getPageSizeEstimate());
  runOnNewStack(0, function_ref<void(int &)>(func2), A);
  EXPECT_EQ(A, 5);
}