Skip to content

[ASan][test] Fix TestCases/Posix/stack-overflow.cpp on Solaris/sparcv9 #109101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

rorth
Copy link
Collaborator

@rorth rorth commented Sep 18, 2024

When ASan testing is enabled on SPARC as per PR #107405, the

  AddressSanitizer-sparc-sunos :: TestCases/Posix/stack-overflow.cpp

test FAILs:

compiler-rt/test/asan/TestCases/Posix/stack-overflow.cpp:80:12: error: CHECK: expected string not found in input
 // CHECK: {{stack-overflow on address 0x.* \(pc 0x.* bp 0x.* sp 0x.* T.*\)}}
           ^
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
=================================================================
==11358==ERROR: AddressSanitizer: SEGV on unknown address 0xff3fff90 (pc 0x000db0c0 bp 0xfeed59f8 sp 0xfeed5978 T0)
==11358==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

It turns out that sanitizer_linux.cpp (GetPcSpBp) tries to dereference the stack pointer to get at the saved frame pointer, which cannot work since sp has been invalidated by the stack overflow in the test. The access attempt thus leads to a second SEGV.

Solaris walkcontext(3C) doesn't have that problem: in the original OpenSolaris sources ($SRC/lib/libc/port/gen/walkstack.c) they used /proc/self/as to avoid the fault, which is quite heavy-handed. Solaris 11.4 uses a non-faulting load instead (load_no_fault_uint32, which just uses the lduwa insn).

This patch follows this lead, returning a NULL bp in the failure case. Unfortunately, this leads to SEGVs in the depth of the unwinder, so this patch avoids printing a stack trace in this case.

Tested on sparcv9-sun-solaris2.11 and sparc64-unknown-linux-gnu.

When ASan testing is enabled on SPARC as per PR llvm#107405, the
```
  AddressSanitizer-sparc-sunos :: TestCases/Posix/stack-overflow.cpp
```
test `FAIL`s:
```
compiler-rt/test/asan/TestCases/Posix/stack-overflow.cpp:80:12: error: CHECK: expected string not found in input
 // CHECK: {{stack-overflow on address 0x.* \(pc 0x.* bp 0x.* sp 0x.* T.*\)}}
           ^
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
=================================================================
==11358==ERROR: AddressSanitizer: SEGV on unknown address 0xff3fff90 (pc 0x000db0c0 bp 0xfeed59f8 sp 0xfeed5978 T0)
==11358==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.
```
It turns out that `sanitizer_linux.cpp` (`GetPcSpBp`) tries to dereference
the stack pointer to get at the saved frame pointer, which cannot work since
`sp` has been invalidated by the stack overflow in the test.  The access
attempt thus leads to a second `SEGV`.

Solaris `walkcontext(3C)` doesn't have that problem: in the original
OpenSolaris sources (`$SRC/lib/libc/port/gen/walkstack.c`) they used
`/proc/self/as` to avoid the fault, which is quite heavy-handed.  Solaris
11.4 uses a non-faulting load instead (`load_no_fault_uint32`, which just
uses the `lduwa` insn).

This patch follows this lead, returning a `NULL` `bp` in the failure case.
Unfortunately, this leads to `SEGV`s in the depth of the unwinder, so this
patch avoids printing a stack trace in this case.

Tested on `sparcv9-sun-solaris2.11` and `sparc64-unknown-linux-gnu`.
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Rainer Orth (rorth)

Changes

When ASan testing is enabled on SPARC as per PR #107405, the

  AddressSanitizer-sparc-sunos :: TestCases/Posix/stack-overflow.cpp

test FAILs:

compiler-rt/test/asan/TestCases/Posix/stack-overflow.cpp:80:12: error: CHECK: expected string not found in input
 // CHECK: {{stack-overflow on address 0x.* \(pc 0x.* bp 0x.* sp 0x.* T.*\)}}
           ^
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
=================================================================
==11358==ERROR: AddressSanitizer: SEGV on unknown address 0xff3fff90 (pc 0x000db0c0 bp 0xfeed59f8 sp 0xfeed5978 T0)
==11358==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

It turns out that sanitizer_linux.cpp (GetPcSpBp) tries to dereference the stack pointer to get at the saved frame pointer, which cannot work since sp has been invalidated by the stack overflow in the test. The access attempt thus leads to a second SEGV.

Solaris walkcontext(3C) doesn't have that problem: in the original OpenSolaris sources ($SRC/lib/libc/port/gen/walkstack.c) they used /proc/self/as to avoid the fault, which is quite heavy-handed. Solaris 11.4 uses a non-faulting load instead (load_no_fault_uint32, which just uses the lduwa insn).

This patch follows this lead, returning a NULL bp in the failure case. Unfortunately, this leads to SEGVs in the depth of the unwinder, so this patch avoids printing a stack trace in this case.

Tested on sparcv9-sun-solaris2.11 and sparc64-unknown-linux-gnu.


Full diff: https://github.com/llvm/llvm-project/pull/109101.diff

2 Files Affected:

  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp (+16-2)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_report.cpp (+9-6)
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
index 6359f4348e3c48..1da3ce763302e3 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
@@ -107,7 +107,9 @@ extern struct ps_strings *__ps_strings;
 #  endif  // SANITIZER_NETBSD
 
 #  if SANITIZER_SOLARIS
+#    include <stddef.h>
 #    include <stdlib.h>
+#    include <sys/frame.h>
 #    include <thread.h>
 #    define environ _environ
 #  endif
@@ -2617,7 +2619,19 @@ static void GetPcSpBp(void *context, uptr *pc, uptr *sp, uptr *bp) {
 #    if SANITIZER_SOLARIS
   ucontext_t *ucontext = (ucontext_t *)context;
   *pc = ucontext->uc_mcontext.gregs[REG_PC];
-  *sp = ucontext->uc_mcontext.gregs[REG_O6] + STACK_BIAS;
+  *sp = ucontext->uc_mcontext.gregs[REG_SP] + STACK_BIAS;
+  // Avoid SEGV when dereferencing sp on stack overflow with non-faulting load.
+  // This requires a SPARC V9 CPU.  Cannot use #ASI_PNF here: only supported
+  // since clang-19.
+#      if defined(__sparcv9)
+  asm("ldxa [%[fp]] 0x82, %[bp]"
+#      else
+  asm("lduwa [%[fp]] 0x82, %[bp]"
+#      endif
+      : [bp] "=r"(*bp)
+      : [fp] "r"(&((struct frame *)*sp)->fr_savfp));
+  if (*bp)
+    *bp += STACK_BIAS;
 #    else
   // Historical BSDism here.
   struct sigcontext *scontext = (struct sigcontext *)context;
@@ -2628,8 +2642,8 @@ static void GetPcSpBp(void *context, uptr *pc, uptr *sp, uptr *bp) {
   *pc = scontext->si_regs.pc;
   *sp = scontext->si_regs.u_regs[14];
 #      endif
-#    endif
   *bp = (uptr)((uhwptr *)*sp)[14] + STACK_BIAS;
+#    endif
 #  elif defined(__mips__)
   ucontext_t *ucontext = (ucontext_t *)context;
   *pc = ucontext->uc_mcontext.pc;
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_report.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_report.cpp
index ffbaf1468ec8ff..351e00db6fb2dc 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_report.cpp
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_report.cpp
@@ -227,12 +227,15 @@ static void ReportStackOverflowImpl(const SignalContext &sig, u32 tid,
          SanitizerToolName, kDescription, (void *)sig.addr, (void *)sig.pc,
          (void *)sig.bp, (void *)sig.sp, tid);
   Printf("%s", d.Default());
-  InternalMmapVector<BufferedStackTrace> stack_buffer(1);
-  BufferedStackTrace *stack = stack_buffer.data();
-  stack->Reset();
-  unwind(sig, unwind_context, stack);
-  stack->Print();
-  ReportErrorSummary(kDescription, stack);
+  // Avoid SEGVs in the unwinder when bp couldn't be determined.
+  if (sig.bp) {
+    InternalMmapVector<BufferedStackTrace> stack_buffer(1);
+    BufferedStackTrace *stack = stack_buffer.data();
+    stack->Reset();
+    unwind(sig, unwind_context, stack);
+    stack->Print();
+    ReportErrorSummary(kDescription, stack);
+  }
 }
 
 static void ReportDeadlySignalImpl(const SignalContext &sig, u32 tid,

@rorth
Copy link
Collaborator Author

rorth commented Sep 18, 2024

@ebotcazou

I've spent quite some time to make this work on Linux/sparc64, too, but ultimately failed for a couple of reasons:

  • The lduwa/ldxa insns require a SPARC V9 CPU. This is the only one Solaris runs on, so no issue there, while Linux is different: there are distros supporting V8 (e.g. LEON Linux), so this would need something like [Driver][Sparc] Default to -mcpu=v9 for SparcV8 on Linux which got stalled 2 years ago. I've got an updated version ready, but am uncertain there's a chance of getting this in.
  • Besides, I got all sorts of weird fault addresses SignalContext::IsStackOverflow returns false for (sometimes 0x0, sometimes addresses on the first page), so it didn't seem possible to reliably detect a stack overflow. I have no idea if that's related to the kernel passing a sigcontext * as third arg to signal handles on 32-bit Linux/sparc64 (and apparently the usual ucontext_t * for 64-bit, which has more info and could use the ucontext.ss_sp element to get at the stack limit).

This got all so weird and frustrating that I decided to leave this to someone with more Linux knowledge in order not to stall the Solaris fix any longer.

rorth added a commit to rorth/llvm-project that referenced this pull request Sep 19, 2024
While working on supporting PR llvm#109101 on Linux/sparc64, I was reminded
that `clang -m32` still defaults to generating V8 code, although the 64-bit
kernel requires a V9 CPU.

This patch corrects that on V9-only distros (Debian, Gentoo).

Tested on `sparc64-unknown-linux-gnu`, `x86_64-pc-linux-gnu`,
`sparcv9-sun-solaris2.11`, and `amd64-pc-solaris2.11`.

A previous version of this patch was submitted as [[Driver][Sparc] Default
to -mcpu=v9 for SparcV8 on Linux](https://reviews.llvm.org/D130688), but
got stalled for 2+ years.
rorth added a commit that referenced this pull request Sep 21, 2024
While working on supporting PR #109101 on Linux/sparc64, I was reminded
that `clang -m32` still defaults to generating V8 code, although the
64-bit kernel requires a V9 CPU.

This patch corrects that.

Tested on `sparc64-unknown-linux-gnu`, `x86_64-pc-linux-gnu`,
`sparcv9-sun-solaris2.11`, and `amd64-pc-solaris2.11`.
@rorth rorth merged commit 1493c24 into llvm:main Sep 24, 2024
10 checks passed
rorth added a commit to rorth/llvm-project that referenced this pull request Sep 24, 2024
When enabling ASan SPARC testing as per PR llvm#107405, 3 stack overflow tests
`FAIL` on Linux/sparc64:
```
  AddressSanitizer-sparc-linux :: TestCases/Linux/stack-overflow-recovery-mode.cpp
  AddressSanitizer-sparc-linux :: TestCases/Linux/stack-overflow-sigbus.cpp
  AddressSanitizer-sparc-linux :: TestCases/Posix/stack-overflow.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Linux/stack-overflow-recovery-mode.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Linux/stack-overflow-sigbus.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Posix/stack-overflow.cpp
```
However, as detailed in Issue llvm#109771, even a Linux equivalent of the
Solaris/sparcv9 fix (PR llvm#109101) doesn't improve the situation.

Therefore this patch `XFAIL`s the tests until the root cause can be figured
out.

Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
rorth added a commit that referenced this pull request Sep 25, 2024
When enabling ASan SPARC testing as per PR #107405, 3 stack overflow
tests `FAIL` on Linux/sparc64:
```
  AddressSanitizer-sparc-linux :: TestCases/Linux/stack-overflow-recovery-mode.cpp
  AddressSanitizer-sparc-linux :: TestCases/Linux/stack-overflow-sigbus.cpp
  AddressSanitizer-sparc-linux :: TestCases/Posix/stack-overflow.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Linux/stack-overflow-recovery-mode.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Linux/stack-overflow-sigbus.cpp
  AddressSanitizer-sparc-linux-dynamic :: TestCases/Posix/stack-overflow.cpp
```
However, as detailed in Issue #109771, even a Linux equivalent of the
Solaris/sparcv9 fix (PR #109101) doesn't improve the situation.

Therefore this patch `XFAIL`s the tests until the root cause can be
figured out.

Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants