Precompiled Linux releases segfault on wasm file #2273

Closed
alexcrichton opened this issue Jul 31, 2019 · 8 comments · Fixed by #2274

Comments

@alexcrichton
Contributor

I can't seem to reproduce this when building locally from source, but the precompiled releases segfault when run on some wasm files on Linux. This was originally reported at rustwasm/wasm-pack#696 and can be reproduced by downloading the latest Linux binary release and running wasm-opt -O1 input.wasm -o /dev/null on this wasm file: wasmboypluginhqx_bg.wasm.gz

When I reproduce the build process for the Linux releases locally I get a build which sometimes works and sometimes doesn't (most of the time it doesn't). I've at least ruled out the strip step, as the binaries coming out of make are the ones which segfault:

$ ./bin/wasm-opt -O1 $HOME/code/wasm-pack/wut/binaryen-version_78/wasmboypluginhqx_bg.wasm
zsh: abort (core dumped)  ./bin/wasm-opt -O1

The stack trace for the abort is:

Thread 23 "wasm-opt" received signal SIGABRT, Aborted.
[Switching to LWP 15715]
__restore_sigs (set=set@entry=0x7ffff7e16000) at ./arch/x86_64/syscall_arch.h:40
40      ./arch/x86_64/syscall_arch.h: No such file or directory.
(gdb) bt
#0  __restore_sigs (set=set@entry=0x7ffff7e16000) at ./arch/x86_64/syscall_arch.h:40
#1  0x00000000007dd1ce in raise (sig=sig@entry=6) at src/signal/raise.c:11
#2  0x00000000007db2a2 in abort () at src/exit/abort.c:14
#3  0x00000000007d8a6b in uw_init_context_1 (context=context@entry=0x7ffff7e162a0, outer_cfa=outer_cfa@entry=0x7ffff7e16650, outer_ra=
    0x781f6d <__cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*))+66>) at /home/buildozer/aports/main/gcc/src/gcc-8.3.0/libgcc/unwind-dw2.c:1587
#4  0x00000000007d8eac in _Unwind_RaiseException (exc=exc@entry=0x999720) at /home/buildozer/aports/main/gcc/src/gcc-8.3.0/libgcc/unwind.inc:93
#5  0x0000000000781f6d in __cxxabiv1::__cxa_throw (obj=0x999740, tinfo=0x8a6098 <typeinfo for wasm::PrecomputingExpressionRunner::NonstandaloneException>, dest=0x0)
    at /home/buildozer/aports/main/gcc/src/gcc-8.3.0/libstdc++-v3/libsupc++/eh_throw.cc:90
#6  0x000000000064079f in wasm::PrecomputingExpressionRunner::trap(char const*) ()
#7  0x0000000000642980 in wasm::OverriddenVisitor<wasm::PrecomputingExpressionRunner, wasm::Flow>::visit(wasm::Expression*) ()
#8  0x00000000006474b9 in wasm::Precompute::visitExpression(wasm::Expression*) ()
#9  0x000000000064b84f in wasm::WalkerPass<wasm::PostWalker<wasm::Precompute, wasm::UnifiedExpressionVisitor<wasm::Precompute, void> > >::runOnFunction(wasm::PassRunner*, wasm::Module*, wasm::Function*) ()
#10 0x00000000004ce68a in wasm::PassRunner::runPassOnFunction(wasm::Pass*, wasm::Function*) ()
#11 0x00000000004ce844 in std::_Function_handler<wasm::ThreadWorkState (), wasm::PassRunner::run()::{lambda()#2}::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#12 0x000000000076a882 in wasm::Thread::mainLoop(void*) ()
#13 0x0000000000788662 in std::execute_native_thread_routine (__p=0x9acfc0) at /home/buildozer/aports/main/gcc/src/gcc-8.3.0/libstdc++-v3/src/c++11/thread.cc:80
#14 0x00000000007e1189 in start (p=<optimized out>) at src/thread/pthread_create.c:147
#15 0x00000000007e1e1e in __clone () at src/thread/x86_64/clone.s:21
Backtrace stopped: frame did not save the PC

This does sort of look build-related and all about unwinding, although I won't pretend to understand what is going on here and how this could be fixed.

@kripken
Member

kripken commented Jul 31, 2019

Hmm, interesting... I ran this many times locally and didn't see a problem. I also ran diagnostics like BINARYEN_PASS_DEBUG, valgrind, etc. and it all seems fine.

But I can confirm that running the release binaries does hit this bug. It doesn't happen with BINARYEN_CORES=1 which disables multithreading, so maybe that's a clue. Indeed, with multiple cores valgrind shows "Conditional jump or move depends on uninitialized value(s)", which it doesn't do on my local builds.

From the stack trace, it throws an exception (which is normal in the Precompute pass; it indicates something is not precomputable), but somehow the C++ runtime code ends up aborting because of that.

So something is definitely wrong with the release binaries we are generating here. Maybe upgrading/updating the release build compiler would fix things? Could be a compiler bug perhaps. If we can't figure this out, one option is to just use wasm-opt builds from the wasm waterfall builders. I verified those work fine like local builds.

@alexcrichton
Contributor Author

Hm yeah poking around I wasn't able to see how to easily update the C compiler, but I did test out switching to clang to build the release binary and that looks to work (or at least I wasn't able to reproduce the segfault). How's switching to clang sound?

(not that I have any idea why switching to clang appears to fix it, it's probably something about libstdc++ implementations or something like that I guess?)

@kripken
Member

kripken commented Jul 31, 2019

Clang sounds good!

alexcrichton added a commit to alexcrichton/binaryen that referenced this issue Jul 31, 2019
This fixes WebAssembly#2273 for... unknown reasons. The tl;dr; is that the current
release binaries built in this Alpine container seem to segfault when
run over some wasm files when an exception is thrown, but clang-built
binaries magically seem to not segfault!
alexcrichton added a commit to alexcrichton/binaryen that referenced this issue Aug 1, 2019
kripken pushed a commit that referenced this issue Aug 1, 2019
This fixes #2273 for... unknown reasons. The tl;dr; is that the current
release binaries built in this Alpine container seem to segfault when
run over some wasm files when an exception is thrown, but clang-built
binaries magically seem to not segfault!
@newtack

newtack commented Sep 15, 2019

I still have this issue. Has this been included in the 89 version? I'm using Ubuntu 18 (64 bit) and the download from https://github.com/WebAssembly/binaryen/releases/download/version_89/binaryen-version_89-x86_64-linux.tar.gz and encounter a core dump.

@kripken
Member

kripken commented Sep 16, 2019

@newtack yes, looks like this was in the 89 release.

I wonder if there's some ABI mismatch or something causing this. Maybe we need to statically link in some system library like libc++?

Perhaps valgrind or gdb provide more info in a stack trace?

@vapier

vapier commented Oct 13, 2019

the 89 release is already statically linked. they're also stripped of debug info, so not clear how to get a useful backtrace ... i don't see debug files attached to the release or in the tarball.

i can confirm that export BINARYEN_CORES=1 works around the issue for me. w/out that it crashes like 80% of the time for me, but haven't seen any crashes w/it enabled.

adding --debug doesn't seem to trigger the bug, but if it's a race, it's not surprising that generating MB of data to stderr perturbs the codepaths.

$ file $(which wasm-opt)
.../binaryen-version_89/wasm-opt:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

$ gdb --args wasm-opt test.wasm.i64 -o test.wasm -O1
Reading symbols from .../binaryen-version_89/wasm-opt...(no debugging symbols found)...done.
(gdb) r
Starting program: .../binaryen-version_89/wasm-opt test.wasm.i64 -o test.wasm -O1
[New LWP 2494]
[New LWP 2495]
[New LWP 2496]
[New LWP 2497]
[LWP 2494 exited]
[LWP 2497 exited]
[LWP 2496 exited]
[LWP 2495 exited]
[Inferior 1 (process 2490) exited normally]
(gdb) r
Starting program: .../binaryen-version_89/wasm-opt test.wasm.i64 -o test.wasm -O1
[New LWP 2499]
[New LWP 2500]
[New LWP 2501]
[New LWP 2502]
[LWP 2499 exited]
[LWP 2500 exited]
[LWP 2501 exited]
[LWP 2502 exited]
[Inferior 1 (process 2498) exited normally]
(gdb) r
Starting program: .../binaryen-version_89/wasm-opt test.wasm.i64 -o test.wasm -O1
[New LWP 2504]
[New LWP 2505]
[New LWP 2506]
[New LWP 2507]

Thread 5 "wasm-opt" received signal SIGABRT, Aborted.
[Switching to LWP 2507]
0x00000000007f9d51 in ?? ()
(gdb) bt
#0  0x00000000007f9d51 in ?? ()
#1  0x00000000007f9d96 in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) 
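
Because the crash is intermittent, a single clean run says little either way. A minimal sketch for estimating the failure rate with and without the BINARYEN_CORES=1 workaround could look like this. The helper name count_failures and the run count are illustrative and not part of binaryen, and the commented usage assumes wasm-opt is on PATH and test.wasm is the failing input.

```shell
# Count how many of N runs of a command exit with a non-zero status.
# A process killed by SIGABRT is reported by the shell as status 134 (128 + 6),
# so aborts are counted as failures.
count_failures() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only its exit status matters here.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Usage (not run here; wasm-opt and test.wasm are assumed to exist):
#   count_failures 20 wasm-opt test.wasm -o /dev/null -O1
#   count_failures 20 env BINARYEN_CORES=1 wasm-opt test.wasm -o /dev/null -O1
```

The second usage line goes through env so that BINARYEN_CORES=1 applies only to the wasm-opt child process and does not leak into the calling shell.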

alexcrichton added a commit to alexcrichton/binaryen that referenced this issue Oct 25, 2019
This is a continued effort to try and track down WebAssembly#2273 which came up
again and is still present in the current release binaries. Issues like
crystal-lang/crystal#4276 may indicate that C++ exceptions are just
somewhat broken with static linking when using alpine, but I've at least
locally been able to verify that upgrading the container produces
working binaries which previously segfaulted on some wasm files.
dschuff pushed a commit that referenced this issue Oct 25, 2019
This is a continued effort to try and track down #2273 which came up
again and is still present in the current release binaries. Issues like
crystal-lang/crystal#4276 may indicate that C++ exceptions are just
somewhat broken with static linking when using alpine, but I've at least
locally been able to verify that upgrading the container produces
working binaries which previously segfaulted on some wasm files.
@vapier

vapier commented Mar 14, 2022

for posterity, these versions still crashed long after this issue was closed:

  • v97 v100 v101

these don't seem to crash:

  • v103 v104 v105

couldn't easily test v102 as binaries weren't made for the release.
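
For anyone repeating this comparison across releases, here is a rough sketch that builds the download URL for each version listed above. It assumes every release follows the same tarball naming as the version_89 link posted earlier in this thread, which has not been checked for each version. The helper name release_url is illustrative.

```shell
# Build the download URL for a Linux x86_64 release tarball.
# Assumption: all releases use the naming seen in the version_89 link.
release_url() {
  version=$1
  echo "https://github.com/WebAssembly/binaryen/releases/download/version_${version}/binaryen-version_${version}-x86_64-linux.tar.gz"
}

# Print the URLs for the releases mentioned above (v102 had no binaries).
for v in 97 100 101 103 104 105; do
  release_url "$v"
done
```

Each URL can then be fetched and unpacked, and the resulting wasm-opt run repeatedly on the failing input.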

@kripken
Member

kripken commented Mar 14, 2022

Reading the above, it seems like we were never sure what causes this, but that switching to clang helped. So if someone can get a PR for our release pipeline here to build with clang that could fix it. I assume the changes would go here: .github/workflows/create_release.yml.
