segfault on Linux at exit #7

Open
tharvik opened this issue Feb 7, 2024 · 16 comments
Labels
bug Something isn't working

Comments

tharvik commented Feb 7, 2024

I'm unable to get it to work on Linux amd64; it segfaults on exit.

Reproducible code:

import wrtc from "@roamhq/wrtc";

const conn = new wrtc.RTCPeerConnection();
conn.close();

which you can test with Docker:
docker run -v $PWD:/src node:18 /bin/sh -c 'cd /src && npm ci && node ./index.js'

Here is the gdb trace from inside Docker:

#0  0x0000000000d70503 in v8::HandleScope::HandleScope(v8::Isolate*) ()
#1  0x0000000000b59ee3 in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*, int)#2}::_FUN(uv_work_s*, int) ()
#2  0x000000000166cb2d in uv__work_done (handle=0x532e650 <default_loop_struct+176>)
    at ../deps/uv/src/threadpool.c:318
#3  0x0000000001671316 in uv__async_io (loop=0x532e5a0 <default_loop_struct>, w=<optimized out>,
    events=<optimized out>) at ../deps/uv/src/unix/async.c:163
#4  0x0000000001683854 in uv__io_poll (loop=loop@entry=0x532e5a0 <default_loop_struct>,
    timeout=<optimized out>) at ../deps/uv/src/unix/epoll.c:374
#5  0x0000000001671c7e in uv_run (loop=0x532e5a0 <default_loop_struct>, mode=UV_RUN_ONCE)
    at ../deps/uv/src/unix/core.c:406
#6  0x0000000000b190e0 in node::Environment::CleanupHandles() ()
#7  0x0000000000b191ac in node::Environment::RunCleanup() ()
#8  0x0000000000ad4f4a in node::FreeEnvironment(node::Environment*) ()
#9  0x0000000000bdb7ad in node::NodeMainInstance::Run() ()
#10 0x0000000000b4dab8 in node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) ()
#11 0x0000000000b5161f in node::Start(int, char**) ()

It is probably the same issue as node-webrtc#636. I was unable to build on Linux (some linked symbols were missing at load time), so it'll be hard for me to contribute, but tell me if you need more info.

Collaborator

duvallj commented Feb 7, 2024

Yep, I can reproduce on macOS amd64 as well. It looks like node-webrtc#636 (comment) contains a reasonable solution; thanks for linking the issue.

Collaborator

duvallj commented Mar 16, 2024

Update on this (very late, sorry): removing the AsyncContextReleaser and just using delete behind a mutex does solve this specific case, but now the release builds seem to hang when running the rest of the test cases. I'm continuing to work on this (slowly, when I have time).

duvallj added the bug label Mar 25, 2024

g7i commented Aug 20, 2024

@duvallj I'm getting the following error on an M2 Mac. Is it caused by the same issue?
I don't get it on exit, but while the connection is active.


#
# Fatal error in , line 0
# Fatal JavaScript invalid size error 169220804 (see crbug.com/1201626)
#
#
#
#FailureMessage Object: 0x16bc2dcc8
 1: 0x1042f76b4 node::NodePlatform::GetStackTracePrinter()::$_3::__invoke() [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 2: 0x10536b05c V8_Fatal(char const*, ...) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 3: 0x1045b8bac v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 4: 0x10475c370 v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2>>::GrowCapacity(v8::internal::Handle<v8::internal::JSObject>, unsigned int) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 5: 0x10499fe80 v8::internal::Runtime_GrowArrayElements(int, unsigned long*, v8::internal::Isolate*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 6: 0x104d0cc44 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
 7: 0x10a15b234
 8: 0x109fd5fa0
 9: 0x10a14a650
10: 0x10a14c190
11: 0x104c8250c Builtins_JSEntryTrampoline [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
12: 0x104c821f4 Builtins_JSEntry [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
13: 0x104557fa4 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
14: 0x1045573f0 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
15: 0x104431cc8 v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
16: 0x1041c8d3c node::InternalCallbackScope::Close() [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
17: 0x1041c901c node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
18: 0x1041df46c node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
19: 0x10435d254 node::StreamBase::CallJSOnreadMethod(long, v8::Local<v8::ArrayBuffer>, unsigned long, node::StreamBase::StreamBaseJSChecks) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
20: 0x10435e8c0 node::EmitToJSStreamListener::OnStreamRead(long, uv_buf_t const&) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
21: 0x1043cf2a8 node::crypto::TLSWrap::ClearOut() [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
22: 0x1043d0fe4 node::crypto::TLSWrap::OnStreamRead(long, uv_buf_t const&) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
23: 0x104362ba0 node::LibuvStreamWrap::OnUvRead(long, uv_buf_t const*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
24: 0x104363324 node::LibuvStreamWrap::ReadStart()::$_1::__invoke(uv_stream_s*, long, uv_buf_t const*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
25: 0x104c6bf48 uv__stream_io [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
26: 0x104c7384c uv__io_poll [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
27: 0x104c61d38 uv_run [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
28: 0x1041c9754 node::SpinEventLoopInternal(node::Environment*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
29: 0x1042d4984 node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
30: 0x1042d472c node::NodeMainInstance::Run() [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
31: 0x1042604b0 node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResultImpl const*) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
32: 0x1042607d0 node::Start(int, char**) [/Users/gourav/.nvm/versions/node/v20.5.0/bin/node]
33: 0x18dae60e0 start [/usr/lib/dyld]
zsh: trace trap  node .

@mikey0000

@duvallj I'm getting the following error on M2. Is it because of the same issue? I don't get this issue on exit but while the connection is active.


It seems to be the same issue. It also happens on x64 Mac.

@mikey0000

Is there anything I can do to help push this along?

Collaborator

duvallj commented Sep 3, 2024

@mikey0000 A test case that reproduces the issue you're seeing would be nice. My current progress is at https://github.com/WonderInventions/node-webrtc/tree/remove-async-context-releaser; testing whatever test case you come up with on that branch would also be helpful.
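
One possible shape for such a test case, as a rough sketch rather than the project's actual test setup: run the suspect code in a fresh Node process and report whether it exited normally or was killed by a signal. The helper name runIsolated and the placeholder snippet are assumptions made up for illustration.

```javascript
const { spawnSync } = require("node:child_process");

// Run a snippet in a fresh Node process and report how that process ended.
// `status` is the exit code (null if the child was killed by a signal);
// `signal` is the signal name (null if the child exited normally).
function runIsolated(source) {
  const result = spawnSync(process.execPath, ["-e", source], {
    encoding: "utf8",
  });
  return { status: result.status, signal: result.signal, stderr: result.stderr };
}

// Placeholder snippet (hypothetical): swap in the code that triggers the crash.
const outcome = runIsolated("process.exitCode = 0;");
console.log(outcome.status, outcome.signal); // prints "0 null" for a clean exit
```

A crash in the child would show up as a non-null signal (for example "SIGSEGV") instead of taking down the test runner itself.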

@mikey0000

I'm going to be spending some time on it this coming week.

@mikey0000

FYI, I don't run into any hangs when running the tests on that branch. Still, I'm going to look at writing a test that causes the crash.

@mikey0000

Something further that is interesting: I moved webrtc to an Electron utility process and it stopped crashing (early days, as I'm also calling child.kill). I think something in the threading is causing problems for Electron, so I moved things to a POC for trying this out. I'm not encountering the segfault with Node v20, only with Electron. The only additional thing I found was in peer_connection_factory.cc.

It uses the base threading library, which states that the threads should be shut down with Quit, not Stop:

  _workerThread->Quit();
  _signalingThread->Quit();

if I've read the notes in thread.h properly:

// Never call Stop on the current thread. Instead use the inherited Quit
// function which will exit the base Thread without terminating the
// underlying OS thread.

Still, it makes no difference to the crash I see in Electron.

Collaborator

duvallj commented Sep 9, 2024

FYI I don't run into any hanging on that branch with tests.

Neither do I on Mac; I think it might be a Linux/Windows thing, though.

it uses the base threading library which states it should be called with Quit not Stop

Oh, good find! I'll make that change.

still it makes no difference to the crash I see in electron.

I looked into this more, and I fear this may just be another heisenbug in V8 itself. See nodejs/node#47928 for a very simple reproduction:

node -e "const x = []; for(i = 0; i < 112813859; i++){ x[i] = false };"

This crashes with pretty much the exact same stack trace on my system.

@mikey0000


mikey0000 commented Sep 9, 2024

FWIW, I'm running Linux, so yeah, it could be a Windows thing.

So for now I can say that using a utility process, which forks the work off into a pure Node.js process, works as a workaround.
EDIT: I'm still having trouble using utilityProcess, so no resolution yet.

@zacharygriffee

I'm running into a segfault as well. Tests succeed, but when the process tries to exit I get:

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

I'm running Fedora Linux, and the same error occurs in CI on GitHub Actions (Node 22).

https://github.com/zacharygriffee/rtc-link
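
For reference, exit code 139 follows the shell convention of 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of decoding such a status in Node, assuming that convention applies (the helper name signalFromExitStatus is made up for illustration and is not part of any library):

```javascript
const os = require("node:os");

// Map a shell-style exit status (128 + signal number) back to a signal name.
// Returns null when the status does not encode a known signal.
function signalFromExitStatus(status) {
  if (!Number.isInteger(status) || status <= 128) return null;
  const signo = status - 128;
  // os.constants.signals maps names such as "SIGSEGV" to their numbers.
  for (const [name, value] of Object.entries(os.constants.signals)) {
    if (value === signo) return name;
  }
  return null;
}

console.log(signalFromExitStatus(139)); // SIGSEGV
```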

Collaborator

duvallj commented Nov 6, 2024

Hi @zacharygriffee! Do you mind sharing the segfault stack trace you're getting? Wondering if it's like the one at the top of the issue (which I may be able to do something about) or like the ones further down (which I am probably not able to do something about)


zacharygriffee commented Nov 6, 2024

Here is the core dump report from my personal computer. I'm still trying to figure out how to get the core dump from GitHub Actions.

Process 17718 (node) of user 1000 dumped core.

Module /home/zevilz/WebstormProjects/webrtc-protoplex/node_modules/@roamhq/wrtc-linux-x64/wrtc.node without build-id.
Module libcap.so.2 from rpm libcap-2.69-8.fc40.x86_64
Module libnss_resolve.so.2 from rpm systemd-255.12-1.fc40.x86_64
Module libnss_mdns4_minimal.so.2 from rpm nss-mdns-0.15.1-11.fc40.x86_64
Stack trace of thread 17718:
#0  0x0000000000ef1e83 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0xaf1e83)
#1  0x0000000000c7b4c5 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x87b4c5)
#2  0x00000000018ae73d n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x14ae73d)
#3  0x00000000018b2093 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x14b2093)
#4  0x00000000018c6b0b n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x14c6b0b)
#5  0x00000000018b2db7 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x14b2db7)
#6  0x0000000000c2bfc0 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x82bfc0)
#7  0x0000000000c2c07c n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x82c07c)
#8  0x0000000000bcd8c1 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x7cd8c1)
#9  0x0000000000d0e060 n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x90e060)
#10 0x0000000000c7216f n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x87216f)
#11 0x00007ff0aa355088 __libc_start_call_main (libc.so.6 + 0x2a088)
#12 0x00007ff0aa35514b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
#13 0x0000000000bc6b6e n/a (/home/zevilz/.nvm/versions/node/v20.16.0/bin/node + 0x7c6b6e)
ELF object binary architecture: AMD x86-64

Let me know if there is anything else you need... if I figure out how to get the dump file for github, I'll post that as well.

Collaborator

duvallj commented Nov 6, 2024

@zacharygriffee Oh wow, that's completely different... The stack trace seems to be entirely within the node binary itself, though I won't deny it's likely this library is still causing it somehow. A dump file from GitHub Actions or a dump with symbols would be useful.

@zacharygriffee

I spent several hours trying to get a core dump from GitHub Actions, or a dump with symbols from my computer, but the npm package segfault-handler won't even install on my system. I'll post anything else useful I can get out of this.


5 participants