Segmentation Fault SIGSEGV on Windows 10 using node 18.14.2 #46798

Closed
skgith1 opened this issue Feb 23, 2023 · 10 comments
Labels
windows Issues and PRs related to the Windows platform.

Comments

@skgith1

skgith1 commented Feb 23, 2023

Version

18.14.2

Platform

Microsoft Windows NT 10.0.17763.0 x64

Subsystem

No response

What steps will reproduce the bug?

Running Node.js results in random crashes once or twice per day. No specific trigger has been identified.

How often does it reproduce? Is there a required condition?

It cannot be reproduced on demand. It seems to happen at random, once or twice every 24 hours.

What is the expected behavior?

Node.js should continue to run without issue.

What do you see instead?

A segfault occurs on Windows. The traces from the last two segfaults are provided below.

**
PID 7764 received SIGSEGV for address: 0x954adc7d
SymInit: Symbol-SearchPath: '.;C:\Users\Administrator\nodejs\application;C:\Program Files\nodejs;C:\Windows;C:\Windows\system32;SRV*C:\websymbols*http://msdl.microsoft.com/download/symbols;', symOptions: 530, UserName: 'Administrator'
OS-Version: 10.0.17763 () 0x110-0x3
[...]\node_modules\segfault-handler\src\StackWalker.cpp (941): StackWalker::ShowCallstack
[...]\node_modules\segfault-handler\src\segfault-handler.cpp (242): segfault_handler
00007FFCFD0FC936 (ntdll): (filename not available): RtlInitializeCriticalSectionAndSpinCount
00007FFCFD094949 (ntdll): (filename not available): RtlWalkFrameChain
00007FFCFD13459E (ntdll): (filename not available): KiUserExceptionDispatcher
00007FF6954ADC7D (node): (filename not available): SSL_select_next_proto
00007FF69504CC0A (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedHashSet,1>::NumberOfDeletedElementsOffset
00007FF6954722C7 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF69548D247 (node): (filename not available): SSL_extension_supported
00007FF695471F41 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF695472513 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF695480F42 (node): (filename not available): v8::internal::Scope::end_position
00007FF6954814F4 (node): (filename not available): v8::internal::Scope::end_position
00007FF69549257C (node): (filename not available): SSL_set_default_read_buffer_len
00007FF6954BB3B2 (node): (filename not available): SSL_group_to_name
00007FF6954BB331 (node): (filename not available): SSL_group_to_name
00007FF6954B0F33 (node): (filename not available): SSL_write_ex
00007FF6954AD9AE (node): (filename not available): SSL_read
00007FF695046B22 (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedHashSet,1>::NumberOfDeletedElementsOffset
00007FF6950472F0 (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedHashSet,1>::NumberOfDeletedElementsOffset
00007FF69504B64E (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedHashSet,1>::NumberOfDeletedElementsOffset
00007FF6950D08D2 (node): (filename not available): v8::internal::MicrotaskQueue::microtasks_policy
00007FF6950CF812 (node): (filename not available): node::SetTracingController
00007FF695294CFB (node): (filename not available): uv_tty_set_vterm_state
00007FF6952AB671 (node): (filename not available): uv_run
00007FF69527DF05 (node): (filename not available): node::SpinEventLoop
00007FF69518CE98 (node): (filename not available): X509_STORE_CTX_get_lookup_certs
00007FF695211AD1 (node): (filename not available): node::InitializeOncePerProcess
00007FF6952132D5 (node): (filename not available): node::Start
00007FF695017DEC (node): (filename not available): CRYPTO_memcmp
00007FF696267178 (node): (filename not available): inflateValidate
00007FFCFAE17AD4 (KERNEL32): (filename not available): BaseThreadInitThunk
00007FFCFD0EA371 (ntdll): (filename not available): RtlUserThreadStart
{"level":"error","message":"Forever detected script exited with code: 3221225477"}
**

**
PID 3612 received SIGSEGV for address: 0x9b6c74d
SymInit: Symbol-SearchPath: '.;C:\Users\Administrator\nodejs\application;C:\Program Files\nodejs;C:\Windows;C:\Windows\system32;SRV*C:\websymbols*http://msdl.microsoft.com/download/symbols;', symOptions: 530, UserName: 'Administrator'
OS-Version: 10.0.17763 () 0x110-0x3
[...]\node_modules\segfault-handler\src\StackWalker.cpp (941): StackWalker::ShowCallstack
[...]\node_modules\segfault-handler\src\segfault-handler.cpp (242): segfault_handler
00007FFCFD0FC936 (ntdll): (filename not available): RtlInitializeCriticalSectionAndSpinCount
00007FFCFD094949 (ntdll): (filename not available): RtlWalkFrameChain
00007FFCFD13459E (ntdll): (filename not available): KiUserExceptionDispatcher
00007FF709B6C74D (node): (filename not available): SSL_select_next_proto
00007FF70970CBFA (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedNameDictionary,3>::NextTableOffset
00007FF709B30DE7 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF709B4BD27 (node): (filename not available): SSL_extension_supported
00007FF709B30A61 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF709B31033 (node): (filename not available): ASN1_BIT_STRING_set_bit
00007FF709B3FA22 (node): (filename not available): v8::internal::Scope::end_position
00007FF709B3FFD4 (node): (filename not available): v8::internal::Scope::end_position
00007FF709B5104C (node): (filename not available): SSL_set_default_read_buffer_len
00007FF709B79E82 (node): (filename not available): SSL_group_to_name
00007FF709B79E01 (node): (filename not available): SSL_group_to_name
00007FF709B6FA03 (node): (filename not available): SSL_write_ex
00007FF709B6C47E (node): (filename not available): SSL_read
00007FF709706B12 (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedNameDictionary,3>::NextTableOffset
00007FF7097072E0 (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedNameDictionary,3>::NextTableOffset
00007FF70970B63E (node): (filename not available): v8::internal::OrderedHashTable<v8::internal::OrderedNameDictionary,3>::NextTableOffset
00007FF7097908C2 (node): (filename not available): v8::internal::Debug::break_frame_id
00007FF70978F802 (node): (filename not available): node::SetTracingController
00007FF709954BAB (node): (filename not available): uv_tty_set_vterm_state
00007FF70996B521 (node): (filename not available): uv_run
00007FF70993DDB5 (node): (filename not available): node::SpinEventLoop
00007FF70984CE28 (node): (filename not available): X509_STORE_CTX_get_lookup_certs
00007FF7098D1A61 (node): (filename not available): node::InitializeOncePerProcess
00007FF7098D3265 (node): (filename not available): node::Start
00007FF7096D7DEC (node): (filename not available): CRYPTO_memcmp
00007FF70A925C78 (node): (filename not available): inflateValidate
00007FFCFAE17AD4 (KERNEL32): (filename not available): BaseThreadInitThunk
00007FFCFD0EA371 (ntdll): (filename not available): RtlUserThreadStart
{"level":"error","message":"Forever detected script exited with code: 3221225477"}
**
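
The exit code that forever logs in both traces, 3221225477, is easier to read in hexadecimal: it is 0xC0000005, the Windows status code for an access violation. A minimal sketch of that conversion follows. The function name `describeExitCode` and the small lookup table are illustrative assumptions, not part of Node.js or forever, and the table covers only a few hand-picked codes.

```javascript
// Hand-picked subset of Windows NTSTATUS codes (illustrative, not exhaustive).
const KNOWN_NTSTATUS = {
  0xc0000005: 'STATUS_ACCESS_VIOLATION',
  0xc00000fd: 'STATUS_STACK_OVERFLOW',
  0xc0000409: 'STATUS_STACK_BUFFER_OVERRUN',
};

// Hypothetical helper: turn a decimal process exit code into hex plus a name.
function describeExitCode(code) {
  // ">>> 0" coerces the value to an unsigned 32-bit integer.
  const unsigned = code >>> 0;
  const hex = '0x' + unsigned.toString(16).toUpperCase().padStart(8, '0');
  const name = KNOWN_NTSTATUS[unsigned] || 'unknown';
  return { hex, name };
}

console.log(describeExitCode(3221225477));
```

This only confirms that the process died from an invalid memory access. It says nothing about which component performed the access.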

Additional information

This only started recently, but nothing in our code has really changed. We have been using the same node_modules and keeping the script running with the "forever" package for years, and we have never encountered an error like this. Really stumped here.

@bnoordhuis
Member

The stack trace is baloney (common problem on Windows unfortunately) so that doesn't tell us much. What it does tell me is that you're using at least one native module (segfault-handler) and maybe more.

Native modules are frequent sources of crashes so try excluding those. Any file inside your node_modules directory with a .node suffix is a native module.
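
One way to run that check is to walk node_modules and collect every file with a .node suffix. The sketch below is only an illustration: `findNativeAddons` is a hypothetical helper name, and it assumes the project's node_modules directory sits in the current working directory. Symlinked directories are not followed.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: recursively collect every file ending in ".node" under `dir`.
function findNativeAddons(dir) {
  const found = [];
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return found; // Missing or unreadable directory: nothing to report.
  }
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNativeAddons(full));
    } else if (entry.isFile() && entry.name.endsWith('.node')) {
      found.push(full);
    }
  }
  return found;
}

// Assumption: node_modules lives in the current working directory.
console.log(findNativeAddons(path.join(process.cwd(), 'node_modules')));
```

An empty array means no compiled addons were found on disk under that directory.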

I'm going to close this for now but I can reopen it when you can show that the issue is with node and not some third-party component. We'll either need some way to reproduce the crash or a (non-baloney) stack trace. You may get better results from a C++ debugger. Good luck!

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Feb 24, 2023
@bnoordhuis bnoordhuis added the windows Issues and PRs related to the Windows platform. label Feb 24, 2023
@skgith1
Author

skgith1 commented Feb 24, 2023

@bnoordhuis Thank you so much for the reply, this is helpful. A question:

The only native module I see besides segfault-handler is bcrypt (bcrypt_lib.node). The crashes were already occurring before we added segfault-handler to help diagnose them.

During my debugging I noticed that bcrypt was added to our codebase around the time the crashes started (and the stack trace has mentions of "CRYPTO"). However, it is not actively used: we have const bcrypt = require('bcrypt'); at the start of our app, but bcrypt isn't called anywhere in the code. Could this native module cause issues despite not being "in use"? Also, the app does not crash right away. It happens at random, and the app can run fine for the whole day before dying.

Thanks again for the input, this occurs on a production system so it is a challenging time.

@bnoordhuis
Member

A native module can pretty much do anything so yes, the mere act of loading it is sufficient.
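
Since loading alone is enough, it can help to check which addons a running process has actually loaded, regardless of whether any of their functions are called. The sketch below assumes the addons were loaded through CommonJS require(), in which case they should show up in require.cache keyed by filename. `loadedNativeAddons` is a hypothetical helper name.

```javascript
// Hypothetical helper: list cached module filenames that end in ".node".
// Assumption: addons were loaded via CommonJS require(), so they appear
// in require.cache. Addons loaded by other means would not be listed.
function loadedNativeAddons(cache = require.cache) {
  return Object.keys(cache).filter((filename) => filename.endsWith('.node'));
}

// Call this after the application has finished its startup require() calls.
console.log(loadedNativeAddons());
```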

@skgith1
Author

skgith1 commented Feb 27, 2023

@bnoordhuis I had some time to test everything and found that this issue occurs without any native modules! I cleaned up, removed, and rebuilt everything, and the issue still occurs with v18.14.2 on Windows but does not occur on v18.12.1.

I will now test with 18.13.0 to see where this apparent bug was introduced. I'm not sure whether you want to reopen this issue or wait until someone else encounters the same problem, but from my limited testing it seems like there might be an issue somewhere.

@skgith1
Author

skgith1 commented Feb 27, 2023

I can confirm the same segfault also occurs on v18.13.0. I will continue running v18.12.1 for an extended period to double-check my hypothesis.

@bnoordhuis
Member

I can reopen the issue if you want but we'll need a (preferably easy!) way to reproduce it.

Alternatively, narrowing it down to a specific commit with git bisect might be enough to identify the cause (but means building node from source repeatedly.)
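
For readers unfamiliar with bisecting, the idea is a binary search over an ordered history for the first entry that shows the problem. The sketch below shows only that search logic, not git itself. The names `firstBad` and `commits` and the placeholder commit labels are hypothetical, and it assumes the last entry is known to be bad and that every entry after the first bad one is also bad.

```javascript
// Binary search for the first "bad" entry in an ordered list.
// Assumes: entries before the first bad one are good, entries from it
// onward are bad, and the last entry is known to be bad.
function firstBad(candidates, isBad) {
  let lo = 0;                     // lowest index that could be the first bad
  let hi = candidates.length - 1; // known-bad upper bound
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(candidates[mid])) {
      hi = mid;     // first bad entry is at mid or earlier
    } else {
      lo = mid + 1; // first bad entry is after mid
    }
  }
  return candidates[lo];
}

// Placeholder labels standing in for commits between a good and a bad release.
const commits = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'];
// Stand-in predicate: pretend everything from 'c5' onward crashes.
const firstBadCommit = firstBad(commits, (c) => commits.indexOf(c) >= 4);
console.log(firstBadCommit);
```

With a crash that shows up only once or twice a day, each "good" verdict would need a long soak run before it can be trusted, which is what makes this approach expensive here.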

A third option is running node under a debugger and seeing if you can get out a meaningful stack trace that way.

@skgith1
Author

skgith1 commented Mar 9, 2023

@bnoordhuis Confirming this bug still occurs in v18.15.0

Could we reopen this issue? A critical bug like this really should be investigated, especially since it occurs during normal operation on a basic install of nodejs without any native node modules used.

I am not quite sure how to reproduce or connect a debugger to get a better stacktrace but maybe someone else in the nodejs community can help and find the source of the problem. Right now we are stuck using v18.12.1 for the foreseeable future which is not ideal.

@bnoordhuis
Member

@skgith1 without a reproducer or more info, there's not much point. I sympathize with your plight, but "happens on my machine" bugs aren't actionable; no one is going to investigate them.

@gugu

gugu commented Mar 21, 2023

/srv/shorturl_redirector/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3236)[0x7f86dc0d0236]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7f86dedcf140]
node(SSL_select_next_proto+0x4c)[0x17f0bfc]
node[0xd243e0]
node(tls_handle_alpn+0x53)[0x1834f83]
node(tls_parse_all_extensions+0x143)[0x18176f3]
node(tls_post_process_client_hello+0x70)[0x1835200]
node[0x1822b74]
node(ssl3_read_bytes+0x320)[0x1811050]
node(ssl3_read+0x60)[0x17e0c40]
node(SSL_read+0x87)[0x17ee3f7]
node(_ZN4node6crypto7TLSWrap8ClearOutEv+0x77)[0xd2c8f7]
node(_ZN4node6crypto7TLSWrap12OnStreamReadElRK8uv_buf_t+0xf8)[0xd2d5f8]
node(_ZN4node15LibuvStreamWrap8OnUvReadElPK8uv_buf_t+0x89)[0xc6f959]
node[0xc6fd68]
node[0x1676f67]
node[0x1677790]
node[0x167d534]
node(uv_run+0x14e)[0x166b95e]
node(_ZN4node13SpinEventLoopEPNS_11EnvironmentE+0x14d)[0xabda2d]
node(_ZN4node16NodeMainInstance3RunEv+0xf4)[0xbc1874]
node(_ZN4node22LoadSnapshotDataAndRunEPPKNS_12SnapshotDataEPKNS_20InitializationResultE+0xb4)[0xb36434]
node(_ZN4node5StartEiPPc+0x2df)[0xb3a02f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f86dec0ad0a]
node(_start+0x2e)[0xaba37e]
Segmentation fault (core dumped)
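
Several frames in this trace use mangled C++ names such as _ZN4node6crypto7TLSWrap8ClearOutEv. As a reading aid, the sketch below decodes just the nested-name part of such symbols, which is a sequence of length-prefixed components between "_ZN" and "E". `demangleNestedName` is a hypothetical helper, not a full demangler: argument types, templates, and substitutions are ignored or rejected.

```javascript
// Hypothetical helper: decode the "_ZN<len><name>...E" nested-name prefix of
// an Itanium C++ ABI mangled symbol. Returns null for shapes it cannot parse.
function demangleNestedName(symbol) {
  if (!symbol.startsWith('_ZN')) return null;
  const parts = [];
  let i = 3; // skip the "_ZN" prefix
  while (i < symbol.length && symbol[i] !== 'E') {
    const match = /^\d+/.exec(symbol.slice(i));
    if (!match) return null; // unsupported construct (template, substitution, ...)
    const len = Number(match[0]);
    const start = i + match[0].length;
    parts.push(symbol.slice(start, start + len));
    i = start + len;
  }
  return parts.join('::');
}

// Symbols copied from the trace above.
for (const sym of [
  '_ZN4node6crypto7TLSWrap8ClearOutEv',
  '_ZN4node6crypto7TLSWrap12OnStreamReadElRK8uv_buf_t',
  '_ZN4node13SpinEventLoopEPNS_11EnvironmentE',
]) {
  console.log(sym, '->', demangleNestedName(sym));
}
```

Read this way, the frames show the TLS wrapper reading from the socket and calling SSL_read, with the crash occurring inside SSL_select_next_proto.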

This happens in prod, and I don't know how to reproduce it. Dockerfile:

FROM public.ecr.aws/docker/library/node:18.15.0-slim AS builder
...

I'll try to get more details

@bnoordhuis
Member

bnoordhuis commented Mar 22, 2023

@gugu Thanks. It's at least a legible stack trace, that's a good start.

Since it says "core dumped", is it an option for you to open the core file in gdb and get a detailed stack trace out? The first 8 stack frames in particular are what I'm interested in.

edit: I see you opened #47207, let's continue there.
