Skip to content

hibernate leads to hanging process due to endless wait for write, setTimeout never called again #38108

Open
@julianhille

Description

@julianhille
  • Version: v12.18.3 / v14.16.0
  • Platform: seen on macOS, windows and linux in this case reports, reproduced on 5.9.16-050916-generic #202012211331 SMP Mon Dec 21 14:11:13 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem:

What steps will reproduce the bug?

  • Install bluebird (only used to delay, i guess could also be replaced with setTimeout and some Promise but is untested) npm install bluebird
  • Add this code snippet test.js:
const Promise = require('bluebird')

let processing = async () => {
  console.log('ProcessingEvent starts')
  for (let i = 0; i < 20; i++) {
    await Promise.delay(100)
    console.log('Waited', i)
    throw Error()
  }
}

let start = () => {
  let timeout = setTimeout(() => {
    processing()
      .then(() => {
        console.log('restarting')
        start()
      })
      .catch((error) => {
        console.error('Resurrecting from error', error)
        start()
      })
  }, 100)
}

start()
  • Start the script: node test.js
  • Get the scripts PID ps aux | grep node

Now it gets a bit tricky as the bug does not seem to be triggered always, but just in circumstances when writing / flushing etc to the fd of my TTY.

To trigger it:

  1. Stop the process kill -STOP $PID (it seems to be more probable when you look at the output and it seems to be right after an exception log)
  2. Wait 8-13 Minutes (could not find the correct time and or why this happens here only after a specific amount of time)
  3. Continue kill -CONT $PID the process and see that it hangs, if it does not hang, GOTO 1.

PS: its even easier to reproduce with faketime lib: faketime -f +10 x10000 node ./test.js
PPS: Detaching gdb from the process, makes the process responsive again (currently only tested with v14)
Also sending a SIGUSR1 makes it responsive again (only tested with v14)

How often does it reproduce? Is there a required condition?

I had a change of 1 in 3 (round about) to trigger it.
It is also triggered through hibernate / sleep.

What is the expected behavior?

The expected behavior is that it just keeps running after a CONT command is issued.

What do you see instead?

Hanging window, no process nor output on the console anymore.

Additional information

I attached a gdb when it hangs on linux and can see that it waits for libc write:
Here is the gdb stack trace:

#0  __libc_write (nbytes=66, buf=0x7ffd26332470, fd=27)
    at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=27, buf=0x7ffd26332470, nbytes=66)
    at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000000001340fe2 in uv__writev (n=1, vec=0x7ffd26332340, fd=<optimized out>)
    at ../deps/uv/src/unix/stream.c:713
#3  uv__write (stream=stream@entry=0x4cd0a90) at ../deps/uv/src/unix/stream.c:891
#4  0x0000000001342d58 in uv_write2 (cb=0x1340bb0 <uv_try_write_cb>, send_handle=0x0, 
    nbufs=1, bufs=0x7ffd26332420, stream=0x4cd0a90, req=0x7ffd263322c0)
    at ../deps/uv/src/unix/stream.c:1472
#5  uv_write (cb=0x1340bb0 <uv_try_write_cb>, nbufs=1, bufs=0x7ffd26332420, 
    handle=0x4cd0a90, req=0x7ffd263322c0) at ../deps/uv/src/unix/stream.c:1497
#6  uv_try_write (stream=0x4cd0a90, bufs=0x7ffd26332420, nbufs=1)
    at ../deps/uv/src/unix/stream.c:1522
#7  0x0000000000adff69 in node::LibuvStreamWrap::DoTryWrite(uv_buf_t**, unsigned long*)
    ()
#8  0x0000000000adacc2 in int node::StreamBase::WriteString<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&) ()
#9  0x0000000000adae9e in void node::StreamBase::JSMethod<&(int node::StreamBase::WriteString<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&))>(v8::FunctionCallbackInfo<v8::Value> const&) ()
#10 0x00000000013555cd in Builtins_CallApiCallback ()
    at ../../deps/v8/../../deps/v8/src/builtins/base.tq:357
#11 0x00007ffd26336530 in ?? ()
#12 0x00007ffd26336568 in ?? ()
#13 0x0000000000000002 in ?? ()
#14 0x0000000000000048 in ?? ()
#15 0x00007ffd263364f0 in ?? ()
#16 0x0000000000000006 in ?? ()
#17 0x00007ffd263365a8 in ?? ()
#18 0x0000045eb42cf175 in ?? ()
#19 0x00002620f69446b9 in ?? ()
#20 0x0000000004b4f5a0 in ?? ()
#21 0x00001a8e0b3c04b1 in ?? ()
#22 0x00001a8e0b3c04b1 in ?? ()
#23 0x00000a45d09bf9c1 in ?? ()
#24 0x00001a8e0b3c04b1 in ?? ()
#25 0x000033d7ebf204b9 in ?? ()
#26 0x000033d7ebf20529 in ?? ()
#27 0x00002620f69446b9 in ?? ()
#28 0x000000000000001a in ?? ()
#29 0x00002620f69446b9 in ?? ()
#30 0x000033d7ebf20529 in ?? ()
#31 0x000010db5581c621 in ?? ()
#32 0x0000358ac70b6341 in ?? ()
#33 0x000010db5581c621 in ?? ()
#34 0x00007ffd26336638 in ?? ()
#35 0x00000000013549c4 in Builtins_InterpreterEntryTrampoline ()
    at ../../deps/v8/../../deps/v8/src/builtins/base.tq:357
#36 0x00002620f695aae1 in ?? ()
#37 0x00000a45d09a9dc1 in ?? ()
#38 0x000033d7ebf204b9 in ?? ()
#39 0x00002620f6948601 in ?? ()
#40 0x00001a8e0b3c04b1 in ?? ()
#41 0x00002620f695aae1 in ?? ()
#42 0x00000a45d09a9dc1 in ?? ()
#43 0x000033d7ebf204b9 in ?? ()
#44 0x00002620f6948601 in ?? ()
#45 0x0000358ac70b6341 in ?? ()
#46 0x0000358ac70b0f49 in ?? ()
#47 0x00001a8e0b3c04b1 in ?? ()
#48 0x000000e300000000 in ?? ()
#49 0x00002f7ef2e33eb9 in ?? ()
#50 0x000008d1cdbf4561 in ?? ()
#51 0x000033d7ebf204d9 in ?? ()
#52 0x00007ffd263366d0 in ?? ()
#53 0x0000045eb42cfe2a in ?? ()
#54 0x00002620f695aae1 in ?? ()
#55 0x00000a45d09a9dc1 in ?? ()
#56 0x000033d7ebf204b9 in ?? ()
#57 0x00001a8e0b3c06e9 in ?? ()
#58 0x00002620f6948601 in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
#59 0x0000000000000042 in ?? ()
#60 0x00002620f695aae1 in ?? ()
#61 0x0000000000000001 in ?? ()
#62 0x00003b3a081b5029 in ?? ()
#63 0x00000a45d09a9dc1 in ?? ()
#64 0x00002620f695a9a1 in ?? ()
#65 0x00003b3a081b5029 in ?? ()
#66 0x00002620f6948601 in ?? ()
#67 0x00003b3a081ac0f9 in ?? ()
#68 0x000033d7ebf204b9 in ?? ()
#69 0x00003b3a081ad091 in ?? ()
#70 0x00003b3a081ac0f9 in ?? ()
#71 0x00007ffd26336748 in ?? ()
#72 0x00000000013549c4 in Builtins_InterpreterEntryTrampoline ()
    at ../../deps/v8/../../deps/v8/src/builtins/base.tq:357
#73 0x000033d7ebf20499 in ?? ()
#74 0x00002f783a104871 in ?? ()
#75 0x000013d5948c2a41 in ?? ()
#76 0x000033d7ebf20499 in ?? ()
#77 0x00002f783a104871 in ?? ()
#78 0x00001a8e0b3c04b1 in ?? ()
#79 0x00003b3a081b4f99 in ?? ()
#80 0x000033d7ebf1fc89 in ?? ()
#81 0x000033d7ebf1fc89 in ?? ()
#82 0x0000005a00000000 in ?? ()
#83 0x00002f7ef2e29ee1 in ?? ()
#84 0x00003b3a081ace11 in ?? ()
#85 0x00003b3a081ac0f9 in ?? ()
#86 0x00007ffd26336780 in ?? ()
#87 0x000000000134e01c in Builtins_ArgumentsAdaptorTrampoline ()
#88 0x000013d5948c2a41 in ?? ()
#89 0x0000000000000000 in ?? ()

Looking at frame 6:

(gdb) p *stream
$5 = {data = 0x4cd09f0, loop = 0x2c9aac0 <default_loop_struct>, type = UV_TTY, 
  close_cb = 0x0, handle_queue = {0x4d9a498, 0x4c27ad8}, u = {fd = 52, reserved = {
      0x34, 0x3b0, 0x65006c00640001, 0x4cd0940}}, next_closing = 0x0, flags = 49160, 
  write_queue_size = 66, alloc_cb = 0x0, read_cb = 0x0, connect_req = 0x0, 
  shutdown_req = 0x0, io_watcher = {cb = 0x1341790 <uv__stream_io>, pending_queue = {
      0x4cd0b20, 0x4cd0b20}, watcher_queue = {0x4cd0b30, 0x4cd0b30}, pevents = 0, 
    events = 0, fd = 27}, write_queue = {0x7ffd26332318, 0x7ffd26332318}, 
  write_completed_queue = {0x4cd0b60, 0x4cd0b60}, connection_cb = 0x0, 
  delayed_error = 0, accepted_fd = -1, queued_fds = 0x0}
(gdb) p *bufs
$6 = {
  base = 0x7ffd26332470 "Waited \033[33m1\033[39m \033[33m0\033[39m \033[35m2021-05-31T23:00:54.215Z\033[39m\n3&\375\177", len = 66}

Edit:

  • Updated infos to make reproducing easier (faketime)
  • Updated version i tested (added v14.16.0)
  • Updated interesting fact about gdb / SIGUSR1 signal (dettaching)

Metadata

Metadata

Assignees

No one assigned

    Labels

    libuvIssues and PRs related to the libuv dependency or the uv binding.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions