
Race condition in libuv console causes crash (assertion failure) on Windows #47715

Closed
@sivadeilra

I represent a team using Node.js within Microsoft. When running Node on machines under heavy load, we have found that some Node processes crash because an assertion fails in deps/uv/src/win/tty.c. This is the failing assertion (edited for brevity):

static void uv__tty_console_signal_resize(void) {
...
  uv_mutex_lock(&uv__tty_console_resize_mutex);
  assert(uv__tty_console_width != -1 && uv__tty_console_height != -1);  /* <-- this fails */
  if (width != uv__tty_console_width || height != uv__tty_console_height) {
...
  }
}

I believe the root cause is that there is a race condition in uv_console_init():

void uv_console_init(void) {
  if (uv_sem_init(&uv_tty_output_lock, 1))
    abort();
  uv__tty_console_handle = CreateFileW(L"CONOUT$",
                                       GENERIC_READ | GENERIC_WRITE,
                                       FILE_SHARE_WRITE,
                                       0,
                                       OPEN_EXISTING,
                                       0,
                                       0);
  if (uv__tty_console_handle != INVALID_HANDLE_VALUE) {
    CONSOLE_SCREEN_BUFFER_INFO sb_info;
    QueueUserWorkItem(uv__tty_console_resize_message_loop_thread,  /* <-- this starts a task in a thread pool */
                      NULL,
                      WT_EXECUTELONGFUNCTION);
    uv_mutex_init(&uv__tty_console_resize_mutex);
    if (GetConsoleScreenBufferInfo(uv__tty_console_handle, &sb_info)) {
      uv__tty_console_width = sb_info.dwSize.X;
      uv__tty_console_height = sb_info.srWindow.Bottom - sb_info.srWindow.Top + 1;
    }
  }
}

This code queues a task on a thread pool and only afterwards queries the console size. If the thread pool task starts quickly enough, it can run the resize-handling code, including the assertion above, before uv_console_init() has stored the initial console size in uv__tty_console_width and uv__tty_console_height. That leads to the assertion failure.

Also, because uv_mutex_init() is called after QueueUserWorkItem(), the worker thread can call uv_mutex_lock(&uv__tty_console_resize_mutex) before the mutex is even initialized, which would be another source of crashes.

We see this on build machines, where we spawn 140,000+ Node.js processes on machines with very large CPU counts (128 or more cores). It is more common when running in VMs than when running on bare metal, where we have rarely seen it.

We are running Node.js v18.13.0. I have checked the sources, and this issue appears to be present in v18.13.0 and all later versions, up to and including main.

The fix should be to move the QueueUserWorkItem call after the if (GetConsoleScreenBufferInfo(...)) { ... } block. That should guarantee that the mutex is properly initialized, and that the first call to GetConsoleScreenBufferInfo has completed, before the resizing thread can start and win the race.
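
To illustrate the ordering, here is a minimal sketch that uses POSIX threads instead of the Win32 thread pool, so it can be run outside Windows. It is not libuv code: resize_mutex, console_width, console_height, resize_worker and console_init_ordered are hypothetical stand-ins for the libuv names above, and the width and height are passed in rather than read from a console. The only point is the order of operations: initialize the mutex, store the size, and only then start the worker.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for uv__tty_console_resize_mutex,
 * uv__tty_console_width and uv__tty_console_height. */
static pthread_mutex_t resize_mutex;
static int console_width = -1;
static int console_height = -1;
static int worker_saw_valid_size = 0;

/* Plays the role of the resize thread: it takes the mutex and expects
 * the initial size to have been stored already. */
static void* resize_worker(void* arg) {
  (void) arg;
  pthread_mutex_lock(&resize_mutex);
  worker_saw_valid_size = (console_width != -1 && console_height != -1);
  pthread_mutex_unlock(&resize_mutex);
  return NULL;
}

/* Safe ordering: mutex first, then the initial size, then the worker.
 * Returns 1 if the worker observed an initialized size, 0 if it did not,
 * and -1 if the mutex or thread could not be created. */
static int console_init_ordered(int width, int height) {
  pthread_t worker;

  if (pthread_mutex_init(&resize_mutex, NULL) != 0)
    return -1;

  /* Store the size before any other thread can look at it. */
  console_width = width;
  console_height = height;

  /* Only now start the worker; everything it depends on already exists. */
  if (pthread_create(&worker, NULL, resize_worker, NULL) != 0) {
    pthread_mutex_destroy(&resize_mutex);
    return -1;
  }

  /* Joined here only so the sketch can report what the worker saw. */
  pthread_join(worker, NULL);
  pthread_mutex_destroy(&resize_mutex);
  return worker_saw_valid_size;
}
```

With this ordering, a call such as console_init_ordered(120, 30) cannot let the worker observe -1 for either dimension or lock an uninitialized mutex, no matter how quickly the worker is scheduled. In uv_console_init() the equivalent change would be the reordering described above.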

Labels: c++, libuv
