Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault: 11 on OSX #4091

Closed
medikoo opened this issue Dec 1, 2015 · 21 comments
Closed

Segmentation fault: 11 on OSX #4091

medikoo opened this issue Dec 1, 2015 · 21 comments
Labels
macos Issues and PRs related to the macOS platform / OSX.

Comments

@medikoo
Copy link

medikoo commented Dec 1, 2015

In one of the projects I have test which involves many watchers, and some race condition occurs.
Occasionally (let's say once per three times) test crashes with Segmentation fault: 11

$ node_modules/.bin/tad index.js is-ignored.js 
09:35:53.505  ✓  index.js .......................................
09:35:53.635  ✓  is-ignored.js ................Segmentation fault: 11

There's purely JS modules involved, and it's observable on OSX 10.11.1 and Node.js v5.1.0.

I was not able to produce any small test case for that. Best way to reproduce is to do:

$ git clone git@github.com:medikoo/fs2.git
$ cd fs2
$ npm install
$ node_modules/.bin/tad index.js is-ignored.js

(After test fails, to rerun it it's good to proceed with rm -rf test && git checkout test as broken test leaves temporary files).

@ChALkeR ChALkeR added the macos Issues and PRs related to the macOS platform / OSX. label Dec 1, 2015
@evanlucas
Copy link
Contributor

I was able to reproduce.

Looks to be an issue with fsevents?

* thread #11: tid = 0x927ca1, 0x0000000100789633 node`uv__cf_loop_cb + 598, stop reason = EXC_BAD_ACCESS (code=1, address=0x74696731ff)
  * frame #0: 0x0000000100789633 node`uv__cf_loop_cb + 598
    frame #1: 0x00007fff903798b1 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #2: 0x00007fff903590ac CoreFoundation`__CFRunLoopDoSources0 + 556
    frame #3: 0x00007fff903585cf CoreFoundation`__CFRunLoopRun + 927
    frame #4: 0x00007fff90357fc8 CoreFoundation`CFRunLoopRunSpecific + 296
    frame #5: 0x00007fff90399961 CoreFoundation`CFRunLoopRun + 97
    frame #6: 0x0000000100789846 node`uv__cf_loop_runner + 70
    frame #7: 0x00007fff9b56e9b1 libsystem_pthread.dylib`_pthread_body + 131
    frame #8: 0x00007fff9b56e92e libsystem_pthread.dylib`_pthread_start + 168
    frame #9: 0x00007fff9b56c385 libsystem_pthread.dylib`thread_start + 13

@yjhjstz
Copy link

yjhjstz commented Dec 1, 2015

It's an issue with fsevents. How to reproduce? @evanlucas

@bnoordhuis
Copy link
Member

/cc @indutny - he wrote most of libuv's fsevents code.

@indutny
Copy link
Member

indutny commented Dec 1, 2015

I will take a look at this, but I need a test case. @evanlucas could you please share it with me?

@medikoo
Copy link
Author

medikoo commented Dec 1, 2015

@indutny you need OSX, and then just follow steps indicated in description. It's important to note that it doesn't happen on every run (let's say 1 per 3 runs), so if it's not exposed by first test run, try again, you'll definitely get it.
I was trying to prepare something smaller, but it doesn't seem easy to do, it's clear that it happens only in some race condition scenarios, and there are a lot file changes going on in test.

@indutny
Copy link
Member

indutny commented Dec 1, 2015

The problem happens in uv__fsevents_reschedule when it tries to do state = handle->loop->cf_state;. The loop is NULL, so it crashes.

@indutny
Copy link
Member

indutny commented Dec 1, 2015

Will look more into it.

@indutny
Copy link
Member

indutny commented Dec 2, 2015

Strictly speaking handle is a total garbage:

(lldb) p *handle
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
(uv_fs_event_t) $0 = {
  data = 0x0000000103104e18
  loop = 0x0000000000000000
  type = UV_UNKNOWN_HANDLE
  close_cb = 0x0000000000000000
  handle_queue = ([0] = 0x0000000000000000, [1] = 0x0000000000000000)
  u = {
    fd = 271648070
    reserved = ([0] = 0x8000000010310546, [1] = 0x0000000d000003e9, [2] = 0x0000000103104a40, [3] = 0x0000000200000003)
  }
  next_closing = 0x0000000103100c10
  flags = 15
  path = 0x00007fff5fbf5dd0 ""
  cb = 0x0000000101804a00
  event_watcher = {
    cb = 0x000000002cc989bf
    pending_queue = ([0] = 0x0000000000000000, [1] = 0x0000000103802e70)
    watcher_queue = ([0] = 0x0000000000000000, [1] = 0x0000000000000000)
    pevents = 0
    events = 0
    fd = 0
    rcount = 0
    wcount = 0
  }
  realpath = 0x0000000000000000
  realpath_len = 0
  cf_flags = 0
  cf_cb = 0x0000000000000000
  cf_events = ([0] = 0x0000000000000000, [1] = 0x0000000000000000)
  cf_member = ([0] = 0x00000001038031c8, [1] = 0x00000001038020e8)
  cf_error = 58732608
  cf_mutex = (__sig = 0, __opaque = "")

@indutny
Copy link
Member

indutny commented Dec 2, 2015

@medikoo may I ask you to give a try to the following patch?

diff --git a/deps/uv/src/unix/fsevents.c b/deps/uv/src/unix/fsevents.c
index 8143f7c..1259754 100644
--- a/deps/uv/src/unix/fsevents.c
+++ b/deps/uv/src/unix/fsevents.c
@@ -76,6 +76,7 @@ typedef struct uv__cf_loop_state_s uv__cf_loop_state_t;
 struct uv__cf_loop_signal_s {
   QUEUE member;
   uv_fs_event_t* handle;
+  unsigned int closing:1;
 };

 struct uv__fsevents_event_s {
@@ -98,7 +99,9 @@ struct uv__cf_loop_state_s {
 /* Forward declarations */
 static void uv__cf_loop_cb(void* arg);
 static void* uv__cf_loop_runner(void* arg);
-static int uv__cf_loop_signal(uv_loop_t* loop, uv_fs_event_t* handle);
+static int uv__cf_loop_signal(uv_loop_t* loop,
+                              uv_fs_event_t* handle,
+                              unsigned int closing);

 /* Lazy-loaded by uv__fsevents_global_init(). */
 static CFArrayRef (*pCFArrayCreate)(CFAllocatorRef,
@@ -387,7 +390,8 @@ static void uv__fsevents_destroy_stream(uv_loop_t* loop) {


 /* Runs in CF thread, when there're new fsevent handles to add to stream */
-static void uv__fsevents_reschedule(uv_fs_event_t* handle) {
+static void uv__fsevents_reschedule(uv_fs_event_t* handle,
+                                    unsigned int closing) {
   uv__cf_loop_state_t* state;
   QUEUE* q;
   uv_fs_event_t* curr;
@@ -486,7 +490,7 @@ final:
    *
    * NOTE: This is coupled with `uv_sem_wait()` in `uv__fsevents_close`
    */
-  if (!uv__is_active(handle))
+  if (closing)
     uv_sem_post(&state->fsevent_sem);
 }

@@ -676,7 +680,7 @@ void uv__fsevents_loop_delete(uv_loop_t* loop) {
   if (loop->cf_state == NULL)
     return;

-  if (uv__cf_loop_signal(loop, NULL) != 0)
+  if (uv__cf_loop_signal(loop, NULL, 0) != 0)
     abort();

   uv_thread_join(&loop->cf_thread);
@@ -753,7 +757,7 @@ static void uv__cf_loop_cb(void* arg) {
     if (s->handle == NULL)
       pCFRunLoopStop(state->loop);
     else
-      uv__fsevents_reschedule(s->handle);
+      uv__fsevents_reschedule(s->handle, s->closing);

     QUEUE_REMOVE(item);
     uv__free(s);
@@ -762,7 +766,9 @@ static void uv__cf_loop_cb(void* arg) {


 /* Runs in UV loop to notify CF thread */
-int uv__cf_loop_signal(uv_loop_t* loop, uv_fs_event_t* handle) {
+int uv__cf_loop_signal(uv_loop_t* loop,
+                       uv_fs_event_t* handle,
+                       unsigned int closing) {
   uv__cf_loop_signal_t* item;
   uv__cf_loop_state_t* state;

@@ -771,6 +777,7 @@ int uv__cf_loop_signal(uv_loop_t* loop, uv_fs_event_t* handle) {
     return -ENOMEM;

   item->handle = handle;
+  item->closing = closing;

   uv_mutex_lock(&loop->cf_mutex);
   QUEUE_INSERT_TAIL(&loop->cf_signals, &item->member);
@@ -833,7 +840,7 @@ int uv__fsevents_init(uv_fs_event_t* handle) {

   /* Reschedule FSEventStream */
   assert(handle != NULL);
-  err = uv__cf_loop_signal(handle->loop, handle);
+  err = uv__cf_loop_signal(handle->loop, handle, 0);
   if (err)
     goto fail_loop_signal;

@@ -873,7 +880,7 @@ int uv__fsevents_close(uv_fs_event_t* handle) {

   /* Reschedule FSEventStream */
   assert(handle != NULL);
-  err = uv__cf_loop_signal(handle->loop, handle);
+  err = uv__cf_loop_signal(handle->loop, handle, 1);
   if (err)
     return -err;

@indutny
Copy link
Member

indutny commented Dec 2, 2015

Appears to be working fine for me so far...

@indutny
Copy link
Member

indutny commented Dec 2, 2015

Got to say, that if it will fix this - you have hit a pretty unexpected edge case! :) Good job.

@medikoo
Copy link
Author

medikoo commented Dec 2, 2015

@indutny works great, I gave it a dozen of runs with even more tests involved, and no crash was observed. Great thanks!

Will this fix also be ported to LTS 4.2?

@indutny
Copy link
Member

indutny commented Dec 2, 2015

@medikoo definitely. Though, I will need to create a test case for libuv first ;)

indutny added a commit to indutny/libuv that referenced this issue Dec 2, 2015
When `uv_fsevents_t` handle is closed immediately after initializing,
there is a possibility that the `CFRunLoop`'s thread will process both
of these events at the same time. `uv__is_active(handle)` will return
`0`, and the `uv_close()` semaphore will be unblocked, leading
to the use after free in node.js.

See: nodejs/node#4091
@indutny
Copy link
Member

indutny commented Dec 2, 2015

Looks like the test is already in libuv, filed a PR: libuv/libuv#637

@medikoo
Copy link
Author

medikoo commented Dec 2, 2015

👍

indutny added a commit to libuv/libuv that referenced this issue Dec 3, 2015
When `uv_fsevents_t` handle is closed immediately after initializing,
there is a possibility that the `CFRunLoop`'s thread will process both
of these events at the same time. `uv__is_active(handle)` will return
`0`, and the `uv_close()` semaphore will be unblocked, leading
to the use after free in node.js.

See: nodejs/node#4091
PR-URL: #637
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@bo01ean
Copy link

bo01ean commented Dec 16, 2015

I think I have this one too. I am wrapping my node process with pm2 so it will just respawn.

Is there instructions to recompile somewhere so I can verify?

@indutny
Copy link
Member

indutny commented Dec 16, 2015

@bo01ean definitely! The fix has been already landed in master, so if you could just:

  1. Clone this repo
  2. Run ./configure && make -j9 in the node folder
  3. Test your code with /path/to/node/folder/out/Release/node

@indutny
Copy link
Member

indutny commented Dec 24, 2015

Have we updated libuv already? cc @evanlucas do we need to backport this to LTS?

@evanlucas
Copy link
Contributor

I'm not sure that LTS has had it back ported. Will check tomorrow and see.

@evanlucas
Copy link
Contributor

It looks like Argon is still using libuv@1.7.5 (4c59407) so I would think it would need backporting.

@evanlucas
Copy link
Contributor

This should have been fixed in v4.2.5 and v5.3.0. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
macos Issues and PRs related to the macOS platform / OSX.
Projects
None yet
Development

No branches or pull requests

8 participants