Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

test: Incorrect end event callback invocation on socket #16805

Closed
gireeshpunathil opened this issue Apr 20, 2015 · 17 comments
Closed

test: Incorrect end event callback invocation on socket #16805

gireeshpunathil opened this issue Apr 20, 2015 · 17 comments

Comments

@gireeshpunathil
Copy link
Member

On OSX we were seeing failures in the Node test

test-child-process-fork-getconnections.js

This is only reproducible under heavy stress in
the network - more read/writes over the ports.

throw new Error('[c] closing by accident!');
^
Error: [c] closing by accident!
at Socket. (test/simple/test-child-process-fork-getconnections.js:39:17)
at Socket.emit (events.js:129:20)
at _stream_readable.js:908:16
at process._tickCallback (node.js:355:11)

test/simple/test-child-process-fork-getconnections.js:57
throw new Error('child died unexpectedly!');
^
Error: child died unexpectedly!
at ChildProcess. (test/simple/test-child-process-fork-getconnections.js:57:13)
at ChildProcess.emit (events.js:110:17)
at Process.ChildProcess._handle.onexit (child_process.js:1074:12)

12 clients connect to a server, the sockets passed to a forked child,
and each of the sockets are closed, through a message passing between
the parent and child. child registers a call back on 'end' event of the
socket which should have never called because the sockets are destroyed
directly, but due to unknown reason, this end call back is fired causing
this issue.

Whatever be the race condition is, it is causing the socket to be closed
before they are destroyed, and thereby the end callback is triggered.

@misterdjules
Copy link

How did you generate "heavy stress in the network - more read/writes over the ports"? If you have some code that reproduces that, it would allow more people to jump in and help fixing this.

Also, does it happen with node v0.10, or just with node v0.12?

Thank you!

@gireeshpunathil
Copy link
Member Author

Here are the steps:

  1. http.js - a server-client ping-pong code, where server echoes the client message after 100 milliseconds. If the 'stress' proves to be insufficient, this timeout can be reduced.
var net = require('net');
var fs = require('fs');
var src = fs.readFileSync('input', 'utf8');
var result = '';
var server = net.createServer();

server.on('connection', function(socket) {
socket.on('data', function() {
  setTimeout(function(){socket.write(src);}, 100);
});
});

server.listen(process.argv[2], '127.0.0.1');
server.on('listening', function() {
client = net.connect(process.argv[2], '127.0.0.1');
client.write(src); 
client.on('data', function(d) {
// console.log('XX ' + d.toString());
client.write(d);
});
});

2.an input file whose content is being ping-ponged. I used a file with 100KB of junk characters. Not attaching here due it its size as well as being trivial to produce.

3.run.sh - the invocation script, which invokes this code as many times as required, in parallel. First argument indicates the instance count, and the second one, a base port number to start with. I used 40 as count and 20000 as starting port. If the 'stress' proves to be insufficient, the count value can be increased.

x=$1
start=$2
while [ $x -ge 0 ]
do
port=`expr $start + $x`
./node http.js $port &
echo "started node with server listening on $port"
x=`expr $x - 1`
done

I am able to reproduce this both in 0.10 and 0.12 lines.

Hope this helps.

@misterdjules
Copy link

Thank you @gireeshpunathil for the detailed information 👍 I was actually able to reproduce the test failing without doing anything else than running it several times sequentially.

@saquibkhan
Copy link

How to debug .cc files?
I want to debug process_wrap.cc and also process.c files

@misterdjules
Copy link

@saquibkhan You can use a debugger like gdb, lldb or Visual C++'s debugger.

@saquibkhan
Copy link

Hi,
Please find my analysis below:

On running node with --abort-on-uncaught-exception option I got Exception: Illegal Instruction 4

On searching for this problem, which is specific to OS X i found the issue is because of mmacosx-version-min=10.5. I tried changing it to mmacosx-version-min=10.6 and mmacosx-version-min=10.7 but that didn’t help

Am i missing something here?

References:
http://stackoverflow.com/questions/10177038/illegal-instruction-4-shows-up-in-os-x-lion
http://stackoverflow.com/questions/14268887/what-is-the-illegal-instruction-4-error-and-why-does-mmacosx-version-min-10

@misterdjules
Copy link

@saquibkhan Are you referring to the same problem? I don't think the link you mentioned are related to this specific issue, but I may be missing something.

@saquibkhan
Copy link

Yes I am referring to the same problem.
If you use node --abort-on-uncaught-exception ./test/simple/test-child-process-fork-getconnections.js command to run the test, this outputs Illegal Instruction 4

See below:

Uncaught Error: child died unexpectedly!

Uncaught Error: [c] closing by accident! id: 6388

FROM
Socket. (/SAQUIB/gitHub/node/test/simple/test-child-process-fork-getconnections.js:57:17)
Socket.emit (events.js:131:20)
_stream_readable.js:911:16
process._tickCallback (node.js:358:11)

Uncaught Error: child died unexpectedly!

FROM
ChildProcess. (/SAQUIB/gitHub/node/test/simple/test-child-process-fork-getconnections.js:96:13)
ChildProcess.emit (events.js:112:17)
Process.ChildProcess._handle.onexit (child_process.js:1110:12)
Illegal instruction: 4

@saquibkhan
Copy link

The Illegal Instruction message usually means that there may be some binary which has invalid CPU instruction.

This could be due to various reasons, few are below:

  1. compilation generated broken binary for your platform
  2. system has bad binary from different architecture which it is trying to run
  3. node has a bug (for example, integer overflow) which causes memory corruption and leads to Illegal Instruction

What is your opinion? are you able to see Illegal Instruction 4 ?

@misterdjules
Copy link

@saquibkhan I can't reproduce the illegal instruction. I still believe it's a separate problem: this issue is about timing issues/race conditions in the code responsible for passing handles between processes, whereas the illegal instruction problem seems to be related to the --abort-on-uncaught-exception command line switch.

Could you please see if you can reproduce this with a simple program that only throws an error, like node --abort-on-uncaught-exception -e 'throw new Error'? If so, please file a separate issue.

Thank you!

@saquibkhan
Copy link

If you use below shell script you should see Illegal Instruction 4:

x=100
while [ $x -ge 0 ]
do
node --abort-on-uncaught-exception ./test/simple/test-child-process-fork-getconnections.js
x=expr $x - 1
done

@misterdjules
Copy link

@saquibkhan I still can't reproduce the illegal instruction. node dumps core, but I don't get the Illegal instruction error.

@saquibkhan
Copy link

I was curious because i get Illegal Instruction 4 intermittently with the crash mentioned in this issue.

NOTE: This is happening on OS X Yosemite 10.10.2
Also I don't have gcore available

my logs on running node --abort-on-uncaught-exception ./test/simple/test-child-process-fork-getconnections.js 15 times

⮀ node ⮀ v0.12 + ⮀ sh run.sh 15
ok
ok
ok
Uncaught Error: [c] closing by accident!

FROM
Socket. (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-getconnections.js:39:17)
Socket.emit (events.js:129:20)
_stream_readable.js:908:16
process._tickCallback (node.js:355:11)
Uncaught Error: child died unexpectedly!

FROM
ChildProcess. (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-getconnections.js:57:13)
ChildProcess.emit (events.js:110:17)
Process.ChildProcess._handle.onexit (child_process.js:1074:12)
run.sh: line 6: 12945 Illegal instruction: 4 node --abort-on-uncaught-exception ./test/simple/test-child-process-fork-getconnections.js
ok
ok
ok
ok
ok
ok
ok
ok
Uncaught Error: [c] closing by accident!

FROM
Socket. (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-getconnections.js:39:17)
Socket.emit (events.js:129:20)
_stream_readable.js:908:16
process._tickCallback (node.js:355:11)
Uncaught Error: child died unexpectedly!

FROM
ChildProcess. (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-getconnections.js:57:13)
ChildProcess.emit (events.js:110:17)
Process.ChildProcess._handle.onexit (child_process.js:1074:12)
run.sh: line 6: 12973 Illegal instruction: 4 node --abort-on-uncaught-exception ./test/simple/test-child-process-fork-getconnections.js
ok
ok
ok

@saquibkhan
Copy link

After debugging more, my analysis so far:
I noted that intermittently unexpected EOF is received on socket (read == 0)

File: stream.c
Function: uv_read

else if (nread == 0) {
      uv__stream_eof(stream, &buf);
      return;
}

@saquibkhan
Copy link

Hi,
Please find my further analysis below, after debugging, tracing the code and using multiple Dtrace scripts I thought of tracing the issue in history from where it is originated.

I found that there were 2 other scripts which were giving same issue on running multiple times
1.test/simple/test-child-process-fork-net.js
2.test/simple/test-child-process-fork-net2.js

I went back till changes done on _May 14 2012 commit: dceebbf_ and where above two scripts were added with relevant changes

I could see the same issue when you run either of the above two test scripts multiple times.

log1:

timers.js:103
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
Error: IPC channel is already disconnected
    at ChildProcess.disconnect (child_process.js:367:26)
    at Object._onTimeout (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-net2.js:120:14)
    at Timer.ontimeout (timers.js:101:19)

log2:

timers.js:103
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
Error: channel closed
    at ChildProcess.send (child_process.js:307:26)
    at Object._onTimeout (/SAQUIB/gitHub/forked-node/node/test/simple/test-child-process-fork-net2.js:118:14)
    at Timer.ontimeout (timers.js:101:19)

Note: I also traced back when test/simple/test-child-process-fork-getconnections.js was added _Jan 18 2013 commit: db5ee0b_ and i could see the same issue back then when running the script multiple times.

@saquibkhan
Copy link

Hi,
I changed the test script a little and again ran the script in loop (around 50 times). At one point node hanged. (tested on v0.10)

in the above script I changed sockets[m.id].destroy(); to sockets[m.id].end();

Test Script

Please find the backtrace below:

thread backtrace all
* thread #1: tid = 0x1baaec, 0x00007fff81a7021a libsystem_kernel.dylib`kevent + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff81a7021a libsystem_kernel.dylib`kevent + 10
    frame #1: 0x000000010038afaf node`uv__io_poll + 629
    frame #2: 0x000000010037ed91 node`uv_run + 239
    frame #3: 0x0000000100339676 node`node::Start(int, char**) + 365
    frame #4: 0x0000000100002bb4 node`start + 52

  thread #2: tid = 0x1baaed, 0x00007fff81a6a4de libsystem_kernel.dylib`mach_msg_trap + 10
    frame #0: 0x00007fff81a6a4de libsystem_kernel.dylib`mach_msg_trap + 10
    frame #1: 0x00007fff81a6964f libsystem_kernel.dylib`mach_msg + 55
    frame #2: 0x00007fff84653b34 CoreFoundation`__CFRunLoopServiceMachPort + 212
    frame #3: 0x00007fff84652ffb CoreFoundation`__CFRunLoopRun + 1371
    frame #4: 0x00007fff84652858 CoreFoundation`CFRunLoopRunSpecific + 296
    frame #5: 0x00007fff84708ef1 CoreFoundation`CFRunLoopRun + 97
    frame #6: 0x0000000100389e13 node`uv__cf_loop_runner + 67
    frame #7: 0x000000010037e452 node`uv__thread_start + 25
    frame #8: 0x00007fff8718b268 libsystem_pthread.dylib`_pthread_body + 131
    frame #9: 0x00007fff8718b1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #10: 0x00007fff8718941d libsystem_pthread.dylib`thread_start + 13

  thread #3: tid = 0x1baaee, 0x00007fff81a70232 libsystem_kernel.dylib`kevent64 + 10, queue = 'com.apple.libdispatch-manager'
    frame #0: 0x00007fff81a70232 libsystem_kernel.dylib`kevent64 + 10
    frame #1: 0x00007fff817afd91 libdispatch.dylib`_dispatch_mgr_invoke + 247
    frame #2: 0x00007fff817afa6a libdispatch.dylib`_dispatch_mgr_thread + 52

  thread #4: tid = 0x1baaef, 0x00007fff81a6a51a libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'SamplerThread'
    frame #0: 0x00007fff81a6a51a libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010032f989 node`v8::internal::MacOSSemaphore::Wait() + 17
    frame #2: 0x000000010026b0c4 node`v8::internal::RuntimeProfiler::WaitForSomeIsolateToEnterJS() + 84
    frame #3: 0x000000010026b1e2 node`v8::internal::RuntimeProfilerRateLimiter::SuspendIfNecessary() + 22
    frame #4: 0x000000010032fa42 node`v8::internal::SamplerThread::Run() + 78
    frame #5: 0x000000010032f41e node`v8::internal::ThreadEntry(void*) + 60
    frame #6: 0x00007fff8718b268 libsystem_pthread.dylib`_pthread_body + 131
    frame #7: 0x00007fff8718b1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #8: 0x00007fff8718941d libsystem_pthread.dylib`thread_start + 13
(lldb) 

@carlosaviles
Copy link

Hi,

I am seeing this problem on my development machine (iMac from 2008 running OS 10.10.3).
I have replaced execSync by exec and I can't see the problem anymore. Is this a proper workaround?

Best,
Carlos.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants