Description
UPDATE: I believe I have found the cause. See #363 (comment) for more details.
I can reliably reproduce deadlocks within Engine.IO when serving multiple concurrent requests on a single session over a UNIX socket. Here's how to run it:
% git clone https://github.com/posita/txsocketio.git .../txsocketio
% ( cd .../txsocketio && git checkout 548e989022b28dc302280940431c000f44b1f58f )
% ( cd .../txsocketio/integrations/node && npm install )
...
% # SHELL # 1 - run local Engine.IO instance serving over UNIX domain socket
% ( cd .../txsocketio/integrations/node && DEBUG=\* npm start )
...
% # SHELL # 2 - run tests to connect to local Engine.IO instance
% ( cd .../txsocketio/integrations/node && TEST_DEADLOCK=1 tox -e py27-twisted_15_4-ssl_yes tests.integration_deadlock )
...
You may have to run the tests in SHELL # 2 several times before you observe the deadlock. You'll know you've hit it when the test doesn't resolve within a few seconds.
This is what the failure looks like:
% # SHELL # 1
% ( cd .../txsocketio/integrations/node && DEBUG=\* npm start )
...
[20:06:40.446] [LOG] engine intercepting request for path "/engine.io/" +46s
[20:06:40.446] [LOG] engine handling "GET" http request "/engine.io/?transport=polling" +0ms
[20:06:40.447] [LOG] engine handshaking client "2hS896Vuh-koQ3HHAAAE" +1ms
[20:06:40.448] [LOG] engine:socket sending packet "open" ({"sid":"2hS896Vuh-koQ3HHAAAE","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +1ms
[20:06:40.448] [LOG] engine:polling setting request +0ms
[20:06:40.449] [LOG] engine:socket flushing buffer to transport +1ms
[20:06:40.449] [LOG] engine:polling writing " �0{"sid":"2hS896Vuh-koQ3HHAAAE","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}" +0ms
[20:06:40.451] [LOG] engine:socket executing batch send callback +2ms
[20:06:40.452] [LOG] engine:socket sending packet "message" (Hello!) +1ms
[20:06:40.491] [LOG] engine intercepting request for path "/engine.io/" +39ms
[20:06:40.491] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=2hS896Vuh-koQ3HHAAAE" +0ms
[20:06:40.491] [LOG] engine setting new request for existing client +0ms
[20:06:40.492] [LOG] engine:polling received "�2probe" +1ms
[20:06:40.492] [LOG] engine:socket packet +0ms
[20:06:40.492] [LOG] engine:socket got ping +0ms
[20:06:40.492] [LOG] engine:socket sending packet "pong" (undefined) +0ms
[20:06:40.493] [LOG] engine intercepting request for path "/engine.io/" +1ms
[20:06:40.493] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=2hS896Vuh-koQ3HHAAAE" +0ms
[20:06:40.493] [LOG] engine setting new request for existing client +0ms
[20:06:40.494] [LOG] engine intercepting request for path "/engine.io/" +1ms
[20:06:40.494] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=2hS896Vuh-koQ3HHAAAE" +0ms
[20:06:40.494] [LOG] engine setting new request for existing client +0ms
[20:06:40.495] [LOG] engine:socket transport error +0ms
[20:06:40.497] [LOG] engine:polling received "�2probe" +3ms
[20:06:40.497] [LOG] engine:socket packet received with closed socket +0ms
...
Note the hung Deferred at index 0:
% # SHELL # 2
% ( cd .../txsocketio/integrations/node && TEST_DEADLOCK=1 tox -e py27-twisted_15_4-ssl_yes tests.integration_deadlock )
...
================================================================
>>> FAIL: at least one of the requests is hung after 100 seconds! <<<
post_deferreds[0]: <Deferred at 0x102988830 waiting on Deferred at 0x1029a4878>
post_deferreds[1]: <Deferred at 0x1029a4560 current result: <twisted.web._newclient.Response object at 0x10297fc90>>
post_deferreds[2]: <Deferred at 0x1029bccb0 current result: <twisted.web._newclient.Response object at 0x1028f4310>>
================================================================
...
Compare that to the occasions when everything works as expected:
% # SHELL # 1
% ( cd .../txsocketio/integrations/node && DEBUG=\* npm start )
...
[19:59:32.080] [LOG] engine intercepting request for path "/engine.io/" +0ms
[19:59:32.084] [LOG] engine handling "GET" http request "/engine.io/?transport=polling" +6ms
[19:59:32.088] [LOG] engine handshaking client "G4Db-G6cmltuPbeYAAAA" +4ms
[19:59:32.090] [LOG] engine:socket sending packet "open" ({"sid":"G4Db-G6cmltuPbeYAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +2ms
[19:59:32.091] [LOG] engine:polling setting request +1ms
[19:59:32.092] [LOG] engine:socket flushing buffer to transport +1ms
[19:59:32.094] [LOG] engine:polling writing " �0{"sid":"G4Db-G6cmltuPbeYAAAA","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}" +2ms
[19:59:32.100] [LOG] engine:socket executing batch send callback +6ms
[19:59:32.101] [LOG] engine:socket sending packet "message" (Hello!) +1ms
[19:59:32.153] [LOG] engine intercepting request for path "/engine.io/" +52ms
[19:59:32.154] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=G4Db-G6cmltuPbeYAAAA" +1ms
[19:59:32.155] [LOG] engine setting new request for existing client +1ms
[19:59:32.156] [LOG] engine:polling received "�2probe" +1ms
[19:59:32.158] [LOG] engine:socket packet +2ms
[19:59:32.159] [LOG] engine:socket got ping +0ms
[19:59:32.159] [LOG] engine:socket sending packet "pong" (undefined) +1ms
[19:59:32.161] [LOG] engine intercepting request for path "/engine.io/" +2ms
[19:59:32.161] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=G4Db-G6cmltuPbeYAAAA" +0ms
[19:59:32.162] [LOG] engine setting new request for existing client +1ms
[19:59:32.162] [LOG] engine:polling received "�2probe" +0ms
[19:59:32.163] [LOG] engine:socket packet +1ms
[19:59:32.163] [LOG] engine:socket got ping +0ms
[19:59:32.164] [LOG] engine:socket sending packet "pong" (undefined) +1ms
[19:59:32.164] [LOG] engine intercepting request for path "/engine.io/" +0ms
[19:59:32.164] [LOG] engine handling "POST" http request "/engine.io/?transport=polling&sid=G4Db-G6cmltuPbeYAAAA" +0ms
[19:59:32.165] [LOG] engine setting new request for existing client +1ms
[19:59:32.165] [LOG] engine:polling received "�2probe" +0ms
[19:59:32.166] [LOG] engine:socket packet +1ms
[19:59:32.166] [LOG] engine:socket got ping +0ms
[19:59:32.166] [LOG] engine:socket sending packet "pong" (undefined) +0ms
% # SHELL # 2
% ( cd .../txsocketio/integrations/node && TEST_DEADLOCK=1 tox -e py27-twisted_15_4-ssl_yes tests.integration_deadlock )
...
================================================================
All requests resolved
post_deferreds[0]: <Deferred at 0x10b000830 current result: <twisted.web._newclient.Response object at 0x10b0168d0>>
post_deferreds[1]: <Deferred at 0x10b01a560 current result: <twisted.web._newclient.Response object at 0x10b004450>>
post_deferreds[2]: <Deferred at 0x10b033cb0 current result: <twisted.web._newclient.Response object at 0x10b016ad0>>
================================================================
...
I noticed this while working on my own client code (which is why the test uses Twisted python), but the deadlock appears to be on the server side. Note that in the failing case, the server has log entries for all three POSTs, but does not respond (and does not send the corresponding pong packets) for all three. I'm guessing one could (easily?) rewrite the tests using engine.io-client and likely achieve the same result (but see below). UPDATE: Without substantial tinkering, I'm not sure how one would implement this test using the engine.io-client, since it appears to be implemented solely in terms of IP connections rather than basic sockets. (Unless I'm missing something?)
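For reference, here is roughly the shape of the Twisted client side. This is a simplified sketch of issuing concurrent POSTs to one session over the UNIX domain socket, not the actual tests.integration_deadlock code; SOCK_PATH, post_probe, and the Agent.usingEndpointFactory approach are my own illustration:
from io import BytesIO

from twisted.internet import defer, endpoints, reactor
from twisted.web.client import Agent, FileBodyProducer
from twisted.web.http_headers import Headers
from twisted.web.iweb import IAgentEndpointFactory
from zope.interface import implementer

SOCK_PATH = 'integrations/node/http.sock'  # hypothetical location

@implementer(IAgentEndpointFactory)
class UNIXEndpointFactory(object):
    # Route every request made by the Agent over the same UNIX domain socket
    def endpointForURI(self, uri):
        return endpoints.UNIXClientEndpoint(reactor, SOCK_PATH)

agent = Agent.usingEndpointFactory(reactor, UNIXEndpointFactory())

def post_probe(sid):
    # POST one framed ping packet to the session acquired via the handshake
    # GET (the host in the URI is only used for the Host header here)
    body = FileBodyProducer(BytesIO(b'\x00\x06\xff2probe'))
    uri = 'http://localhost/engine.io/?transport=polling&sid=' + sid
    headers = Headers({b'Content-Type': [b'application/octet-stream']})
    return agent.request(b'POST', uri.encode('ascii'), headers, body)

def post_concurrently(sid):
    # Fire several POSTs back-to-back (~1-3 ms apart in practice); the
    # deadlock manifests as at least one Deferred that never fires
    post_deferreds = [post_probe(sid) for _ in range(3)]
    return defer.DeferredList(post_deferreds)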
The server instance used in the test is stupid simple. It is essentially:
var http = require('http');
var path = require('path');
var engine_io = require('engine.io');
require('console-stamp')(console, { pattern: 'HH:MM:ss.l' });
var app = http.createServer();
process.on('SIGINT', function () {
    app.close();
    process.exit();
});
var eio = engine_io(app);
var sock_path = path.join(__dirname, 'http.sock');
app.listen(sock_path);
(I've actually reduced the code to exactly the above and am still able to reproduce the issue.)
Also, this isn't dependent on sending ping packets. If one changes the packet data sent from the client from b'\x00\x06\xff2probe' to (e.g.) b'\x00\x06\xff4abcde', the deadlock will still (often) occur.
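(For anyone puzzling over those byte strings: as far as I can tell, this is the XHR polling payload framing. The sketch below is my own reconstruction inferred from the bytes above, not code taken from either library.)
def frame_payload(packet_type, data):
    # Inferred framing: 0x00, then each decimal digit of the packet length
    # as a raw byte, then 0xff, then the packet itself (one-digit type
    # code followed by the data)
    packet = packet_type + data
    digits = bytes(bytearray(int(d) for d in str(len(packet))))
    return b'\x00' + digits + b'\xff' + packet

assert frame_payload(b'2', b'probe') == b'\x00\x06\xff2probe'  # ping
assert frame_payload(b'4', b'abcde') == b'\x00\x06\xff4abcde'  # message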
In the interest of some isolation, I attempted to reproduce the issue using curl from the command line, but have so far been unsuccessful in triggering the deadlock using something like the following:
(
echo -n $'\x00\x06\xff2probe' >|data.bin
for (( i=0; i < 10; ++i )) ; do
(
sid="$( python -c 'import sys, urllib ; print(urllib.urlencode({ "sid": sys.argv[1].strip() }))' "$( curl --max-time 1 --verbose --unix-socket ./integrations/node/http.sock 'http:/engine.io/?transport=polling' 2>&1 | grep '^< Set-Cookie: io=' | sed '-es%^.*: io=%%' )" )"
f() { curl --data-binary \@data.bin --header 'Content-Type: application/octet-stream' --max-time 10 --request POST --unix-socket ./integrations/node/http.sock "http:/engine.io/?transport=polling&${sid}" || echo '>>>>>>>> TIMED OUT! <<<<<<<<' ; }
f & f & f & f & f &
) &
sleep 1
done
sleep 10
rm -f data.bin
)
I suspect this is because the average interval between POSTs is substantially larger when coming from my command line script (typically > 10 ms) versus those from my Twisted python test (typically around 1-3 ms). UPDATE: this isn't quite true. Once the loop gets going, I've observed split times on par with my Twisted version, but on average they are still higher. One difference might be that all connections (while separate) are coming from the same process in the Twisted version, whereas they are all different curl processes in the shell version? Another difference might be the time between the GET used to acquire the session ID and the first POST? UPDATE to the UPDATE: If I introduce a delay of 10 ms between requests from within my Twisted version, I don't get any deadlocks on my machine.
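(For concreteness, the 10 ms spacing was along these lines; again a sketch, reusing the hypothetical post_probe from above:)
from twisted.internet import defer, reactor, task

def post_spaced(sid):
    # Space the POSTs ~10 ms apart instead of firing them back-to-back;
    # with this spacing I can no longer reproduce the deadlock locally
    post_deferreds = [
        task.deferLater(reactor, 0.010 * i, post_probe, sid)
        for i in range(3)
    ]
    return defer.DeferredList(post_deferreds)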