Closed
Description
I found this issue while testing how our app handles a network interruption and noticed this behavior with using node sockets in Linux. I have verified this issue happens with both 0.10.41
and 4.2.3
.
Reproduce Setup
- Need to setup two docker containers. Container deps: update openssl to 1.0.1j #1 hosts the TCP echo service and container test: don't remove empty.txt on win32 #2 runs the socket write code.
- In the echo container I installed
ecco
- The socket write container will run the app but it needs to have
iptables
andnetstat
to run the test. - Run the test container test: don't remove empty.txt on win32 #2 on the host network and make it privilaged.
--net=host --privileged
- In our test we need to wait for an
ETIMEDOUT
which can take 15 minutes with the default setting. To make the test bearable runsysctl -w net.ipv4.tcp_retries2=8
in the test container to make this around 100s instead.
Dockerfile
FROM node:0.10.41
RUN apt-get update && apt-get install -y iptables net-tools
RUN npm install -g ecco
CMD ["bash"]
Socket write code
#!/usr/bin/env node
"use strict"
var net = require('net');
var MESSAGE_RATE = 1000;
var RANDOM_DATA = 700;
var host = '172.17.0.4';
var port = '9999';
var debug = require('debug')('SocketTest');
var socket = net.createConnection(port, host);
socket.addr = host + ':' + port;
socket.host = host;
socket.port = port;
socket.on('connect', function () {
debug(this.address())
var lastError = this.error;
this.error = null;
if (lastError) {
debug('reconnect');
} else {
startLoop()
}
});
socket.on('error', function (err) {
this.error = err;
debug('error', err);
});
socket.on('close', function (had_error) {
debug('close');
retry(this);
});
socket.on('end', function () {
debug('Socket End')
retry(this);
});
socket.buffer = new Buffer([]);
socket.on('data', function (data) {
debug('received data', data)
});
socket.on('drain', function(){
debug('socket drained');
});
socket.setKeepAlive(true, 20000);
function retry(s) {
if(s.retrying || s.closing) return;
s.retrying = true;
s.retryTimer = setTimeout(function () {
s.retrying = false;
s.connect(s.port, s.host);
}, 1000);
}
var messageNumber = 0
var lastTimer = null
function startLoop() {
if (!socket.error) {
var currentNumber = messageNumber
debug('Sending message #'+ messageNumber)
var ret = socket.write(new Array(RANDOM_DATA).join(Math.random().toString(36)) + ' Received message # ' + messageNumber++, sendCb)
debug('Socket write %s', ret ? 'flushed to kernel': 'queued in memory');
debug('Socket %d bytes written', socket.bytesWritten);
debug('Socket %s bytes read', socket.bytesRead);
debug('Socket bufferSize %d', socket.bufferSize);
} else {
debug('socket error not writing.')
}
lastTimer = setTimeout(startLoop, MESSAGE_RATE);
function sendCb(){
debug('Callback called for #%d', currentNumber);
currentNumber = null
}
}
Reproduce Steps
- Start the echo app on container deps: update openssl to 1.0.1j #1.
ecco -l -v -p 9999 --address 0.0.0.0
- Make note of the IP address for that container using
awk 'NR==1 {print $1}' /etc/hosts
- Update the IP in the test code (in my case it's
172.17.0.4
) and run the test app - Observe the data streaming in the echo container's terminal
- In container test: don't remove empty.txt on win32 #2 add the firewall rule to block the echo server's IP
iptables -A INPUT -s "172.17.0.4" -j DROP;iptables -A OUTPUT -s "172.17.0.4" -j DROP;
- Notice the data stops streaming in the echo terminal. This is expected behavior.
- Run
netstat -napc
in container test: don't remove empty.txt on win32 #2 and observe the socket'sSend-Q
increasing over time. - Wait until the
ETIMEDOUT
occurs and then remove the rule.iptables -D INPUT -s "172.17.0.4" -j DROP;iptables -D OUTPUT -s "172.17.0.4" -j DROP;
- Verify the rule is removed
iptables -nvL
- After a bit notice container deps: update openssl to 1.0.1j #1 says
client connected
after container test: don't remove empty.txt on win32 #2 reconnects. - The socket appears to write out data but no data appears to the echo terminal. This is unexpected.
- Apply the block rule again and do netstat again. We expect the
Send-Q
of the new connection to accumulate but it does not. This is also unexpected.
(don't forget to remove the rule after testing)
Observations
- This issue doesn't happen on a Mac. I don't have an windows enviorment so I have not tested there.
- This issue also does not happen when you send small amounts of data to socket.write. if you change the
RANDOM_DATA
to a lower number like 70 it will recover. - In
0.10.41
I noticedprocess._errno
contains anECANCELED
error and doesn't look like socket code errors out the socket when it receives this error string from the handle.