Description
A daemon is a program designed to run forever, so every non-fatal error should be handled and the show must go on. Currently ZMQ has 404 errno_assert calls: 404 ways to make a daemon crash with SIGABRT. Consider this function from tcp.cpp:
void zmq::tune_tcp_socket (fd_t s_)
{
    // Disable Nagle's algorithm. We are doing data batching on 0MQ level,
    // so using Nagle wouldn't improve throughput in anyway, but it would
    // hurt latency.
    int nodelay = 1;
    int rc = setsockopt (s_, IPPROTO_TCP, TCP_NODELAY, (char*) &nodelay,
        sizeof (int));
#ifdef ZMQ_HAVE_WINDOWS
    wsa_assert (rc != SOCKET_ERROR);
#else
    errno_assert (rc == 0);
#endif

#ifdef ZMQ_HAVE_OPENVMS
    // Disable delayed acknowledgements as they hurt latency significantly.
    int nodelack = 1;
    rc = setsockopt (s_, IPPROTO_TCP, TCP_NODELACK, (char*) &nodelack,
        sizeof (int));
    errno_assert (rc != SOCKET_ERROR);
#endif
}
When setsockopt() returns an error, the daemon crashes. There is a trivial scenario, involving no programming error at all, in which this can happen: the remote side can send a TCP RST packet that immediately invalidates the socket, and instead of reconnecting, ZMQ crashes the whole app.
I was debugging my app, which dumped core in exactly this function:
Thread 1 (Thread 802007c00 (LWP 101563/firsthop-receiver)):
#0 0x0000000801896dcc in thr_kill () from /lib/libc.so.7
#1 0x000000080193d72b in abort () from /lib/libc.so.7
#2 0x0000000000415ac1 in zmq::zmq_abort (errmsg_=Could not find the frame base for "zmq::zmq_abort(char const*)".
) at src/err.cpp:84
#3 0x0000000000453a6e in zmq::tune_tcp_socket (s_=17) at src/tcp.cpp:60
#4 0x0000000000454524 in zmq::tcp_connecter_t::out_event (this=0x80285a600) at src/tcp_connecter.cpp:134
#5 0x0000000000416be6 in zmq::kqueue_t::loop (this=0x802051300) at src/kqueue.cpp:205
#6 0x0000000000416ce5 in zmq::kqueue_t::worker_routine (arg_=0x802051300) at src/kqueue.cpp:222
#7 0x0000000000434bd8 in thread_routine (arg_=0x802051380) at src/thread.cpp:96
#8 0x0000000801618e14 in pthread_getprio () from /lib/libthr.so.3
#9 0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 802007c00 (LWP 101563/firsthop-receiver))]#3 0x0000000000453a6e in zmq::tune_tcp_socket (s_=17)
at src/tcp.cpp:60
60 errno_assert (rc == 0);
(gdb) p errstr
$4 = 0x801b7b240 "Connection reset by peer"
(gdb)
Sure, I can rewrite this function to ignore a failure to disable Nagle and delayed ACKs, but 402 errno_assert() calls will remain in the code. Am I missing something?