avoid asserting on getifaddrs failure #2051

Closed
@garlick

We've hit the following assertion failure in ZeroMQ 4.1.4 on a RHEL 7.2 system (kernel 3.10):

Connection refused (src/tcp_address.cpp:172)

Here's a backtrace from a core file:

(gdb) where
#0  0x00002aaaac47b5f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00002aaaac47cce8 in __GI_abort () at abort.c:90
#2  0x00002aaaab8cc759 in zmq::zmq_abort(char const*) ()
   from /lib64/libzmq.so.5
#3  0x00002aaaab8fc3bd in zmq::tcp_address_t::resolve_nic_name(char const*, bool, bool) () from /lib64/libzmq.so.5
#4  0x00002aaaab8fc52e in zmq::tcp_address_t::resolve_interface(char const*, bool, bool) () from /lib64/libzmq.so.5
#5  0x00002aaaab8fcc45 in zmq::tcp_address_t::resolve(char const*, bool, bool, bool) () from /lib64/libzmq.so.5
#6  0x00002aaaab9000fe in zmq::tcp_listener_t::set_address(char const*) ()
   from /lib64/libzmq.so.5
#7  0x00002aaaab8f2740 in zmq::socket_base_t::bind(char const*) ()
   from /lib64/libzmq.so.5
#8  0x00002aaaab64f1f6 in zsocket_bind () from /lib64/libczmq.so.3
#9  0x000000000040f3ed in bind_child (ep=0x64c190, ov=0x635260)
    at overlay.c:484
#10 overlay_bind (ov=0x635260) at overlay.c:614
#11 0x000000000040a151 in boot_pmi (ctx=0x7fffffffcf50) at broker.c:1208
#12 0x0000000000407479 in main (argc=<optimized out>, argv=<optimized out>)

which I believe is this assertion in src/tcp_address.cpp (the excerpt is from master, not 4.1.4):

    //  Get the addresses.
    ifaddrs *ifa = NULL;
    const int rc = getifaddrs (&ifa);
    if (rc != 0 && errno == EINVAL) {
        // Windows Subsystem for Linux compatibility
        LIBZMQ_UNUSED (nic_);
        LIBZMQ_UNUSED (ipv6_);

        errno = ENODEV;
        return -1;
    }
    errno_assert (rc == 0);

Apparently getifaddrs can fail. Since it communicates with the kernel over a netlink socket, I suppose it might run out of some resource under heavy load. I wouldn't say we're abusing it, though: we are merely starting a dozen or so copies of the same ZeroMQ-based program at the same time.

Perhaps a backoff-retry would be appropriate here instead of an assertion?
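
To make the suggestion concrete, here is a minimal sketch of what a backoff-retry around getifaddrs might look like. It is not libzmq code: the helper name getifaddrs_with_retry, the attempt limit, the initial delay, and the choice to treat ECONNREFUSED and EINTR as transient are all assumptions for illustration (ECONNREFUSED is the only error actually observed above). The getter_ parameter is likewise hypothetical and exists only so the retry logic can be exercised without a misbehaving kernel.

```cpp
#include <ifaddrs.h>

#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

//  Hypothetical helper (not part of libzmq): call getifaddrs (), retrying
//  with exponential backoff while the failure looks transient.
//  Returns 0 on success; returns -1 with errno left as set by the last
//  failed attempt otherwise. On success the caller still owns *ifa_ and
//  must release it with freeifaddrs ().
static int getifaddrs_with_retry (
  ifaddrs **ifa_,
  int max_attempts_ = 10,
  std::chrono::milliseconds initial_delay_ = std::chrono::milliseconds (1),
  const std::function<int (ifaddrs **)> &getter_ = ::getifaddrs)
{
    std::chrono::milliseconds delay = initial_delay_;
    for (int attempt = 1; attempt <= max_attempts_; ++attempt) {
        const int rc = getter_ (ifa_);
        if (rc == 0)
            return 0;

        //  Assumption: only these errors are worth retrying.
        const bool transient = (errno == ECONNREFUSED || errno == EINTR);
        if (!transient || attempt == max_attempts_)
            return -1;

        //  Back off before the next attempt, doubling the delay each time.
        std::this_thread::sleep_for (delay);
        delay *= 2;
    }
    return -1;
}
```

With something like this in place, resolve_nic_name could call the helper instead of getifaddrs directly and, if it still fails after the last attempt, return -1 to the caller rather than hitting errno_assert, so that bind reports an error instead of aborting the process.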
