Using solarflare onload with epgm fails an assert in src/session_base.cpp #4319

Open
nkarnev opened this issue Dec 19, 2021 · 1 comment


nkarnev commented Dec 19, 2021

Issue description

Running a binary under onload that uses the epgm protocol to create a ZMQ_PUB socket results in an assertion failure, even though zmq_bind returns 0 (no error). I suspect that onload intercepts the socket API and somehow changes the parameters being passed around. The problem does not appear when the TCP transport is used with ZMQ_PUB.
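
One way to test that suspicion with libzmq and OpenPGM out of the picture is a plain UDP probe that performs a few multicast-related socket calls and records the errno of each. This is only a sketch: the helper name probe_udp_multicast and the ProbeStep struct are made up for illustration, and the calls shown are ordinary POSIX ones, not necessarily the ones OpenPGM issues internally.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical record of one socket call: its name, return code, and errno.
struct ProbeStep {
    std::string name;
    int rc;
    int err;
};

// Hypothetical probe: open a UDP socket and set two multicast options,
// recording the outcome of every step so two runs can be compared.
static std::vector<ProbeStep> probe_udp_multicast(int ttl) {
    std::vector<ProbeStep> steps;

    errno = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    steps.push_back({"socket", fd < 0 ? -1 : 0, fd < 0 ? errno : 0});
    if (fd < 0)
        return steps;

    errno = 0;
    int rc = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof ttl);
    steps.push_back({"IP_MULTICAST_TTL", rc, rc == 0 ? 0 : errno});

    int loop = 0;
    errno = 0;
    rc = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof loop);
    steps.push_back({"IP_MULTICAST_LOOP", rc, rc == 0 ? 0 : errno});

    close(fd);
    return steps;
}
```

Printing each step's name, rc, and strerror(err) from a small main, then running the binary once directly and once under onload, would show whether any of these calls behaves differently when intercepted. Identical results would not clear onload, since OpenPGM may use other calls.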

The error message:
oo:pgm_publisher[54637]: Using OpenOnload 201710-u1.1 Copyright 2006-2018 Solarflare Communications, 2002-2005 Level 5 Networks [7]
oo:pgm_publisher[54637]: Importing OpenOnload 201710-u1.1 Copyright 2006-2018 Solarflare Communications, 2002-2005 Level 5 Networks [0,c54637-c0]
Invalid argument (src/session_base.cpp:723)
Aborted

Lines 721-723 of src/session_base.cpp are shown below:
int rc =
pgm_sender->init (udp_encapsulation, _addr->address.c_str ());
errno_assert (rc == 0);
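
For reading the message above: an errno-style assertion prints strerror(errno) followed by the file and line, so "Invalid argument" means errno was EINVAL when init returned non-zero. The following is a simplified stand-in, not libzmq's actual macro; SKETCH_ERRNO_ASSERT and describe_errno_failure are hypothetical names used only for illustration.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: builds the text an errno-style assertion prints,
// in the form "<strerror text> (<file>:<line>)".
static std::string describe_errno_failure(int err, const char *file, int line) {
    return std::string(std::strerror(err)) + " (" + file + ":" +
           std::to_string(line) + ")";
}

// Simplified stand-in for an errno-checking assertion (assumption: this
// mirrors the general shape of such macros, not libzmq's exact definition).
#define SKETCH_ERRNO_ASSERT(cond)                                            \
    do {                                                                     \
        if (!(cond)) {                                                       \
            const int saved_errno = errno;                                   \
            std::fprintf(stderr, "%s\n",                                     \
                         describe_errno_failure(saved_errno, __FILE__,       \
                                                __LINE__).c_str());          \
            std::abort();                                                    \
        }                                                                    \
    } while (false)
```

With this sketch, describe_errno_failure(EINVAL, "src/session_base.cpp", 723) produces the same text as the line printed before "Aborted".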

Environment

  • Compiler: g++ 9.2.0
  • libzmq version (commit hash if unreleased): 4.3.4
  • OS: CentOS 7

Minimal test code / Steps to reproduce the issue

In addition to the code below, the resulting binary has to be started under onload, which also requires a Solarflare NIC. Understandably, that is quite limiting.

#include <zmq.h>
#include <cassert>
#include <cstdio>

// Print the libzmq version this binary is linked against.
inline static void zmq_version_used() {
    int major, minor, patch;
    zmq_version(&major, &minor, &patch);
    fprintf(stdout, "Current 0MQ version is %d.%d.%d\n", major, minor, patch);
}

int main() {
    void *context = zmq_ctx_new();
    assert(context);
    zmq_version_used();
    void *pub = zmq_socket(context, ZMQ_PUB);
    assert(pub);
    int ttl = 30;
    int socket_opt_rc = zmq_setsockopt(pub, ZMQ_MULTICAST_HOPS, &ttl, sizeof(int));
    if (socket_opt_rc == -1) fprintf(stderr, "Failed to set multicast hops: %s\n", zmq_strerror(errno));

    // Placeholder endpoint: substitute the real interface, multicast group, and port.
    int rc = zmq_bind(pub, "epgm://interface;address_group:port");
    if (rc == -1) fprintf(stderr, "Some error %s\n", zmq_strerror(errno));
    else fprintf(stdout, "Bound socket\n");
    // Publish in a tight loop; the abort happens on the I/O thread.
    while (1) {
        zmq_send(pub, "TEST", 4, 0);
    }
    return 0;
}

What's the actual result? (include assertion message & call stack if applicable)

(gdb) bt
#0 0x00007ffff6ca1277 in GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6ca2968 in GI_abort () at abort.c:90
#2 0x00000000004058c0 in zmq::zmq_abort (errmsg=errmsg@entry=0x7ffff6df299d "Invalid argument") at src/err.cpp:88
#3 0x000000000044ca3f in zmq::session_base_t::start_connecting(bool) () at src/session_base.cpp:734
#4 0x000000000041b483 in zmq::object_t::process_command (this=0x6ea130, cmd=...) at src/object.cpp:87
#5 0x0000000000417a5c in zmq::io_thread_t::in_event (this=0x6e7e60) at src/io_thread.cpp:91
#6 0x0000000000416e06 in zmq::epoll_t::loop (this=0x6e8400) at src/epoll.cpp:206
#7 0x000000000042f8a1 in thread_routine (arg=0x6e8458) at src/thread.cpp:402
#8 0x00007ffff7932e25 in start_thread (arg=0x7ffff5e4a700) at pthread_create.c:308
#9 0x00007ffff6d69bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) quit

What's the expected result?

The publisher starts and begins broadcasting data.

@ljluestc

#include <zmq.h>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Function to check the version of ZeroMQ
inline static void zmq_version_used() {
    int major, minor, patch;
    zmq_version(&major, &minor, &patch);
    fprintf(stdout, "Current 0MQ version is %d.%d.%d\n", major, minor, patch);
}

int main() {
    // Create a ZeroMQ context
    void *context = zmq_ctx_new();
    assert(context);

    // Display the ZeroMQ version
    zmq_version_used();

    // Create a PUB socket
    void *pub = zmq_socket(context, ZMQ_PUB);
    assert(pub);

    // Set multicast hops
    int ttl = 30;
    int socket_opt_rc = zmq_setsockopt(pub, ZMQ_MULTICAST_HOPS, &ttl, sizeof(int));
    if (socket_opt_rc == -1) {
        fprintf(stderr, "Failed to set multicast hops: %s\n", zmq_strerror(errno));
    }

    // Bind the socket using the EPGM protocol
    int rc = zmq_bind(pub, "epgm://interface;address_group:port");
    if (rc == -1) {
        fprintf(stderr, "Some error %s\n", zmq_strerror(errno));
        zmq_close(pub);
        zmq_ctx_destroy(context);
        return 1;
    } else {
        fprintf(stdout, "Bound socket\n");
    }

    // Publish data indefinitely
    while (1) {
        zmq_send(pub, "TEST", 4, 0);
        usleep(1000000); // Sleep for 1 second
    }

    // Clean up
    zmq_close(pub);
    zmq_ctx_destroy(context);

    return 0;
}
