Crash when using multicast function #3218
Comments
Can you share a simple example to reproduce? Are you sure that a socket is not being used from multiple threads? This includes create and close.
Reproducing this is not easy. It depends on the data and even on the CPU frequency; it is undefined behavior that shows up only now and then. We did some testing of pgm_receiver_t::in_event() and found that in_event can be called even while a restart_input call is pending. That is what causes the crash, because inpos can become nullptr inside in_event. Even when we are lucky and inpos does not become null, its value can be changed, so restart_input won't produce the intended result. The reason for the conflict is that although in_event stops polling, it does not stop events from triggering: some events have already been polled and are about to trigger in_event. Our proposed fix for this problem is the following:
This function now looks like this. The output at the very beginning is only for our testing, and that message IS printed occasionally. Our change certainly needs review from the ZeroMQ experts.
void zmq::pgm_receiver_t::in_event ()
}
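The ordering problem described above can be modelled in isolation. The following is a minimal sketch, not libzmq code: `toy_receiver_t`, `restart_pending`, `downstream_full`, `delivered` and `ignored_events` are hypothetical names invented for illustration, and the early return in `in_event` is only an assumed form of guard, not necessarily what the proposed change does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: names mirror the discussion, but this is NOT libzmq code.
struct toy_receiver_t
{
    const unsigned char *inpos = nullptr; // unread bytes kept for restart_input
    std::size_t insize = 0;
    bool restart_pending = false; // hypothetical flag: a restart_input call is owed
    bool downstream_full = false; // simulates a pipe that cannot accept data
    std::vector<unsigned char> delivered;
    int ignored_events = 0;

    // Delivers the saved bytes, or records that a restart is needed.
    void process_input ()
    {
        if (downstream_full) {
            restart_pending = true; // keep inpos/insize untouched
            return;
        }
        assert (inpos != nullptr);
        delivered.insert (delivered.end (), inpos, inpos + insize);
        inpos = nullptr;
        insize = 0;
        restart_pending = false;
    }

    // Poller callback: new data arrived.
    void in_event (const unsigned char *data, std::size_t size)
    {
        if (restart_pending) {
            // Guard: an already-queued event fired after we stalled.
            // Leave the saved state alone so restart_input sees the right data.
            ++ignored_events;
            return;
        }
        inpos = data;
        insize = size;
        process_input ();
    }

    // Called once downstream can accept data again.
    void restart_input ()
    {
        downstream_full = false;
        if (restart_pending)
            process_input ();
    }
};
```

Without the guard, the second `in_event` would overwrite `inpos` and `insize` while the restart is still owed. In this toy the ignored bytes are simply dropped; a real receiver would presumably leave them in the transport and read them once polling resumes.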
That looks fine, feel free to send a PR.
I am not quite familiar with GitHub yet. Could you help me create a PR for this? @bluca
In short, you need to fork this repository, clone it, branch it, commit the changes and push them, and then open a pull request on the website. But there are plenty of tutorials to be found on Google, better than anything I could write.
Fixed by #3226
Issue description
The system crashes occasionally when using OpenPGM for multicast. We did some testing and found that pgm_receiver_t::inpos becomes nullptr and is then passed to memcpy() as the second argument (the pointer to the source memory). That causes the crash. The call stack is provided below.
We suspect this is a threading issue. Are pgm_receiver_t::in_event() and pgm_receiver_t::restart_input() called simultaneously from different threads?
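Passing a null source pointer to memcpy() with a nonzero size is undefined behavior, which is consistent with the crash inside memcpy in the call stack. As a diagnostic aid while hunting for the root cause, a checked copy along the following lines could turn the crash into a detectable condition. This is only a sketch: `checked_copy` is a hypothetical helper, not part of libzmq, and it hides the symptom rather than fixing whatever sets the pointer to null.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of libzmq.
// Copies only when both pointers are valid and there is something to copy.
// Returns the number of bytes copied, or 0 if the copy was refused.
inline std::size_t checked_copy (void *dst, const void *src, std::size_t size)
{
    if (dst == nullptr || src == nullptr || size == 0)
        return 0;
    std::memcpy (dst, src, size);
    return size;
}
```

A caller can compare the return value with the requested size and log or assert on a mismatch instead of crashing.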
Environment
Minimal test code / Steps to reproduce the issue
This does not always happen. It happens more often when the workload is heavy.
What's the actual result? (include assertion message & call stack if applicable)
This call stack is from version 4.0.5. After we upgraded to 4.2.3, we still get the same issue.
(gdb) bt
#0 0x0000003fe348995e in memcpy () from /lib64/libc.so.6
#1 0x00007f942b6b04a9 in zmq::decoder_base_t<zmq::v1_decoder_t>::decode(unsigned char const*, unsigned long, unsigned long&) () at decoder.hpp:119
#2 0x00007f942b68823c in zmq::pgm_receiver_t::process_input(zmq::v1_decoder_t*) () at pgm_receiver.cpp:261
#3 0x00007f942b687a69 in zmq::pgm_receiver_t::restart_input() () at pgm_receiver.cpp:124
#4 0x00007f942b695bb5 in zmq::session_base_t::write_activated(zmq::pipe_t*) () at session_base.cpp:260
#5 0x00007f942b68d9e9 in zmq::pipe_t::process_activate_write(unsigned long) () at pipe.cpp:233
#6 0x00007f942b682d2d in zmq::object_t::process_command(zmq::command_t&) () from /opt/yadev/3rdParty/cpp/ZeroMQ/4.0.5/lib/libzmq.so.4
#7 0x00007f942b6789da in zmq::io_thread_t::in_event() () at io_thread.cpp:73
#8 0x00007f942b676c3d in zmq::epoll_t::loop() () at epoll.cpp:165
#9 0x00007f942b676cfa in zmq::epoll_t::worker_routine(void*) () at epoll.cpp:178
#10 0x00007f942b6a78be in thread_routine () from /opt/yadev/3rdParty/cpp/ZeroMQ/4.0.5/lib/libzmq.so.4
#11 0x0000003fe3c079d1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003fe34e8b6d in clone () from /lib64/libc.so.6
What's the expected result?
System does not crash.