segfault in zmq::msg_t::set_metadata when using a PUB/SUB #3826

Closed
GauravAccGit opened this issue Feb 12, 2020 · 9 comments

Comments

@GauravAccGit

Hi,

We are using libzmq in one of our projects. The application has two processes, one of which sends messages to the other using the ZMQ PUB-SUB model. Both processes are written in C. Under high traffic, the receiving process hits a segmentation fault in zmq::msg_t::set_metadata.

  • libzmq version: 4.1.5
  • OS: Linux 4.14.35-1902.3.2.el7uek.x86_64

Minimal test code / Steps to reproduce the issue

The segmentation fault occurs only when a large number of messages are exchanged at high speed. There are no exact steps to reproduce it; I am not sure whether that will block the analysis. The same crash was also reported in issue #1419.
That issue says the crash was probably fixed in libzmq 4.1.1, along with a metadata fix. If so, it should not occur in 4.1.5.

What's the actual result? (include assertion message & call stack if applicable)

The process receiving the messages crashes. Backtrace:

#0 0x00007ff39409abce in zmq::msg_t::set_metadata (this=this@entry=0x7ff2a2a280e8, metadata_=0x7ff2a2a13040) at src/msg.cpp:303
#1 0x00007ff3940bafbe in zmq::stream_engine_t::decode_and_push (this=0x7ff2a4e24f00, msg_=0x7ff2a2a280e8) at src/stream_engine.cpp:891
#2 0x00007ff3940bc4d3 in zmq::stream_engine_t::in_event (this=0x7ff2a4e24f00) at src/stream_engine.cpp:323
#3 0x00007ff394093cfe in zmq::epoll_t::loop (this=0x7ff38d42b040) at src/epoll.cpp:176
#4 0x00007ff3940c3100 in thread_routine (arg_=0x7ff38d42b0c0) at src/thread.cpp:96
#5 0x00007ff392506ea5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007ff390c768cd in clone () from /lib64/libc.so.6

What's the expected result?

There shouldn't be any crash

@bluca
Member

bluca commented Feb 12, 2020

Please try with the latest release; 4.1.x is very old.

@bluca
Member

bluca commented Feb 12, 2020

If it still reproduces, please attach a minimal test case to reproduce it

@GauravAccGit
Author

Thanks, Luca, for the quick response. I will try with 4.3.2 and update.

@GauravAccGit
Author

GauravAccGit commented Feb 16, 2020

Hi,

I tried with libzmq 4.3.2, but there are still crashes. However, no core file is generated even though I have enabled core dumps. I only get the messages below in /var/log/messages:

Feb 15 02:49:06 kernel: [32156]: segfault at 600005d87 ip 00007fdeacdc272b sp 00007fde9c9d7ff0 error 4 in libzmq.so.5.2.2[7fdeacda5000+96000]

Feb 15 18:10:02 kernel: ZMQbg/IO/14[53744]: segfault at 0 ip 00007fee4ac173b8 sp 00007fee31b6d820 error 4 in libzmq.so.5.2.2[7fee4abee000+96000]

I have no idea why core files are not being generated: with libzmq 4.1.5 they were, and I followed exactly the same procedure to enable them. With 4.1.5 I saw messages like the one below in /var/log/messages, and every time one appeared a core file was generated:

Feb 11 05:56:44 kernel: [77205]: segfault at 0 ip 00007fd918054cf8 sp 00007fd8ff56d830 error 4 in libzmq.so.5.0.1[7fd918036000+6c000]
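Even without a core file, the kernel lines above carry enough to locate the faulting instruction. The sketch below parses one such line and computes the instruction pointer's offset from the bracketed mapping address. `parse_segv_line` and `struct segv_info` are hypothetical names introduced for illustration, and the offset is only usable as a file offset under the assumption that the bracketed address is the library's load base (true when the text segment is the library's first mapping).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields recovered from one kernel "segfault at ..." log line.
 * Hypothetical helper type, not part of libzmq. */
struct segv_info {
    uint64_t fault_addr; /* address whose access faulted */
    uint64_t ip;         /* instruction pointer at the time of the fault */
    uint64_t error;      /* x86 page-fault error code */
    uint64_t base;       /* start of the mapping that contains ip */
    uint64_t offset;     /* ip - base */
};

/* Parse a line such as
 *   "... segfault at 0 ip 00007f... sp 00007f... error 4 in lib.so[7f...+96000]"
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_segv_line(const char *line, struct segv_info *out)
{
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;

    uint64_t sp; /* parsed to advance past the field, otherwise unused */
    if (sscanf(p,
               "segfault at %" SCNx64 " ip %" SCNx64 " sp %" SCNx64
               " error %" SCNx64,
               &out->fault_addr, &out->ip, &sp, &out->error) != 4)
        return -1;

    /* The last '[' after "segfault at" opens the "[base+size]" part. */
    const char *bracket = strrchr(p, '[');
    if (bracket == NULL || sscanf(bracket, "[%" SCNx64, &out->base) != 1)
        return -1;

    if (out->ip < out->base)
        return -1;
    out->offset = out->ip - out->base;
    return 0;
}
```

For the 18:10:02 line this gives an offset of 0x293b8, which could be passed to something like `addr2line -f -C -e libzmq.so.5.2.2 0x293b8` against a build that has debug symbols. On x86, error code 4 means a user-mode read of a page that is not mapped, so a fault address of 0 is consistent with a NULL pointer read.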

As I mentioned earlier, there are no fixed steps to reproduce the issue. It occurs only under high traffic, when my application receives a high number of messages, say 50K messages per second. With only a few messages per second there is no crash. I have two ZMQ_SUB sockets open and call zmq_poll() on them; a single application thread reads the poll events from the sockets. The crash occurs even if messages are received on only one subscriber socket.

Please let me know if any other information is required. Kindly provide some inputs here, as this is becoming critical for us.

Thanks,
Gaurav

@bluca
Member

bluca commented Feb 16, 2020

There is nothing specific to do for core files, it's exactly the same as any other Linux application. Make sure your build process is working as intended.
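As a generic Linux check, not anything specific to libzmq, a process only writes a core file if its `RLIMIT_CORE` soft limit is non-zero, and `/proc/sys/kernel/core_pattern` decides where the core goes. The sketch below shows how a C process could raise its own limit at startup and read the pattern. `enable_core_dumps` and `read_core_pattern` are hypothetical helper names, and whether either setting is the cause of the missing cores here is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Hypothetical helper. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max; /* a hard limit of 0 still means no cores */
    return setrlimit(RLIMIT_CORE, &lim) == 0 ? 0 : -1;
}

/* Copy the kernel's core_pattern (Linux only) into buf, without the
 * trailing newline. Returns 0 on success, -1 if it cannot be read. */
int read_core_pattern(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling `enable_core_dumps()` early in `main` and logging the result of `read_core_pattern()` would show whether cores are disabled by the limit or redirected. A pattern that starts with `|` means the kernel pipes the core to a handler program instead of writing a file in the working directory.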

@bluca
Member

bluca commented Feb 16, 2020

If you are building libzmq from scratch rather than using existing packages, ensure you are building with debugging symbols enabled

@GauravAccGit
Author

Yes, we build libzmq from scratch. Can you please share the steps to enable debugging symbols?
One more thing: would it be possible for you or someone else to have a look at our sample code and tell us if we are doing anything wrong?

@bluca
Member

bluca commented Feb 17, 2020

Debug symbols are enabled like with any other Linux application/library you are compiling, nothing specific to do.

If you provide here a minimal, self-contained, working example that reproduces the issue I will run it and check.

@stale

stale bot commented Jun 11, 2021

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

@stale stale bot added the stale label Jun 11, 2021
@stale stale bot closed this as completed Aug 22, 2021
2 participants