
UDP engine crashes and exhibits weird behaviour #2009

Open
hitstergtd opened this issue May 23, 2016 · 10 comments
@hitstergtd
Member

hitstergtd commented May 23, 2016

I came across these while adding throughput and latency tests. I may be doing something wrong that is leading to these results, although I am fairly certain it's nothing unreasonable. :)

About the code:
The linked gist is basically a variant of the inproc_thr benchmark:

  • It is modified to use RADIO/DISH sockets to send and receive messages over a single topic called thr_test
  • The use of a separate sender thread was removed from this example. It simply sends N messages on the topic/group and then picks them up later
  • The parameters are exactly like inproc_thr, i.e. message-size and message-count

What works:

  • message-count <= 1000 AND message-size >= 0 AND message-size <= 8183

What does NOT work:

  • hangs indefinitely:
    • message-count > 1000 AND message-size > 0 (1 in 10 chance of a core dump)
    • message-count = 4500 AND message-size = 999 (sometimes, or dumps core as per below)
  • always dumps core:
    • message-count >= 1 AND message-size = 50000
  • dumps core (randomly) / double-free corruption message:
    • message-count = 1 AND message-size = 16413 (try running this repeatedly)
    • message-count = 4499 AND message-size = 999
  • message of incorrect size received:
    • message-count = 1000 AND message-size = 8184
    • message-count = 1000 AND message-size = 16413 (but one message dumps core!)

Code to reproduce this is available at:
https://gist.github.com/hitstergtd/68503600e353adb3155504982df54682.

Not sure whether these issues have existed since day one or crept in progressively over the last few months. I haven't tried the above scenarios on Windows, only on Linux with the 4.4.0-22 kernel (latest Ubuntu), so I don't know whether they are poller-dependent.

@somdoron
Member

Regarding the message size, UDP is currently limited to 8191 bytes
(including the topic). I will try to figure out the rest.
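To make the limit above concrete: with the topic included in the 8191-byte datagram budget, the largest working payload for the 8-byte topic thr_test comes out to 8183 bytes, which matches the "what works" boundary reported earlier in the thread. A minimal sketch (the exact wire framing is an assumption; only the 8191-byte total is stated above):

```python
# Sketch of the UDP datagram size budget described above.
# Assumption: the group/topic name counts against the 8191-byte limit;
# any additional framing overhead is ignored here.
UDP_MAX_DATAGRAM = 8191

def max_payload_size(group: str) -> int:
    """Largest message body that fits in one UDP datagram for `group`."""
    return UDP_MAX_DATAGRAM - len(group)

print(max_payload_size("thr_test"))  # 8183, matching the observed boundary
```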
@somdoron
Member

Because you send all the messages first and only then receive them, a message count larger than the subscriber's high watermark will cause messages to drop: UDP is unreliable, so once the watermark is reached and the internal buffers are full, new messages are dropped.
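The drop behaviour described here can be modelled with a toy bounded buffer (this is not libzmq code, just an illustration of why a send-all-then-receive benchmark loses messages once the count exceeds the watermark):

```python
from collections import deque

# Toy model of a receive buffer with a high watermark (HWM):
# once the buffer is full, newly arriving messages are silently
# dropped, as on an unreliable transport like UDP.
class HwmBuffer:
    def __init__(self, hwm: int):
        self.hwm = hwm
        self.queue = deque()
        self.dropped = 0

    def push(self, msg) -> bool:
        if len(self.queue) >= self.hwm:
            self.dropped += 1  # buffer full: message is lost
            return False
        self.queue.append(msg)
        return True

# Sending 4500 messages before reading any, with a HWM of 1000,
# buffers only the first 1000 and drops the remaining 3500.
buf = HwmBuffer(hwm=1000)
for i in range(4500):
    buf.push(i)
print(len(buf.queue), buf.dropped)  # 1000 3500
```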

@somdoron
Member

Can you attach the stack trace of the assert?

@hitstergtd
Member Author

@somdoron

  • OK, it may be worth noting this limitation in the zmq_udp manual page until it changes
  • It's not always an assert that is triggered; sometimes it is a segmentation fault
  • Hmm, reaching the HWM explains the hang, but it doesn't explain why it crashes abruptly and/or the double-free/corruption warnings, especially when sending a message larger than the internal limit, which dumps core, i.e. sending and receiving one message of 16413 bytes. I know there is a warning in the udp_engine code about checking message sizes.

I will send the stack traces as soon as possible.

@hitstergtd
Member Author

hitstergtd commented May 23, 2016

Also - for UDP - sending or receiving a message where (topic-length + message-size) is greater than 8191 bytes should throw an error at the API level if it's not supported; unless I am missing something!
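The suggestion above amounts to a bounds check before the message ever reaches the engine. A hypothetical sketch (check_udp_size is not a libzmq API; it just illustrates failing fast with an error instead of crashing):

```python
UDP_MAX_DATAGRAM = 8191  # limit stated earlier in the thread

def check_udp_size(group: str, body_len: int) -> None:
    """Hypothetical API-level validation: reject oversized messages
    with an error rather than letting the engine crash."""
    if len(group) + body_len > UDP_MAX_DATAGRAM:
        raise ValueError(
            f"topic ({len(group)}B) + message ({body_len}B) exceeds "
            f"UDP limit of {UDP_MAX_DATAGRAM} bytes"
        )

check_udp_size("thr_test", 8183)   # fits: no error
# check_udp_size("thr_test", 16413)  would raise ValueError
```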

@somdoron
Member

@hitstergtd I will take a look next week. Regarding the message size, it is simple code that needs fixing; at the time it was less important.

Regarding the crashing, do you have a stack trace of where it is crashing?

@hitstergtd
Member Author

@somdoron No problem - I just thought I would report it so that it's hopefully fixed for the 4.2 release, as I see that being one of the important features. I also wanted to see throughput and latency numbers for the UDP transport, to see whether it fares any better than the TCP stream engine.

I will generate a stack trace in the next couple of days and put it up here. Do you need it for all crash scenarios or just one of them?

@somdoron
Member

we can start with one of them

@StephanOpfer

StephanOpfer commented Sep 9, 2019

I still get a core dump for messages larger than the aforementioned ~16413 bytes. Would help be appreciated? I can provide minimal working examples that reproduce this issue.

@somdoron
Member

hey @StephanOpfer, do you have the backtrace of the core-dump?
