Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comment to provoke review of suspect code #982

Closed
wants to merge 1 commit into from

Conversation

daveab
Copy link
Contributor

@daveab daveab commented Apr 25, 2014

see the change I proposed for 3x.

@@ -116,7 +116,8 @@ int zmq::lb_t::sendpipe (msg_t *msg_, pipe_t **pipe_)
more = msg_->flags () & msg_t::more? true: false;
if (!more) {
pipes [current]->flush ();
current = (current + 1) % active;
current = (current + 1) % active; // in zmq 3.x this causes crash if the last pipe is terminated beween the
// the test for active == 0 at 109 and here as %(active==0) causes a crash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A think we can safely assert that active is non zero here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Martin,

I do not know the zeromq code very well... and the current code not at all.

The question I raise is "Is 'active' modified and read in two different
threads?".
Because I am confident that in zeromq3-x it is modified in two threads and
a race to crash exists as a consequence.

On Fri, Apr 25, 2014 at 4:54 PM, Martin Hurton notifications@github.comwrote:

In src/lb.cpp:

@@ -116,7 +116,8 @@ int zmq::lb_t::sendpipe (msg_t msg, pipe_t *pipe)
more = msg_->flags () & msg_t::more? true: false;
if (!more) {
pipes [current]->flush ();

  •    current = (current + 1) % active;
    
  •    current = (current + 1) % active; //  in zmq 3.x this causes crash if the last pipe is terminated beween the
    
  •                                      //    the test for active == 0 at 109  and here as %(active==0) causes a crash.
    

A think we can safely assert that active is non zero here.


Reply to this email directly or view it on GitHubhttps://github.com//pull/982/files#r12002719
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code should be executed only in one thread, provided the application does not invoke socket operation in two different threads. That's the assumption of the library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the threads involved here is the IO thread in the zmq context.

@hintjens
Copy link
Member

@daveab did you manage to provoke the crash in 3.x consistently?

@daveab
Copy link
Contributor Author

daveab commented Apr 25, 2014

about 10 out of twenty runs of the test case.

On Fri, Apr 25, 2014 at 5:19 PM, Pieter Hintjens
notifications@github.comwrote:

@daveab https://github.com/daveab did you manage to provoke the crash
in 3.x consistently?


Reply to this email directly or view it on GitHubhttps://github.com//pull/982#issuecomment-41411197
.

@daveab
Copy link
Contributor Author

daveab commented Apr 25, 2014

#0 zmq::lb_t::validate_thread (this=0x9efa80) at lb.cpp:185
#1 0x00007ffff6c8b802 in zmq::lb_t::terminated (this=0x9efa80,
pipe_=0x9f51a0) at lb.cpp:54
#2 0x00007ffff6c99098 in zmq::push_t::xterminated (this=0x9ef6e0,
pipe_=0x9f51a0) at push.cpp:53
#3 0x00007ffff6ca0e8a in zmq::socket_base_t::terminated (this=0x9ef6e0,
pipe_=0x9f51a0) at socket_base.cpp:1012
#4 0x00007ffff6c95f7e in zmq::pipe_t::process_pipe_term_ack
(this=0x9f51a0) at pipe.cpp:287
#5 0x00007ffff6c90557 in zmq::object_t::process_command (this=0x9f51a0,
cmd_=...) at object.cpp:104
#6 0x00007ffff6ca08d5 in zmq::socket_base_t::process_commands
(this=0x9ef6e0, timeout_=0, throttle_=false) at socket_base.cpp:856
#7 0x00007ffff6ca0c54 in zmq::socket_base_t::in_event (this=0x9ef6e0) at
socket_base.cpp:957
#8 0x00007ffff6c877b2 in zmq::epoll_t::loop (this=0x9db2b0) at
epoll.cpp:162
#9 0x00007ffff6c87898 in zmq::epoll_t::worker_routine (arg_=0x9db2b0) at
epoll.cpp:175
#10 0x00007ffff6ca98e8 in thread_routine (arg_=0x9db320) at thread.cpp:83
#11 0x00007ffff4d28f6e in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ffff42359cd in clone () from /lib/x86_64-linux-gnu/libc.so.6

As this stack shows.. zmq::lb_t::terminated() is called from the internal
zmq worker thread, resulting in 'active--', unsynchronised with the tets in
send().

Simply adding the code to get the thread_id and compare with previously
seen thread-id broke my race.

It is a very small target.

Dave

On Fri, Apr 25, 2014 at 5:25 PM, David Brown david.brown@datasift.comwrote:

about 10 out of twenty runs of the test case.

On Fri, Apr 25, 2014 at 5:19 PM, Pieter Hintjens <notifications@github.com

wrote:

@daveab https://github.com/daveab did you manage to provoke the crash
in 3.x consistently?


Reply to this email directly or view it on GitHubhttps://github.com//pull/982#issuecomment-41411197
.

@benjamg
Copy link
Member

benjamg commented Apr 26, 2014

Although this change would fix the race condition over changing active to zero it also implies its possible to do a flush on a pipe we have erased.

@hintjens
Copy link
Member

hintjens commented May 2, 2014

I'm not sure what the conclusion was here, but as the pull request doesn't solve a clear issue afaics, I'm closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants