Hang in ROS2 Fast-RTPS during destruction [3094] #235
Comments
I see a similar problem without ROS2 on Windows 10 x64. On Win7 the same executable (built from source on the particular platform) did not hang during shutdown. Even if subscribers are removed before publishers, when removing the (one and only) participant, one or both AsyncWriterThread threads are still active (not suspended/waiting). It looks like the abort messages sent via UDP are not received while the AsyncWriterThread threads are active. Minutes later the abort messages are received, the 3 blocking receive threads finish, and the join method returns. I monitored the UDP communication via Wireshark, and it also lists the UDP messages (length=13) minutes later, when the join returns. |
We cannot reproduce this behavior. New release 1.7.0 comes with changes in the network transport layers. Can you test whether this behavior still occurs? Thanks |
Is this issue solved with ROS2 Crystal release? |
Sorry, I haven't had time to test this out again yet. I'll try to find some time in the next couple of days. |
I've just tried to recreate the issue as described here ros2/examples#209 under ROS 2 Dashing release and the issue persists. |
@ssnover95 our develop branch includes PR #540, which fixes a data race when destroying the participant that may be the cause of the problems you have (both the hang and the crash). Could you check using the develop branch of Fast-RTPS? |
@clalancette @ssnover95 In a couple of days we are going to release 1.8.1 (see #574), which includes #569 that fixes some hang cases when destroying a participant. Could you check if this issue is fixed with those changes? |
@clalancette @ssnover95 Can we close this issue? |
I'm going to close this out, as I haven't seen this particular issue in a while. If it comes up again, I'll re-open. Thanks. |
I'm still debugging this, so I don't have all of the information. Nonetheless, this problem looks like it may be in Fast-RTPS, so opening this issue to get some visibility and maybe some guidance.
I'm currently debugging a failure in https://github.com/ros2/examples/blob/master/rclcpp/minimal_subscriber/not_composable.cpp ; the initial report is here: ros2/examples#209 . Running that code as-is causes the error message from that other issue.
While looking at the code with @wjwwood , however, we realized that this line is probably the culprit: https://github.com/ros2/examples/blob/master/rclcpp/minimal_subscriber/not_composable.cpp#L38 . That is, we force the node to be destroyed before the subscription (which will get destroyed when it goes out of scope). One easy solution is to just remove line 38. However, when I do that, the node hangs when I hit Ctrl-C. After doing some debugging in gdb, I see the following:
(I've elided the rest of the threads for brevity). It looks like what is happening is that AsyncWriterThread::run in Thread 6 is currently waiting to be woken up from the cv_ condition variable, implying it is holding the lock. Thread 1 looks to be trying to destroy the condition variable, but it is attempting to take the lock first, and this deadlocks. I'm still looking into this, but any thoughts or advice welcome. @richiware FYI.
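For anyone following along, here is a minimal sketch of the general shutdown pattern for a thread that sleeps on a condition variable: set a stop flag under the lock, notify, then join before the condition variable is destroyed. This is not the Fast-RTPS AsyncWriterThread code; the Worker class, wake(), and wake_then_destroy() are hypothetical names used only for illustration, and it assumes plain C++11 standard library threading.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker, NOT the Fast-RTPS AsyncWriterThread implementation.
class Worker {
public:
    explicit Worker(int& processed)
        : processed_(processed), thread_(&Worker::run, this) {}

    ~Worker() {
        {
            // Set the stop flag under the same mutex the waiter uses,
            // so the wakeup cannot be missed.
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        // Join before cv_ and mutex_ are destroyed; destroying a condition
        // variable that a thread is still waiting on is undefined behavior.
        thread_.join();
    }

    // Signal that one more unit of work is pending.
    void wake() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // wait() releases the mutex while sleeping and reacquires it
            // before the predicate is checked and before it returns.
            cv_.wait(lock, [this] { return stop_ || pending_ > 0; });
            processed_ += pending_;  // drain whatever is pending
            pending_ = 0;
            if (stop_) {
                return;
            }
        }
    }

    int& processed_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    int pending_ = 0;
    std::thread thread_;  // declared last so it starts after the other members exist
};

// Wakes the worker a few times, destroys it, and reports how much it processed.
int wake_then_destroy(int wakeups) {
    int processed = 0;
    {
        Worker worker(processed);
        for (int i = 0; i < wakeups; ++i) {
            worker.wake();
        }
    }  // ~Worker runs here: stop, notify, join
    return processed;
}
```

The point of the sketch is only the ordering in the destructor. Whether the hang reported here comes from a missing notification, a lock held elsewhere, or something else entirely is still an open question.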