-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threads should have their own signal handler #192
Comments
Well, I'm not sure that's the right fix for this. Instead, maybe there should be a thread-agnostic signal handler in the Python layer that acts like the rclcpp signal handler which triggers the sigint guard condition for all wait sets:
So to recap, I think the issue is that I'm not sure it ever make sense to have one "stored signal handler to be restored" for each thread. I did mention that when we were talking in person, but after thinking about it more, I'm not sure that would work correctly. |
I think we're getting burned by this right now. We have a python process with multiple nodes and multiple executors on different threads, but our process is still blowing up with
Apparently in c++ it's OK to run multiple executors on multiple threads, but in python we're being prevented from doing this by the global g_wait_set_spinning bool defined in the module https://github.com/ros2/rclpy/blob/master/rclpy/rclpy/executors.py#L34 We can work around this by having a single worker thread pump messages, but this will mean needing additional synchronization whenever we want to access data received by a subscriber from the 'main' thread |
I think the following might work:
I'll go ahead and implement this, but feel free to add any feedback in |
@wjwwood I've been working on this, but then I realized that I'd have to call |
@esteve sorry, I haven't had time to try and wrap my head around it. I'll get back to you asap. |
I don't fully follow your second comment about needing to call it twice, but from the bullet list proposal, I'd wonder if it actually solves the issue, because if a waitset is waiting and has not work, but another waitset is waiting on it to finish before actually waiting despite the fact that it has data available, then you'd starve a ready waitset with a blocking waitset. Which effectively doesn't let you wait from two threads at the same time, if I understand it correctly. Perhaps I don't have a grasp on the solution you've proposed yet. I'm happy to keep discussing it, but a pr, even a partial one, might be a more concrete thing to discuss. Sorry for taking so long to look at it. |
@wjwwood I pushed master...esteve:thread_safe_rclpy_wait but I'm not sure if that's the correct approach. @pbaughman do you have a test or a script that fails? I'd like to check if my branch fixed the bug. Thanks. |
@esteve - I do not - I'm actually not exactly sure what subtle failures were seen before the big scary runtime error Maybe something like this: import threading
import time
import rclpy
import rclpy.executors
import std_msgs.msg
class ThreadSpinner(threading.Thread):
def __init__(self, node_name):
super().__init__()
self._node = rclpy.create_node(node_name)
self._sub = self._node.create_subscription(
std_msgs.msg.String,
"/chatter",
self._cb
)
def run(self):
executor = rclpy.executors.SingleThreadedExecutor()
try:
executor.add_node(self._node)
while rclpy.ok():
executor.spin_once(1.1)
finally:
executor.remove_node(self._node)
def _cb(self, msg):
print("{} got {}".format(self._node.get_name(), msg.data))
rclpy.init()
spinner1 = ThreadSpinner("Node1")
spinner2 = ThreadSpinner("Node2")
spinner1.start()
spinner2.start()
try:
while True:
time.sleep(100)
except KeyboardInterrupt:
rclpy.shutdown() terminal 1> ros2 run demo_nodes_py talker
terminal 2> python3 script.py
# Expected output:
Node1 got Hello World: 10
Node2 got Hello World: 10
Node1 got Hello World: 11
Node2 got Hello World: 11
Node1 got Hello World: 12
Node2 got Hello World: 12
# etc. . . Would probably want to run it for a while to make sure both nodes continue to receive data |
It is to do with the management of the signal handlers from within the C extension, as the description of the issue says:
So, running it for a long time is not the issue, I think instead you'd need to run it, sigint it, run it again, in a loop until you see (or do not see) some failure or deadlock. |
Before
rcl_wait
is called, an existing signal handler is stored so that it can be called from within rclpy's signal handler. The existing signal handler is restored after waking up.This is not currently robust to
rclpy_wait
being called from multiple threads; each thread ought to store its own signal handler to restore.The text was updated successfully, but these errors were encountered: