fuse -> ROS 2 fuse_core: Fix Async #293
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See: #276
Description
There are several outstanding issues with the async model used in fuse_core:
executor->cancel()call and spin forever.I'm a little iffy on if that's what's happening with (2), but regardless, the solutions implemented in this PR fix both issues.
The solution is to keep triggering the guard condition on hooking of the callback queues until the callback queues are serviced.
I also edited the tests to be a
little bita lot stricter and now test for multiple callback additions and initializations.The tests now feature async classes with executor thread count of 0 (which means the number of threads will equal the number of CPU cores), as troublesome a case as possible to test, to ensure my solutions are as robust as possible.
I had to do this because a lot of the issues seem to be race condition related and come up extremely rarely.
Notes
There was an alternative solution that I explored (which also worked), which was to use the timeout argument for the multi-threaded executor. This removes any chance of the multi-threaded executor deadlocking on waitables, but at the cost of free spinning at the rate of the timeout.
If we encounter deadlock or threading issues down the line, we might have to use that solution instead.
initialized_atomic to become true. I can't reproduce it though... Is there a chance for callbacks to get dropped??Update
EDIT: I pulled out all the stops and used all the solutions I found, all at once. I do not think I understand the problem enough to come up with an elegant solution, as for some reason all the "good practices" (like locking on threads that notify condition variables), or relying on condition variable waits instead of sleeps all result in deadlocks or weird behavior like atomic flags not getting set when the block that sets them ostensibly runs (???).
The solution I've devised combines the use of condition variables and manual guard condition triggers (just for
initialize()!!), and also a lower frequency executor wakeup.On the other hand, I'm able to spin up and down a massive amount of async instances sequentially now, when before it'd randomly block indefinitely, so it might just be good enough (I updated the tests to check for that case specifically.)? I have yet to see it fail, but I am not confident that it will never fail, since I never understood why it failed in the first place.
Some refinement might be helpful though, maybe I'm just not seeing something obvious.
Pinging @svwilliams for visibility.