fix: Fixed race condition in action server between is_ready and take"… #2531
@jmachowinski I just tested it on my local setup and it has worked fine so far. I will deploy it on our machine and check whether it fixes the race condition issue.
Force-pushed from 0221344 to d0610d6.
Hm, we ran into a problem with this backport: #2109 is one of the causes of this bug and cannot be reverted without an API break. As a workaround, I would propose removing the exception that is thrown when take_data is called multiple times, and making execute() handle this case as well. Before going forward with this, I would like to hear your opinions, @clalancette @wjwwood @mjcarroll.
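To make the proposed workaround concrete, here is a minimal sketch of a waitable that tolerates repeated take_data calls. It is not the rclcpp implementation: `ToyWaitable`, `push_request`, and `demo_duplicate_take` are hypothetical names, and the sketch assumes that an empty pointer is an acceptable way to signal "nothing left to take".

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <optional>

// Hypothetical stand-in for a waitable; this is NOT the rclcpp API.
class ToyWaitable
{
public:
  // Producer side: store one pending request.
  void push_request(int value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_ = value;
  }

  // May be called more than once for a single ready event.
  // A duplicate call returns an empty pointer instead of throwing.
  std::shared_ptr<void> take_data()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!pending_.has_value()) {
      return nullptr;
    }
    auto data = std::make_shared<int>(*pending_);
    pending_.reset();
    return data;
  }

  // Accepts the empty pointer produced by a duplicate take_data() call
  // and treats it as "nothing to do".
  bool execute(const std::shared_ptr<void> & data)
  {
    if (!data) {
      return false;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    last_executed_ = *std::static_pointer_cast<int>(data);
    ++executed_count_;
    return true;
  }

  int executed_count() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return executed_count_;
  }

private:
  mutable std::mutex mutex_;
  std::optional<int> pending_;
  int last_executed_ = 0;
  int executed_count_ = 0;
};

// One request, two take_data() calls: only the first call yields data.
// Returns the number of executions that actually did work.
inline int demo_duplicate_take()
{
  ToyWaitable waitable;
  waitable.push_request(7);
  auto first = waitable.take_data();
  auto second = waitable.take_data();  // duplicate call, yields nullptr
  assert(first != nullptr);
  assert(second == nullptr);
  waitable.execute(first);
  waitable.execute(second);            // ignored
  return waitable.executed_count();
}
```

In this sketch the duplicate call is harmless because take_data hands ownership of the request to exactly one caller, and execute only acts on what it was handed.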
@jmachowinski Apparently this backport also does not fix the issue completely.
Yes, this is the problem that I was talking about above.
Only for the backport or for
Only for the backport to Iron; for Humble this patch should work as it is right now.
What's the state of this PR? @mjcarroll @clalancette
Force-pushed from d0610d6 to 469f5cd.
@firesurfer I updated the patch to ignore the double calls to take_data. Can you test it? @clalancette @mjcarroll @wjwwood As this PR has been stale for some time without any response from you, the attendees of the Client Working Group Meeting decided that we will just ignore the double calls to take_data and move on with this PR.
@jmachowinski Thanks for the patch. I'm rolling it out to our machine right now and will give you feedback this week.
@firesurfer One second, I missed a part...
Force-pushed from 35523a5 to 5f856b6.
Fixed. @firesurfer Sorry for the confusion, the patch should now be good to go.
@jmachowinski I guess the issue is fixed on the server side in rclcpp with this fix. On the client side with rclpy we still get a related error. But as I said, my guess is that in this case it is not related to the action server side (ros2control - jtc).
EDIT: It seems this is also a race condition, but in the action implementation in rclpy:
…ros2#2495)
Some background information: is_ready, take_data and execute may be called from different threads in any order. The code in the old state expected them to be called in series, without interruption. This led to multiple race conditions, as the state of the pimpl objects was altered by the three functions in a non-thread-safe way.
Co-authored-by: William Woodall <william@osrfoundation.org>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This adds a workaround for a known bug in the executor in iron.
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Force-pushed from 5f856b6 to 586c44c.
@alsora @fujitatomoya I guess we are good to go. Can one of you run the CI?
@alsora I think you started the CI for the wrong OS / distro...
lgtm with green CI
Do we not need the test fix https://github.com/ros2/rclcpp/pull/2495/files#diff-b5e37eb4f1e19276cc541af7323bb55cfc86b1e7b2e0fa1f9201f5436d73b65a?
I believe that this keeps ABI compatibility, but running an ABI compliance checker would be nice.
The builds failed due to a strange error in The gist was created from https://github.com/ros2/ros2/blob/iron/ros2.repos and it seems correct to me.
Here (https://ci.ros2.org/job/ci_linux/21319/) it says: Therefore I assumed the CI is off, as Ubuntu should be Jammy and ros_disto: Iron...
@alsora @fujitatomoya merge?
RHEL shouldn't be failing, but it looks like an infrastructure issue. I'll kick it off again.
@alsora CI is green, I am okay to merge this. Can you merge this with your lgtm?
Yes, the changes look good to me.
ros2#2531)
* fix: Fixed race condition in action server between is_ready and take" (ros2#2495)
  Some background information: is_ready, take_data and execute may be called from different threads in any order. The code in the old state expected them to be called in series, without interruption. This led to multiple race conditions, as the state of the pimpl objects was altered by the three functions in a non-thread-safe way.
  Co-authored-by: William Woodall <william@osrfoundation.org>
  Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
* fix: added workaround for call to double calls to take_data
  This adds a workaround for a known bug in the executor in iron.
  Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
---------
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: William Woodall <william@osrfoundation.org>
* Fixes for intra-process actions (#144)
  * Fixes for intra-process Actions
  * Fixes for Clang builds
  * Fix deadlock
  * Server to store results until client requests them
  * Fix feedback/result data race. See ros2#2451
  * Add missing mutex
  * Check return value of intra_process_action_send
  ---------
  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* Fix IPC Actions data race (#147)
  * Check if goal was sent through IPC before send responses
  * Add intra_process_action_server_is_available API to intra-process Client
  ---------
  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* Fix data race in Actions: Part 2 (#148)
  * Fix data race in Actions: Part 2
  * Fix warning - copy elision
  ---------
  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* fix: Fixed race condition in action server between is_ready and take"… (ros2#2531)
  * fix: Fixed race condition in action server between is_ready and take" (ros2#2495)
    Some background information: is_ready, take_data and execute may be called from different threads in any order. The code in the old state expected them to be called in series, without interruption. This led to multiple race conditions, as the state of the pimpl objects was altered by the three functions in a non-thread-safe way.
    Co-authored-by: William Woodall <william@osrfoundation.org>
    Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  * fix: added workaround for call to double calls to take_data
    This adds a workaround for a known bug in the executor in iron.
    Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  ---------
  Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  Co-authored-by: William Woodall <william@osrfoundation.org>
---------
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: William Woodall <william@osrfoundation.org>
@jmachowinski Has this fix already been added as a patch in Humble? I continue to encounter this race bug in Humble on Ubuntu 22.
Nope, feel free to cherry-pick the correct version and open a PR for it.
… Backport from iron ros2#2531
… Backport from iron ros2#2531 Signed-off-by: Camilo Camacho <camilo.im93@gmail.com>
… Backport from iron ros2#2531 Signed-off-by: Camilo Camacho <camilo.im93@gmail.com> Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
… (#2495)
Some background information: is_ready, take_data and execute may be called from different threads in any order. The code in the old state expected them to be called in series, without interruption. This led to multiple race conditions, as the state of the pimpl objects was altered by the three functions in a non-thread-safe way.
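As an illustration of the kind of fix described above, the following sketch guards the state shared between is_ready and take_data with one mutex and lets take_data claim the ready flag and the queued goal in a single critical section. It is only a toy model under stated assumptions: `ToyServer`, `post_goal`, and `run_toy_server` are hypothetical names, and nothing here reflects the actual rclcpp pimpl layout.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy server; names mirror the discussion, not the rclcpp API.
class ToyServer
{
public:
  // Producer side: a new goal request has arrived.
  void post_goal()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    ++queued_goals_;
  }

  // Records readiness in shared state, under the lock.
  bool is_ready()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    goal_ready_ = queued_goals_ > 0;
    return goal_ready_;
  }

  // Claims the ready flag and one queued goal in a single critical
  // section, so two threads can never take the same goal.
  std::shared_ptr<int> take_data()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!goal_ready_ || queued_goals_ == 0) {
      return nullptr;
    }
    goal_ready_ = false;
    --queued_goals_;
    return std::make_shared<int>(1);
  }

  // Works only on the data handed over by take_data(), not on shared flags.
  void execute(const std::shared_ptr<int> & data)
  {
    if (data) {
      executed_.fetch_add(*data);
    }
  }

  int executed() const {return executed_.load();}

private:
  std::mutex mutex_;
  bool goal_ready_ = false;
  int queued_goals_ = 0;
  std::atomic<int> executed_{0};
};

// Several threads call is_ready, take_data and execute in arbitrary order.
// Returns the number of executed goals, which should equal `goals`.
inline int run_toy_server(int goals, int threads)
{
  ToyServer server;
  for (int i = 0; i < goals; ++i) {
    server.post_goal();
  }
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back(
      [&server, goals]() {
        while (server.executed() < goals) {
          if (server.is_ready()) {
            server.execute(server.take_data());
          } else {
            std::this_thread::yield();
          }
        }
      });
  }
  for (auto & worker : workers) {
    worker.join();
  }
  assert(server.executed() == goals);
  return server.executed();
}
```

Calling `run_toy_server(1000, 4)` from a `main` should execute each goal exactly once. If the flag were written by is_ready and read by take_data without the lock, two threads could observe the same ready state and both try to take the same goal.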
This is a clean backport of #2495. This should supersede #2530.
Note: this patch is not tested.
@firesurfer Can you try out this patch?