Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix actions feedback race #2612

Open
wants to merge 5 commits into
base: rolling
Choose a base branch
from

Conversation

mauropasse
Copy link
Collaborator

Fixes issue mentioned in #2451.

When executing the ActionClient, actions feedback & status had predominance over the action results, so if the client spins slower than the server feedback's rate, it will never process the action result (it will be busy processing feedbacks).

Also adding unit test to verify the fix works. The unit test also tests many other actions interactions.

Mauro Passerino added 2 commits August 27, 2024 12:19
Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
 - See ros2#2451

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
test_info = std::make_shared<TestInfo>();
rclcpp::init(0, nullptr);
auto p = GetParam();
std::cout << "Test permutation: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <iostream>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 13a27f6

<< (p.use_client_ipc ? "IPC Client }" : "Non-IPC Client }") << std::endl;

executor = test_info->create_executor(p.use_events_executor);
executor_thread = std::thread([&]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <thread>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 13a27f6

rclcpp::shutdown();
}

rclcpp::Executor::UniquePtr executor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <memory>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 13a27f6

executor->add_node(server_node);
executor->add_node(client_node);

bool server_available = action_client->wait_for_action_server(std::chrono::seconds(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <chrono>

// See the License for the specific language governing permissions and
// limitations under the License.#include <gtest/gtest.h>

#pragma once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use include guards

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 13a27f6

rclcpp_action::GoalResponse
handle_goal(
const rclcpp_action::GoalUUID & uuid,
std::shared_ptr<const Fibonacci::Goal> goal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <memory>

}

bool result_is_correct(
std::vector<int> result_sequence,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <vector>

return false;
}

for (size_t i = 0; i < result_sequence.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (size_t i = 0; i < result_sequence.size(); i++) {
for (size_t i = 0; i < result_sequence.size(); ++i) {


private:
GoalHandleSharedPtr server_goal_handle_;
std::atomic<bool> result_cb_called{false};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <atomic>


auto result_future = action_client->async_get_result(goal_handle);
auto result_response_wait = result_future.wait_for(std::chrono::seconds(5));
ASSERT_TRUE(result_response_wait == std::future_status::ready) << "Cancel result response not on time";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include <future> and probably linters will fail here, this line has more then 100 characters

@fujitatomoya
Copy link
Collaborator

this PR includes #2471 with latest code base, so i will close #2471.

rclcpp_action/src/client.cpp Show resolved Hide resolved
this->server_goal_handle_->abort(result);
}

// Server: Handle goal callback
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this comment? other places do not have this kind of comment...

@@ -79,6 +79,20 @@ if(BUILD_TESTING)

add_subdirectory(test/benchmark)

ament_add_gtest(test_ros2_actions test/test_actions.cpp TIMEOUT 180)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ros2 sounds redundant? how about test_actions?

Suggested change
ament_add_gtest(test_ros2_actions test/test_actions.cpp TIMEOUT 180)
ament_add_gtest(test_actions test/test_actions.cpp TIMEOUT 180)

Comment on lines 282 to 286
int main(int argc, char **argv)
{
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This main function needs to stay here? I do not see any other tests have main function like this?

@jmachowinski
Copy link
Contributor

Hard no go from my side. This introduces a new bug, that you would not receive all feedbacks, before the result.
See
rclcpp_action.TestClientAgainstServer.async_send_goal_with_feedback_callback_wait_for_result
for more details.

@fujitatomoya
Copy link
Collaborator

even though https://design.ros2.org/articles/actions.html#clientserver-interaction-examples does not specify that feedback messages needs to be delivered before result, all examples tell me result comes to the client after feedback messages. besides, if the feedback message comes after the result response, it is strange behavior, i am not sure what application is supposed to process this message in the callback.

and i think, this action design and requirement is already broken with current implementation. because rmw uses different channels for feedback and result, the messages are not queued in order, that means there is always possibility that feedback messages would come after result response. (by changing the order to take the data, it can mitigate a bit but not a perfect solution...)

IMO, if we can make it better for user-experience, changing the order would be acceptable? maybe the order needs to be reconsidered well ? (goal response->cancel response->feedback->status->result response)

any thoughts? @mauropasse @jmachowinski @ahcorde @alsora

@mauropasse
Copy link
Collaborator Author

My thoughts are that ROS 2 actions' feedback and status messages are useful during the action's execution, as they allow the user to understand how the process is progressing.

However, the responses (goal/cancel/result) are the final pieces of information when an action has completed, and these should be considered the most important. Feedback and status messages received by the client after the action has finished could (should?) be ignored if the user already knows the action's outcome.

This is why I think responses should be prioritized over feedback. Moreover, a response is sent only once, whereas there can be a large number of feedback messages.

I also want to point out that this issue affects only the single-threaded executor. It does not impact the events executor, as events are processed in the order they are generated.

Also I'm unsure about judging the correctness for the new proposed priority of execution, based on the results of previous tests that fail now?

@fujitatomoya
Copy link
Collaborator

@mauropasse thanks for your comment!

Feedback and status messages received by the client after the action has finished could (should?) be ignored if the user already knows the action's outcome.

This is i am not sure yet by design. To be honest, i thought that is okay feedback and status messages would come result (or even cancel), and either ActionClient or application can ignore that. Let's wait for more feedback on this.

Also I'm unsure about judging the correctness for the new proposed priority of execution, based on the results of previous tests that fail now?

I believe that @jmachowinski just wanted to confirm this behavior just like me. I think that is totally fine to change the test once behavior is changed.

@jmachowinski
Copy link
Contributor

I checked the initial bug report, and must say, the test itself is highly flawed.

What is comes down to is you got a provider running with higher frequency than the consumer is processing.

From my point of view the bug report makes wrong assumptions as to how spin_some works / should work. To be fair, the documentation of the function is misleading, as you need real deep knowledge of the executor internals so know what 'Collect all work' really means. The obvious fix to the problem is to use spin_all.

As to the action code in general, as I stated before I think the design is highly flawed. But I don't see a 'simple' fix for the issue, like the one proposed here.

As to the importance of receiving the last feedback before the goal, I agree with @mauropasse that in a (our) real world application, one can normally ignore feedback and it's not important at all. The problem with this change though is, that it will break the tutorial
https://docs.ros.org/en/jazzy/Tutorials/Intermediate/Writing-an-Action-Server-Client/Cpp.html and possible break user code, that rely on this behavior.

Mauro Passerino added 2 commits September 18, 2024 07:38
Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
@mauropasse
Copy link
Collaborator Author

In my last commits I addressed comments from this PR.
I ended up lowering only the Feedback priority, so:

  • Feedback > Status > Goal Response > Result Response > Cancel Response (original)
  • Status > Goal Response > Result Response > Cancel Response > Feedback (new priority)

In this way:

  • Test modifications on test_client.cpp for it to pass, are minimal.
  • The demo ros2/demos/action_tutorials/action_tutorials_cpp/src/fibonacci_* has same behavior as before.
  • There's still the risk of breaking user code.

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, one comment.

@@ -565,7 +565,7 @@ TEST_F(TestClientAgainstServer, async_send_goal_with_feedback_callback_wait_for_

ASSERT_EQ(5u, wrapped_result.result->sequence.size());
EXPECT_EQ(3, wrapped_result.result->sequence.back());
EXPECT_EQ(5, feedback_count);
EXPECT_EQ(2, feedback_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected to be always 2 without any racy condition?

according to

result_service = server_node->create_service<ActionGoalResultService>(
result_service_name,
[this](
const ActionGoalResultRequest::SharedPtr request,
ActionGoalResultResponse::SharedPtr response)
{
, it could be more than 2 if feedback data is only available before other entities?

Suggested change
EXPECT_EQ(2, feedback_count);
EXPECT_GE(2, feedback_count);

@fujitatomoya
Copy link
Collaborator

@mauropasse can you also resolve @ahcorde 's comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants