
Fixes for intra-process actions #144

Merged

Conversation

mauropasse commented Mar 7, 2024

Fixes for intra-process actions:

  • Don't throw in the normal situation of an expired IPC Action client
  • Check whether there is data in the buffers before extracting
  • Fix IPC Actions QoS depth
  • Fix data race on IPC Actions callbacks
  • Correct intra-process actions is_ready()
  • Use the Goal ID as the key to store client callbacks and responses from the server
  • Use the FNV-1a hash algorithm for the Goal UUID (see the sketch after this list)
  • Remove the RCL_RET_CLIENT_TAKE_FAILED error log - add a comment instead
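
For reference, a minimal sketch of the FNV-1a idea applied to the 16-byte Goal UUID so that it can serve as an unordered_map key. The constants are the standard 64-bit FNV parameters; the struct name is illustrative, not taken from this PR:

#include <array>
#include <cstddef>
#include <cstdint>

using GoalUUID = std::array<uint8_t, 16>;

// FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
struct GoalUUIDHash
{
  std::size_t operator()(const GoalUUID & uuid) const
  {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (std::uint8_t byte : uuid) {
      hash ^= byte;
      hash *= 1099511628211ULL;  // FNV prime
    }
    return static_cast<std::size_t>(hash);
  }
};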

The following (simplified) flowchart represents what happens when the Action Client sends a goal request to the Action Server, up to the point where the server accepts it and responds to the client:

auto goal_handle_future = action_client->async_send_goal(goal_msg, send_goal_options);
auto goal_handle = goal_handle_future.get();

[Flowchart: goal request/response flow between the Action Client and the Action Server]

The process is almost exactly the same for:

auto result_future = action_client->async_get_result(goal_handle);
auto wrapped_result = result_future.get();

and for cancel:

auto cancel_result_future = action_client->async_cancel_goal(goal_handle);
auto cancel_result = cancel_result_future.get();

The following chart shows part of action_client->async_get_result, focusing on the server logic, which sends the result only if the client has requested it:
[Flowchart: server-side logic of async_get_result; the result is sent only once the client has requested it]

}

void store_ipc_action_feedback(FeedbackSharedPtr feedback)
{
  feedback_buffer_->add(std::move(feedback));
  gc_.trigger();
  is_feedback_ready_ = true;
Collaborator

Why are these removed?

Collaborator Author

Setting the flag here was breaking the SingleThreadedExecutor, which already sets the is_*_ready_ flags in its is_ready() API.
For the EventsExecutor, the flags are set in the take_data_by_entity_id() API.
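
A hedged sketch of those two paths; this class is an illustrative stand-in for the intra-process action-client waitable, not the PR's actual code:

#include <cstddef>
#include <memory>

class IpcActionClientWaitable  // illustrative stand-in
{
public:
  // Wait-set executors (e.g. SingleThreadedExecutor) refresh the flag
  // while checking readiness, so also setting it when storing feedback
  // would have marked the data ready twice.
  bool is_ready()
  {
    is_feedback_ready_ = feedback_buffer_has_data_;
    return is_feedback_ready_;
  }

  // The EventsExecutor never calls is_ready(): it reacts to the guard
  // condition and sets the flag when taking the data for an entity id.
  std::shared_ptr<void> take_data_by_entity_id(std::size_t /*id*/)
  {
    is_feedback_ready_ = true;
    return nullptr;  // the real code returns the buffered feedback
  }

private:
  bool feedback_buffer_has_data_ = false;
  bool is_feedback_ready_ = false;
};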

mauropasse commented Mar 11, 2024

Some comments about how things work; I'll try to keep it simple and not too long. (See the new flowchart in a comment below.)

Every time the Client requests something from the Server, a callback is created: it will be called when the server sends back the response (with the response as its argument).
After a callback is called, it must (in some cases) be removed, since these callbacks keep shared pointers in scope.

We have 5 types of Action Client events (EventType), each with its own ResponseCallback:

  • FeedbackSubscription -> callback created in the client constructor; no need to remove it after the call.
  • StatusSubscription -> callback created in the client constructor; no need to remove it after the call.

For every individual goal sent we have:

  • GoalClient -> callback created by async_send_goal; must be removed after the call.
  • ResultClient -> callback created by async_get_result; must be removed after the call.
  • CancelClient -> callback created by async_cancel_goal; must be removed after the call.

Since a client can make multiple requests, we need storage for all the individual goal IDs, event types, their callbacks, and the unread_count.

I created a structure to hold all the info and data needed to process the different events, keyed by their respective "Goal ID". So we have in the map:

< Goal_ID, {EventType, ResponseCallback, unread_count} >

Besides this map, we have the IPC ring buffers holding the responses from the server. Their entries look like:

<Goal_ID, ServerResponse>

So when we extract an element (response) from the ring buffer, we get the Goal_ID and use it as the key to look up the ResponseCallback and call it (and then remove it, if necessary).
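
A minimal sketch of this bookkeeping, reusing GoalUUID and GoalUUIDHash from the hash sketch above; ClientEventInfo and dispatch are hypothetical names, not the PR's actual types:

#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>

enum class EventType
{
  GoalClient, ResultClient, CancelClient,
  FeedbackSubscription, StatusSubscription
};

using ResponseCallback = std::function<void (std::shared_ptr<void>)>;

struct ClientEventInfo
{
  EventType event_type;
  ResponseCallback callback;
  std::size_t unread_count = 0;
};

// < Goal_ID, {EventType, ResponseCallback, unread_count} >
std::unordered_map<GoalUUID, ClientEventInfo, GoalUUIDHash> event_info_map;

// Called after popping a <Goal_ID, ServerResponse> pair from a ring buffer.
void dispatch(const GoalUUID & goal_id, std::shared_ptr<void> response)
{
  auto it = event_info_map.find(goal_id);
  if (it == event_info_map.end()) {
    return;  // client for this goal already expired: a normal situation, don't throw
  }
  it->second.callback(response);
  // Goal/Result/Cancel callbacks keep shared pointers in scope, so they
  // must be erased once called; the two subscription callbacks live for
  // the whole client lifetime.
  if (it->second.event_type == EventType::GoalClient ||
    it->second.event_type == EventType::ResultClient ||
    it->second.event_type == EventType::CancelClient)
  {
    event_info_map.erase(it);
  }
}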

bpwilcox commented Mar 13, 2024

> For a simple client/server interaction, there are usually 4 threads involved. That's an extra complication that required attention to avoid races.

Could you elaborate on which threads are involved and what each is doing for a client/server interaction?

mauropasse (Collaborator Author)

> Could you elaborate on which threads are involved and what each is doing for a client/server interaction?

The threads involved are (a minimal setup sketch follows this list):

  1. Client app thread sending goal requests
  2. Client executor thread (sets the "on_ready_callback" and executes action client work)
  3. Server executor thread (puts responses into the client's ring buffers)
  4. Server thread (usually created to execute the goal, also puts responses into the client's ring buffers)
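
A minimal sketch of a setup producing these four threads; node names and executor choices are illustrative, and the goal-execution thread is the one typically spawned in the server's handle_accepted callback, as in the rclcpp_action examples:

#include <thread>
#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto client_node = std::make_shared<rclcpp::Node>("client_node");
  auto server_node = std::make_shared<rclcpp::Node>("server_node");

  // 2. Client executor thread: runs the action-client work
  //    (and is where the "on_ready_callback" fires).
  rclcpp::executors::SingleThreadedExecutor client_executor;
  client_executor.add_node(client_node);
  std::thread client_spin([&]() {client_executor.spin();});

  // 3. Server executor thread: takes requests and pushes responses
  //    into the client's intra-process ring buffers.
  rclcpp::executors::SingleThreadedExecutor server_executor;
  server_executor.add_node(server_node);
  std::thread server_spin([&]() {server_executor.spin();});

  // 1. Client app thread (this one): sends goal requests and blocks
  //    on the returned futures.
  // 4. Server goal-execution thread: usually spawned inside the server's
  //    handle_accepted callback, e.g.
  //    std::thread{std::bind(execute, goal_handle)}.detach();
  //    It also pushes responses into the client's ring buffers.

  client_spin.join();
  server_spin.join();
  rclcpp::shutdown();
}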

ipm->remove_action_server(ipc_action_server_id_);
}

protected:
// Intra-process version of execute_goal_request_received_
// Missing: deep comparison of functionality between IPC on/off
void
ipc_execute_goal_request_received(GoalRequestDataPairSharedPtr data)


In this IPC function I still see calls into rcl, such as rcl_action_get_zero_initialized_goal_info(). What is the goal_info? Why is it relevant or necessary to call into rcl when going through IPC?

Collaborator Author

In this case, the rcl_action_goal_info_t goal_info is just used to obtain the rcl goal handle, in order to then update the goal state.
The "bookkeeping" of the goal state is still performed in rcl.

"intra_process_action_send_cancel_response called "
" after destruction of intra process manager");
}
auto ipm = lock_intra_process_manager();

// Convert c++ message to C message
rcl_action_cancel_request_t cancel_request = rcl_action_get_zero_initialized_cancel_request();


Similar question to my other comment, but why do we need to get the cancel_request from the rcl layer while doing IPC? I know this PR is built on top of previous work, but I am missing the rationale for the interaction with the rcl layer.

Collaborator Author

Only the communication (sending requests, responses, etc.) goes through intra-process. The rest of the logic still lives in rcl; that is, we still use the rcl_handle, which controls the goal state, etc.

In summary, all the rcl_action_send_* functions have parallel intra_process_action_send_* versions, but the rest of the code is common.
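
A sketch of that split for the cancel response; the intra_process_action_send_cancel_response name appears in this PR's diff, but its signature here and the surrounding function are assumptions:

#include <cstdint>
#include <memory>
#include "rcl_action/action_server.h"
#include "rmw/types.h"

// Assumed to exist per this PR; the exact signature is illustrative.
void intra_process_action_send_cancel_response(
  std::uint64_t ipc_action_client_id, std::shared_ptr<void> response);

void send_cancel_response(
  bool goal_was_received_via_ipc,
  std::uint64_t ipc_action_client_id,
  rcl_action_server_t & rcl_action_server,
  rmw_request_id_t & request_header,
  std::shared_ptr<void> response)
{
  if (goal_was_received_via_ipc) {
    // Transport-only IPC path: hand the response to the intra-process manager.
    intra_process_action_send_cancel_response(ipc_action_client_id, response);
  } else {
    // Regular rcl/rmw path.
    rcl_ret_t ret = rcl_action_send_cancel_response(
      &rcl_action_server, &request_header, response.get());
    (void)ret;
  }
  // Either way, the goal state machine is updated through rcl.
}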

mauropasse merged commit 9177002 into irobot-ros:irobot/humble Apr 9, 2024
mauropasse deleted the mauro/fix-ipc-actions-squashed branch April 9, 2024 14:56
apojomovsky pushed a commit to apojomovsky/rclcpp that referenced this pull request Jun 20, 2024
* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
apojomovsky pushed a commit that referenced this pull request Aug 13, 2024
* Fixes for intra-process actions (#144)

* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>

* Fix IPC Actions data race (#147)

* Check if goal was sent through IPC before send responses
* Add intra_process_action_server_is_available API to intra-process Client


---------

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>

* Fix data race in Actions: Part 2 (#148)

* Fix data race in Actions: Part 2

* Fix warning - copy elision

---------

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>

* fix: Fixed race condition in action server between is_ready and take"… (ros2#2531)

* fix: Fixed race condition in action server between is_ready and take" (ros2#2495)

Some background information: is_ready, take_data and execute data
may be called from different threads in any order. The old code
expected them to be called in series, without interruption.
This led to multiple race conditions, as the state of the pimpl objects
was altered by the three functions in a non-thread-safe way.

Co-authored-by: William Woodall <william@osrfoundation.org>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix: added workaround for call to double calls to take_data

This adds a workaround for a known bug in the executor in iron.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

---------

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: William Woodall <william@osrfoundation.org>

---------

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: William Woodall <william@osrfoundation.org>