Async Workers #446

mxgrey · 2024-12-12T08:04:59Z

This PR is the final culmination of all the async execution features that I currently have plans to implement. This builds on top of #421 so there is no need to review this PR until that one is merged, but I'm opening this PR to help make it visible to anyone who wants to try out the full breadth of async features that I've been working on.

The key role of this PR is to introduce the worker concept that was initially proposed in these slides. The API of this implementation has some differences from the initial proposal, but the concept is exactly the same.

Since this is just a draft preview PR, in lieu of a detailed explanation, I'll direct folks to this slide deck which explains the motivation behind the Worker concept.

rclrs/src/executor.rs

romainreignier · 2025-03-10T11:19:27Z

Hi, these changes looks amazing!
I am struggling to use the current API in a real world node in which I need to wait for several topics to trigger other actions like async client calls or timer timeout. It is very different than how I used to do it in C++.
From the slides, it seems that the Workers would help me organize better the Rust code.
What is the current roadmap to get Workers merged?

mxgrey · 2025-03-10T11:47:43Z

I am struggling to use the current API

Just to make sure I'm understanding correctly, when you say "the current API" you mean the API that's currently on the main branch, right?

If there are any potential changes to the API of this PR that can further help with your use case, I'm happy to iterate on it.

As for roadmap / timeline, there will be a working group meeting in about two hours, and we'll probably be talking about next steps there.

I think in general if anyone (including interested users, rather than maintainers) is able to leave reviews on #429 and #430 that may expedite the progress towards getting them merged.

romainreignier · 2025-03-10T12:20:48Z

Thanks for your answer.

you mean the API that's currently on the main branch, right?

Yes, I was referencing the main branch.

I might try to use this PR on my node to see if the API helps. For now, I was using the branch with the timers because I often use timers in ROS nodes.

Signed-off-by: Michael X. Grey <greyxmike@gmail.com> Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

mxgrey · 2025-05-14T15:22:28Z

@esteve This PR should be properly ready for review now.

To clean up the version history, I've squashed the original set of commits and rebased them onto main.

I've also added a considerable amount of documentation throughout the modules, including usage examples and doctests. The root module now has a quick start guide with simple examples of basic usage, parameters, and workers (mostly lifted from the examples directory).

Timers have been moved into #480 . @romainreignier If you were relying on this branch for timers, you can retarget to the branch of #480.

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

maspe36

Alright, I've given this a look and without becoming an async expert myself I feel as good as I can about the changes. I think its high time we move forward here and evolve rclrs.

maspe36 · 2025-05-23T15:59:28Z

rclrs/src/client.rs

+    ///
+    /// Consider using [`Self::notify_on_service_ready`] if you want to wait
+    /// until a service for this client is ready.
+    pub fn service_is_ready(&self) -> Result<bool, RclrsError> {


I see that this function was moved (and likely, outside the scope of this PR), but having a bool as a success type feels weird. Having two ways for a function to fail is a smell to me. What if we only returned a bool but we map all RclrsErrors we see into bools?

Glancing at the docs and the impl it seems like rclrs's design should prevent this from realistically happening unless something in the rmw layer returns something unexpected

I agree, and more generally I've been disappointed by how many places the rclrs API tosses Results at users. I think it will give users the impression that our API is more brittle than it really is.

What I'd like to propose is that for any function with the following criteria:

There is no practical risk of an error happening that would be caused by user input

There is a reasonable fallback behavior we can follow in the unlikely event of an error

"Reasonable" may include panicking for situations that indicate an unrecoverable rmw or rcl error, e.g. failure to create a subscription/publisher

we provide two API functions, one that starts with try_ and one without. The one with try_ will return a Result as our current API does, and the one without will use the fallback behavior in the event of an error.

In this case it would look like

pub fn service_is_ready(&self) -> bool { match self.try_service_is_ready() { Ok(ready) => return ready, Err(err) => { log_error!( "rclrs.client.service_is_ready", "An error occurred: {err} ); return false; } }; } pub fn try_service_is_ready(&self) -> Result<bool, RclrsError> { ... }

We can imagine doing likewise for create_subscription / try_create_subscription and the other primitive creation methods.

For this PR I'll just introduce this concept for service_is_ready, but I'll start putting together a proposal on this for the next working group meeting, and we can consider introducing this change in a follow-up PR.

maspe36 · 2025-05-23T16:29:18Z

rclrs/src/node/node_graph_task.rs

+// as the Node that started it is alive.
+pub(super) async fn node_graph_task(
+    mut receiver: UnboundedReceiver<NodeGraphAction>,
+    #[allow(unused)] guard_condition: GuardCondition,


for my understanding, why do this instead of _: GuardCondition?

This is mostly a matter of opinion, but I think it's better to explicitly mark a field as being something is meant to go unused to communicate very clearly that this is not a mistake or caused by a lazy refactor.

In some cases the reason it goes unused may be obvious, like if a method in a trait definition has some extra arguments you don't need, then it's obvious to a reviewer that you can't just remove the arguments as that would cause a compilation error. In those cases I don't mind using the leading underscore trick.

But this case that you're commenting on is more subtle. The code would compile if you removed the guard_condition argument, but it would stop working correctly because the lifecycle of the guard_condition argument has some side-effects. That's why I want to make sure readers receive an extremely clear signal that the argument is there on purpose despite it going unused.

Do any of these lifecycle/guard_condition arguments leak to consumers of rclrs? Or are they all internal to our API?

maspe36 · 2025-05-23T17:04:59Z

rclrs/src/executor/basic_executor.rs

+/// If you need high-throughput multi-threaded execution, then consider using
+/// a different executor.
+//
+// TODO(@mxgrey): Implement a multi-threaded executor using tokio in a downstream


How would you feel about different executors that required different deps being behind some cargo features?

The comment that I left here is outdated. These days I am more inclined to do what you're suggesting: Put a tokio-based executor behind a feature flag. I think either would work, but the feature flags make some aspects of maintenance easier.

The most important thing is making sure that thirdparty executors can be made outside of the rclrs crate, which I believe they currently can.

Sounds good. I'm also interested in writing a real-time executor at some point, so glad to see the foresight here :)

maspe36 · 2025-05-23T17:06:55Z

rclrs/src/client.rs

+    board: Arc<Mutex<ClientRequestBoard<T>>>,
    #[allow(unused)]
-    node: Node,
+    lifecycle: WaitableLifecycle,


Maybe I'm getting lost in the sauce on this large PR, but why is this still unused?

Maybe a missing comment?

All of these lifecycle fields do nothing but carry a side-effect to remove this primitive from its waitset when the user drops all references to the primitive.

The docs for WaitableLifecycle mention this, but I can add a comment directly onto each of these fields to reinforce it.

We have comments in all of the other call-sites describing it
https://github.com/ros2-rust/ros2_rust/pull/446/files/8f898601365c7ac8166393a9a25d35130144e240#diff-1d9414c53c73d7dac3b591eea05198c58c6099cdc60255148c77332c1dc8e6f8R29

Could be worth the maintenance cost considering how subtle it really is.

maspe36 · 2025-05-23T17:12:00Z

rclrs/src/publisher.rs

    /// Node and namespace changes are always applied _before_ topic remapping.
-    pub(crate) fn new<'a>(
-        node_handle: Arc<NodeHandle>,
+    pub(crate) fn create<'a>(


nit: Why the name change from new to create? Also, since this can fail, should we add a try_* prefix?

This is a very opinionated thing on my part, but I think many Rust users will be accustomed to both:

fn new being the public-facing method for creating an instance of an object

fn new usually returns a Self, not usually a Arc<Self> much less a Result<Arc<Self>, _>

I don't have any strong objection to using the name new here, but the above two points were my rationale for changing it.

I wouldn't mind adding a try_ prefix. Coincidentally that would go well with this comment that I wrote before reading your suggestion here.

I'm happy to leave the function name as create, actually. I like your rationale — it's similar to what we do at work with our "rusty C++" :D

A quick search shows that Tokio does this in a few places too:
https://docs.rs/tokio/latest/tokio/?search=create

Though they're not entirely consistent:
https://docs.rs/tokio/latest/tokio/io/unix/struct.AsyncFd.html#method.try_new

maspe36 · 2025-05-31T15:23:56Z

If there are no additional comments from the other maintainers this time next week I will merge this PR

maspe36 · 2025-06-08T02:17:37Z

Hmmm I forgot about the new windows CI. We are currently failing that now. I'm on vacation for the week as well. Sorry about that folks.

mxgrey · 2025-06-09T06:37:25Z

I'm not sure how the current failure in the Windows CI would be caused by this PR 🤔

knmcguire · 2025-06-09T14:20:01Z

Weirdly, the repo checkout is being done in a different dir than before... It used to be D:/ and now it's C:/ ...

I'll research it and do an PR to fix this. It's weird though.

Also we can consider to remove the execution of this action on every PR and push and only let it check it every week. I think that should be sufficient.

mxgrey mentioned this pull request Dec 16, 2024

Execution structure (spin-off of #427) #428

Merged

maspe36 reviewed Dec 17, 2024

View reviewed changes

rclrs/src/executor.rs Show resolved Hide resolved

mxgrey mentioned this pull request Feb 13, 2025

Async Execution #421

Closed

3 tasks

mxgrey mentioned this pull request Mar 4, 2025

Possible error in pubsub example: Subscriber class looses ref to node #462

Closed

mxgrey mentioned this pull request May 8, 2025

Creating subscriber after spin doesn't work #475

Open

mxgrey and others added 15 commits May 12, 2025 12:02

Squash async worker changes

837f248

Signed-off-by: Michael X. Grey <greyxmike@gmail.com> Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix bug in logging macros

97c341d

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Clean up examples using new API

49f5065

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Ensure workers wake up when a task is run

5f6015a

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Iterating on documentation

c8f61f0

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Update docs for subscriptions

a27bb89

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix create_worker bug

402a9c9

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Add more documentation for Worker

1d8c468

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix all todos that can be addressed for now

f0c2e09

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Add introductory documentation and a parameter demo

4617c9f

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Add documentation for parameters

1489807

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix formatting

1f4a8b9

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Remove residual directory

ac57416

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Port over packaging fix

7eba15d

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix formatting for examples

1986ebb

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

mxgrey force-pushed the worker branch from 073c25e to 1986ebb Compare May 14, 2025 15:16

mxgrey marked this pull request as ready for review May 14, 2025 15:17

mxgrey mentioned this pull request May 14, 2025

feat: timers API #480

Merged

mxgrey added 3 commits May 14, 2025 15:36

Fix distro divergence in guid type

f1ad6d8

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Fix compilation bug in 1.75

fd2079b

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

Force backtrace version that is compatible with 1.75

b9d65b1

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

mxgrey added 2 commits May 14, 2025 16:26

fix merge

3c256be

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

fix merge conflicts

8f89860

Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>

mxgrey mentioned this pull request May 15, 2025

[WIP] Action support #410

Closed

maspe36 approved these changes May 23, 2025

View reviewed changes

maspe36 requested review from esteve and jhdcs May 31, 2025 15:24

maspe36 merged commit d62583d into ros2-rust:main Jun 8, 2025
4 checks passed

ros2-rust-release-plz bot mentioned this pull request Aug 15, 2025

chore: release v0.5.0 #511

Merged

azerupi mentioned this pull request Aug 22, 2025

Spin once with timeout of zero causes segfault #518

Closed

Async Workers #446

Async Workers #446

Uh oh!

Conversation

mxgrey commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

romainreignier commented Mar 10, 2025

Uh oh!

mxgrey commented Mar 10, 2025

Uh oh!

romainreignier commented Mar 10, 2025

Uh oh!

mxgrey commented May 14, 2025

Uh oh!

maspe36 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maspe36 commented May 31, 2025

Uh oh!

Uh oh!

maspe36 commented Jun 8, 2025

Uh oh!

mxgrey commented Jun 9, 2025

Uh oh!

knmcguire commented Jun 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mxgrey commented Dec 12, 2024 •

edited

Loading