-
Notifications
You must be signed in to change notification settings - Fork 159
Async Workers #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async Workers #446
Conversation
Hi, these changes looks amazing! |
Just to make sure I'm understanding correctly, when you say "the current API" you mean the API that's currently on the If there are any potential changes to the API of this PR that can further help with your use case, I'm happy to iterate on it. As for roadmap / timeline, there will be a working group meeting in about two hours, and we'll probably be talking about next steps there. I think in general if anyone (including interested users, rather than maintainers) is able to leave reviews on #429 and #430 that may expedite the progress towards getting them merged. |
Thanks for your answer.
Yes, I was referencing the I might try to use this PR on my node to see if the API helps. For now, I was using the branch with the timers because I often use timers in ROS nodes. |
Signed-off-by: Michael X. Grey <greyxmike@gmail.com> Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
@esteve This PR should be properly ready for review now. To clean up the version history, I've squashed the original set of commits and rebased them onto I've also added a considerable amount of documentation throughout the modules, including usage examples and doctests. The root module now has a quick start guide with simple examples of basic usage, parameters, and workers (mostly lifted from the Timers have been moved into #480 . @romainreignier If you were relying on this branch for timers, you can retarget to the branch of #480. |
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
Signed-off-by: Michael X. Grey <mxgrey@intrinsic.ai>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've given this a look and without becoming an async expert myself I feel as good as I can about the changes. I think its high time we move forward here and evolve rclrs.
/// | ||
/// Consider using [`Self::notify_on_service_ready`] if you want to wait | ||
/// until a service for this client is ready. | ||
pub fn service_is_ready(&self) -> Result<bool, RclrsError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this function was moved (and likely, outside the scope of this PR), but having a bool as a success type feels weird. Having two ways for a function to fail is a smell to me. What if we only returned a bool but we map all RclrsErrors we see into bools?
Glancing at the docs and the impl it seems like rclrs's design should prevent this from realistically happening unless something in the rmw layer returns something unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and more generally I've been disappointed by how many places the rclrs API tosses Results
at users. I think it will give users the impression that our API is more brittle than it really is.
What I'd like to propose is that for any function with the following criteria:
- There is no practical risk of an error happening that would be caused by user input
- There is a reasonable fallback behavior we can follow in the unlikely event of an error
- "Reasonable" may include panicking for situations that indicate an unrecoverable rmw or rcl error, e.g. failure to create a subscription/publisher
we provide two API functions, one that starts with try_
and one without. The one with try_
will return a Result
as our current API does, and the one without will use the fallback behavior in the event of an error.
In this case it would look like
pub fn service_is_ready(&self) -> bool {
match self.try_service_is_ready() {
Ok(ready) => return ready,
Err(err) => {
log_error!(
"rclrs.client.service_is_ready",
"An error occurred: {err}
);
return false;
}
};
}
pub fn try_service_is_ready(&self) -> Result<bool, RclrsError> { ... }
We can imagine doing likewise for create_subscription
/ try_create_subscription
and the other primitive creation methods.
For this PR I'll just introduce this concept for service_is_ready
, but I'll start putting together a proposal on this for the next working group meeting, and we can consider introducing this change in a follow-up PR.
// as the Node that started it is alive. | ||
pub(super) async fn node_graph_task( | ||
mut receiver: UnboundedReceiver<NodeGraphAction>, | ||
#[allow(unused)] guard_condition: GuardCondition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my understanding, why do this instead of _: GuardCondition
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a matter of opinion, but I think it's better to explicitly mark a field as being something is meant to go unused to communicate very clearly that this is not a mistake or caused by a lazy refactor.
In some cases the reason it goes unused may be obvious, like if a method in a trait definition has some extra arguments you don't need, then it's obvious to a reviewer that you can't just remove the arguments as that would cause a compilation error. In those cases I don't mind using the leading underscore trick.
But this case that you're commenting on is more subtle. The code would compile if you removed the guard_condition
argument, but it would stop working correctly because the lifecycle of the guard_condition
argument has some side-effects. That's why I want to make sure readers receive an extremely clear signal that the argument is there on purpose despite it going unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do any of these lifecycle
/guard_condition
arguments leak to consumers of rclrs? Or are they all internal to our API?
/// If you need high-throughput multi-threaded execution, then consider using | ||
/// a different executor. | ||
// | ||
// TODO(@mxgrey): Implement a multi-threaded executor using tokio in a downstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about different executors that required different deps being behind some cargo features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment that I left here is outdated. These days I am more inclined to do what you're suggesting: Put a tokio-based executor behind a feature flag. I think either would work, but the feature flags make some aspects of maintenance easier.
The most important thing is making sure that thirdparty executors can be made outside of the rclrs crate, which I believe they currently can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm also interested in writing a real-time executor at some point, so glad to see the foresight here :)
#[allow(unused)] | ||
node: Node, | ||
lifecycle: WaitableLifecycle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm getting lost in the sauce on this large PR, but why is this still unused?
Maybe a missing comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these lifecycle fields do nothing but carry a side-effect to remove this primitive from its waitset when the user drops all references to the primitive.
The docs for WaitableLifecycle
mention this, but I can add a comment directly onto each of these fields to reinforce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have comments in all of the other call-sites describing it
https://github.com/ros2-rust/ros2_rust/pull/446/files/8f898601365c7ac8166393a9a25d35130144e240#diff-1d9414c53c73d7dac3b591eea05198c58c6099cdc60255148c77332c1dc8e6f8R29
Could be worth the maintenance cost considering how subtle it really is.
@@ -95,10 +95,10 @@ where | |||
/// Creates a new `Publisher`. | |||
/// | |||
/// Node and namespace changes are always applied _before_ topic remapping. | |||
pub(crate) fn new<'a>( | |||
node_handle: Arc<NodeHandle>, | |||
pub(crate) fn create<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why the name change from new to create? Also, since this can fail, should we add a try_*
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very opinionated thing on my part, but I think many Rust users will be accustomed to both:
fn new
being the public-facing method for creating an instance of an objectfn new
usually returns aSelf
, not usually aArc<Self>
much less aResult<Arc<Self>, _>
I don't have any strong objection to using the name new
here, but the above two points were my rationale for changing it.
I wouldn't mind adding a try_
prefix. Coincidentally that would go well with this comment that I wrote before reading your suggestion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to leave the function name as create, actually. I like your rationale — it's similar to what we do at work with our "rusty C++" :D
A quick search shows that Tokio does this in a few places too:
https://docs.rs/tokio/latest/tokio/?search=create
Though they're not entirely consistent:
https://docs.rs/tokio/latest/tokio/io/unix/struct.AsyncFd.html#method.try_new
If there are no additional comments from the other maintainers this time next week I will merge this PR |
Hmmm I forgot about the new windows CI. We are currently failing that now. I'm on vacation for the week as well. Sorry about that folks. |
Weirdly, the repo checkout is being done in a different dir than before... It used to be D:/ and now it's C:/ ... I'll research it and do an PR to fix this. It's weird though. Also we can consider to remove the execution of this action on every PR and push and only let it check it every week. I think that should be sufficient. |
This PR is the final culmination of all the async execution features that I currently have plans to implement. This builds on top of #421 so there is no need to review this PR until that one is merged, but I'm opening this PR to help make it visible to anyone who wants to try out the full breadth of async features that I've been working on.
The key role of this PR is to introduce the worker concept that was initially proposed in these slides. The API of this implementation has some differences from the initial proposal, but the concept is exactly the same.
Since this is just a draft preview PR, in lieu of a detailed explanation, I'll direct folks to this slide deck which explains the motivation behind the Worker concept.